MPI Support for Streaming Computing

EPiGRAM also investigated support for emerging computing models that are likely to be used on massively parallel supercomputers in the future. One example is the data streaming model, which is an effective way to tackle the challenges of data-intensive applications. However, streaming computing is not naturally supported in MPI.

In EPiGRAM, we have designed and implemented a library called MPIStream that allows HPC applications to globally allocate data producers and consumers on MPI processes, to stream data continuously or irregularly, to receive and process data, and to terminate the streaming operations. Use cases in which HPC applications carry out threshold collective operations, monitor and control applications, and perform parallel I/O of irregular events are illustrated in.

Our MPI streaming library targets the streaming model for distributed systems, where MPI is the dominant programming system. The MPIStream library is written in C and built on top of MPI. A stream is a continuous flow of stream elements, each of which is the basic unit of transmission between data producers and consumers. MPI datatypes are used to describe the memory layout of the elements on data producers, which enables zero-copy streaming and thereby reduces memory consumption on large systems. MPI persistent communication is used to reduce the overhead of repeatedly calling receive routines.

The performance of the MPIStream library has been evaluated using a parallel STREAM benchmark on two supercomputers: Beskow, a Cray XC40 at KTH, and Mira, a Blue Gene/Q at Argonne National Laboratory. The results show that the library achieves acceptable performance (52%–65% of the maximum available bandwidth) and demonstrate its potential: with 2,048 data producers streaming to 2,048 data consumers, it reaches processing rates as high as 200 GB/s on the Blue Gene/Q and 80 GB/s on the Cray XC40.