MPI Stream

Streaming computing models naturally express fine-grained communication and allow large data sets to be processed on the fly with limited memory consumption. As the volume of data that must be processed within a reasonable time grows, streaming models are increasingly used on supercomputers to solve data-intensive problems. Irregular communication is also common in conventional HPC applications, and streaming models can support efficient pipelining of communication and computation in those cases.

We provide a prototype library, MPIStream, implemented atop MPI to support a streaming model on supercomputers. MPIStream is written in C and uses persistent communication between data producers and consumers. The library was benchmarked on both BG/Q and Cray XC40 machines. It was also linked to a real-world plasma simulation code, a linear solver benchmark, and a graph problem benchmark to support irregular communication, application-specific all-to-all communication, and the decoupling of non-scalable parallel I/O operations.

A generic streaming application is presented in panel a. It combines two basic streaming patterns: the linear chain (panel b) and the tree (panel c). In a linear chain application, each consumer is linked to only one producer by a data stream, while in a tree application each consumer has more than one producer.
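The difference between the two patterns can be sketched as a mapping from producers to consumers. The code below is an illustration under stated assumptions, not how MPIStream assigns streams: the names `consumer_of`, `fan_in`, and `classify` are hypothetical, the block-wise assignment rule is one possible choice, and it assumes at least as many producers as consumers.

```c
typedef enum { LINEAR_CHAIN, TREE } topology_t;

/* Hypothetical block-wise assignment: contiguous groups of producers feed
 * the same consumer. Assumes n_producers >= n_consumers >= 1. */
int consumer_of(int producer, int n_producers, int n_consumers)
{
    return (int)((long)producer * n_consumers / n_producers);
}

/* Number of producers whose data stream ends at the given consumer. */
int fan_in(int consumer, int n_producers, int n_consumers)
{
    int count = 0;
    for (int p = 0; p < n_producers; p++)
        if (consumer_of(p, n_producers, n_consumers) == consumer)
            count++;
    return count;
}

/* Linear chain: every consumer has exactly one producer.
 * Tree: at least one consumer has more than one producer. */
topology_t classify(int n_producers, int n_consumers)
{
    for (int c = 0; c < n_consumers; c++)
        if (fan_in(c, n_producers, n_consumers) > 1)
            return TREE;
    return LINEAR_CHAIN;
}
```

With three producers and three consumers the mapping is one-to-one and `classify(3, 3)` returns `LINEAR_CHAIN`. With four producers and two consumers, producers 0 and 1 feed consumer 0 and producers 2 and 3 feed consumer 1, so `classify(4, 2)` returns `TREE`.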

Links to papers:

  • A Data Streaming Model in MPI.
    I. B. Peng, S. Markidis, E. Laure, D. Holmes, and M. Bull.
    ExaMPI 2015 Workshop at SC’15.
    DOI: 10.1145/2831129.2831131
  • A Performance Characterization of Streaming Computing on Supercomputers.
    S. Markidis, I. B. Peng, R. Iakymchuk, E. Laure, G. Kestor, and R. Gioiosa.
    In Proceedings of the International Conference on Computational Science (ICCS), 2016. (To appear)