Neighborhood Collective

Isomorphic, Sparse MPI-like Collective Communication Operations

MPI sparse collective communication operations

In 2012, MPI-3 introduced sparse or neighborhood collective communication. The neighborhood collective operations differ from the regular collectives by enabling each participating process to communicate with only a small neighborhood of processes (in contrast to the global collectives where each processes interact with all other processes). Sparse communication may be found in stencil computations.

For many problems, we need a more flexible interface than the very restricted Cartesian neighborhood (cf. MPI_Cart_create) to define the neighborhood. The only other choice is the completely general communication graph constructor (e.g. MPI_Dist_graph_create(_adjacent)) which is conceptually heavy makes optimization for specific problem cases difficult.

With these observations, Träff et al. have proposed a more lightweight interface [@traff2015], which explicitly asserts the process neighborhoods to be isomorphic.

What is the meaning of isomorphic?

Isomorphic means that all processes have the exact same neighborhoods, andconsequently, communicate in structurally similar patterns. Asserting the process neighborhood to be isomorphic enables the possibility of immediately using structured algorithms and performing non-trivial optimizations.

An isomorphic communication pattern is defined relative to a given, structured organization of processes. More precisely, let p the number of participating processes, and assume they are organized in a d-dimensional torus of p_0 \times p_1 \times \dots \times p_{d-1}, where, \prod_{i=0}^{d-1} p_i = p. Each process R is identified by a coordinate (r_0,r_1,\dots , r_{d-1}) with 0\le r_i < p_i for i=0,\dots ,d-1. A sparse neighborhood is a sequence of s relative coordinate vectors C^0,C^1,\dots ,C^{s-1} to which the process must send data. C^i has the form (c_0^i,c_1^i,\dots ,c_{d-1}^i) where the offsets c_j^i may be positive or negative. A set of identical sparse neighborhoods for a set of ranked processes is said to be isomorphic.

An isomorphic, sparse collective communication is a collective communication over p processes with isomorphic neighborhoods. Each process R with a sparse neighborhood send data to its s target processes: {((r_0+c_0^i)\mod p_0, (r_1+c_1^i)\mod p_1, \dots , (r_{d-1}+c_{d-1}^i)\mod p_{d-1})}. And since the neighborhood is isomorphic, each process R receive data from s source processes, which are identified by the true opposite coordinates of the s target processes.

Based on this definition, many of collective communication algorithms may be defined, for instance see [@traff2015].

An interface for sparse collective communication

A light-weight interface for isomorphic sparse collective communication in arbritrary Cartesian communicators is introduced here.

Translation functions

Coordiantes and relative coordinate vectors are represented as flat, d-dimensional integer arrays. In order to navigate in the sparse neighborhoods and assign ranks to relative coordinate offsets (and vice versa), the interface provides the three following translation functions:

// Compute absolute rank relative to caller
TUW_Cart_relative_rank(MPI_Comm cartcomm, int relative[], int *rank)

// Compute relative coordinate from caller
TUW_Cart_relative_coord(MPI_Comm cartcomm, int rank, int relative[])

// Generalized shift in relative direction
TUW_Cart_relative_shift(MPI_Comm cartcomm, int relative[], int *source, int *target)

Isomorphic neighborhood constructor

This constructor is a collective call, it requires that all MPI processes call with exactly the same local neighborhood. It must be called on a Cartesian topology and attaches an s-neighborhood of relative coordinates.

TUW_Iso_neighborhood_create(MPI_Comm cartcomm, int s, int relative_coordinates[], MPI_Comm *isocomm)

Query functions

The query functions are defined in analogy with the distributed graph interface of the MPI 3.1 standard [@MPI-3.1].

// Get the size of the s-neighborhood, the in- and out-degree of the caller (excluding MPI_PROC_NULL)
TUW_Iso_neighborhood_count(MPI_Comm isocomm, int *s, int *indegree, int *outdegree)

// Get the absolute ranks of the first max_s target and source neighbors of the caller (including MPI_PROC_NULL)
TUW_Iso_neighborhood_get(MPI_Comm isocomm, int max_s, int sources[], int destinations[])

// Get input to create corresponding MPI distributed graph (excludes MPI_PROC_NULL)
TUW_Iso_neighborhood_graph_get(MPI_Comm isocomm, int max_s, int sources[], int destinations[])

For convenience we also provide functions that return in the same way the predefined neighbors of a Cartesian communicator (see the code1 and [@traff2015])

Isomorphic, sparse collective operations

We can now use isomorphic, sparse collective operation on communicator with attached s-neighborhood. For instance, the irregular alltoall (the most general form of complete exchange) looks like this:

TUW_Iso_neighbor_alltoallw(void *sendbuf, int sendcount[], MPI_Aint senddisp[], MPI_Datatype sendtype[],
                           void *recvbuf, int recvcount[], MPI_Aint recvdisp[], MPI_Datatype recvtype[],
                           MPI_Comm isocomm)

The current interface provides:

  • The allgather family: TUW_Iso_neighbor_allgather, TUW_Iso_neighbor_allgatherv, and TUW_Iso_neighbor_allgatherw,
  • The all-to-all family: TUW_Iso_neighbor_alltoall, TUW_Iso_neighbor_alltoallv, and TUW_Iso_neighbor_alltoallw,
  • Global reduction operation: TUW_Iso_neighbor_reduce, TUW_Iso_neighbor_allreduce.

Source code

The latest release can be found here. The documentation can be easily generated using Doxygen.