add -mpi-split-by-appnum option (issue #3127)
- This can be used as an alternative to the -world (multi-world) option, for example when running OpenFOAM applications coupled with MUI (https://github.com/MxUI/MUI).
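A minimal model of the outcome of splitting by appnum. It assumes an MPMD launch such as mpirun -np 4 <appA> : -np 2 <appB>, in which world ranks are handed to the applications in launch order. That ordering is typical launcher behaviour and is not specified by this MR. The struct SplitInfo and the function splitByAppnum are hypothetical. They only reproduce what MPI_Comm_split(MPI_COMM_WORLD, appnum, worldRank, ...) would yield and are not the OpenFOAM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where one world rank ends up after splitting MPI_COMM_WORLD by appnum.
struct SplitInfo
{
    int appnum;     // application number, used as the split "color"
    int localRank;  // rank inside that application's own communicator
    int localSize;  // number of ranks in that communicator
};

// Hypothetical model of an MPMD launch such as
//   mpirun -np 4 <appA> : -np 2 <appB>
// ASSUMPTION: world ranks are assigned to the applications in launch order,
// and appnum i belongs to the i-th colon-separated command.
// nProcsPerApp[i] is the process count requested for application i.
// The returned vector is indexed by world rank.
std::vector<SplitInfo> splitByAppnum(const std::vector<int>& nProcsPerApp)
{
    std::vector<SplitInfo> byWorldRank;
    for (std::size_t app = 0; app < nProcsPerApp.size(); ++app)
    {
        const int n = nProcsPerApp[app];
        assert(n > 0);  // each application must have at least one process

        for (int local = 0; local < n; ++local)
        {
            // Same outcome as MPI_Comm_split(MPI_COMM_WORLD, appnum, worldRank, ...):
            // ranks keep their relative world order inside each new communicator.
            byWorldRank.push_back({static_cast<int>(app), local, n});
        }
    }
    return byWorldRank;
}
```

With counts {4, 2}, world ranks 0-3 become ranks 0-3 of a 4-rank communicator for appnum 0. World ranks 4-5 become ranks 0-1 of a 2-rank communicator for appnum 1. Each application can then treat its own communicator as its "world".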
changed milestone to %v2506
added enhancement label
assigned to @Mattijs
mentioned in issue #3127 (closed)
- Resolved by Mark OLESEN
Hi @mark, many thanks again for implementing the -mpi-split-by-appnum feature. I have tested it with MUI coupling to see how everything works together in practice. The implementation works as expected, apart from two issues that I would like to highlight below:
- Suggested modification in UPstream.C
The first issue arises on Line 793 of UPstream.C:
MPI_Comm_dup(MPI_COMM_WORLD, &mpiNewComm);
This line is executed when setParRun() is called (Line 345). It runs before the communicator is split. MPI_Comm_dup is a collective call over all of MPI_COMM_WORLD, which includes the ranks of the other application. As a result, the other coupled application hangs indefinitely.
After trying various workarounds, the only viable solution I have found is to comment out Line 793 and uncomment Line 790:
PstreamGlobals::MPICommunicators_[index] = MPI_COMM_WORLD;
Would it be possible to make this change?
- Suggested modification in MUI to make its MPI splitting strategy compatible with OpenFOAM
The second issue is due to differences in the MPI communicator splitting strategies:
- OpenFOAM uses MPI_Allgather() with UPstream::newCommunicator.
- MUI uses MPI_Comm_split() directly.
Both MPI_Allgather() and MPI_Comm_split() are collective MPI operations. Mismatched calls across processes in an MPMD setup will therefore result in deadlocks:
- When OpenFOAM reaches MPI_Allgather(), MUI ranks are not expecting it.
- When MUI executes MPI_Comm_split(), OpenFOAM ranks are not participating.
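The deadlock argument can be made concrete with a toy checker. It rests on a simplifying assumption: each world rank's blocking collectives on MPI_COMM_WORLD are listed in program order, and a real run would block at the first step where the ranks disagree. Strictly, MPI does not match collectives by name. Mismatched collectives are erroneous, and a hang is merely the typical symptom. The helper firstCollectiveMismatch is hypothetical, and the call sequences below are illustrative rather than traced from OpenFOAM or MUI.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// callsPerRank[r] lists, in program order, the blocking collectives that
// world rank r issues on MPI_COMM_WORLD (hypothetical, simplified model).
// Returns the first step at which the ranks disagree, where a real run would
// typically block, or -1 if every rank issues the identical sequence.
int firstCollectiveMismatch(const std::vector<std::vector<std::string>>& callsPerRank)
{
    assert(!callsPerRank.empty());  // a communicator has at least one rank

    std::size_t longest = 0;
    for (const auto& calls : callsPerRank)
        longest = std::max(longest, calls.size());

    const auto& reference = callsPerRank.front();  // rank 0's sequence
    for (std::size_t step = 0; step < longest; ++step)
    {
        // "" means the rank has no further collective at this step.
        const std::string expected = step < reference.size() ? reference[step] : "";
        for (const auto& calls : callsPerRank)
        {
            const std::string actual = step < calls.size() ? calls[step] : "";
            if (actual != expected)
                return static_cast<int>(step);
        }
    }
    return -1;
}
```

Consider two OpenFOAM ranks issuing {MPI_Comm_dup, MPI_Allgather} next to two MUI ranks issuing {MPI_Comm_split}. They disagree at step 0, which covers both the UPstream.C issue and the splitting issue. If every rank's only collective on MPI_COMM_WORLD is the same MPI_Allgather, the model reports no mismatch.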
To resolve this, I've implemented an approach based on MPI_Allgather() and MPI_Comm_group() for splitting by appnum in MUI (see MUI-PR#108). This ensures that all ranks take part in MPI_Allgather() when OpenFOAM is involved, so the collectives are no longer mismatched. No changes are required on the OpenFOAM side for this issue.
After applying the two changes above, everything works as expected with MUI coupling. Please let me know your thoughts on the proposed modification to Lines 790 and 793 in UPstream.C.
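The local part of an Allgather-plus-group split can be sketched as follows. The sketch assumes that each rank has already received every rank's appnum from one MPI_Allgather on MPI_COMM_WORLD. The helper ranksWithSameAppnum is hypothetical and is not taken from MUI-PR#108. It only computes the list of world ranks that would be passed to MPI_Group_incl for the group returned by MPI_Comm_group(MPI_COMM_WORLD, ...). This sketch does not specify which call then builds the communicator. MPI_Comm_create_group, for instance, is collective only over the members of the group, whereas MPI_Comm_create is collective over all of MPI_COMM_WORLD.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// allAppnums[r] is the appnum reported by world rank r, i.e. the table each
// rank holds after a single MPI_Allgather on MPI_COMM_WORLD (assumption).
// Returns the world ranks sharing myWorldRank's appnum, in ascending order,
// so local rank order inside the new group follows world rank order.
std::vector<int> ranksWithSameAppnum(const std::vector<int>& allAppnums, int myWorldRank)
{
    assert(myWorldRank >= 0 &&
           static_cast<std::size_t>(myWorldRank) < allAppnums.size());

    const int myAppnum = allAppnums[static_cast<std::size_t>(myWorldRank)];
    std::vector<int> members;
    for (std::size_t r = 0; r < allAppnums.size(); ++r)
    {
        if (allAppnums[r] == myAppnum)
            members.push_back(static_cast<int>(r));
    }
    return members;  // candidate "ranks" argument for MPI_Group_incl
}
```

In this sketch, membership is derived locally from the gathered table instead of through MPI_Comm_split. The only operation on MPI_COMM_WORLD is therefore the MPI_Allgather that, according to the comment above, OpenFOAM ranks also enter.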
mentioned in commit 09193526
mentioned in commit e5be5c21
mentioned in commit c4e55fa8
added 21 commits
- 1c6cdde0...c5ceec3c - 19 commits from branch develop
- c4e55fa8 - ENH: allow disabling of initial MPI_Comm_dup(MPI_COMM_WORLD,...)
- dc455d8d - ENH: add -mpi-split-by-appnum option (issue #3127)
mentioned in commit 34143b43
added 2 commits
- 34143b43 - ENH: allow disabling of initial MPI_Comm_dup(MPI_COMM_WORLD,...)
- 4334aa43 - ENH: add -mpi-split-by-appnum option (issue #3127)
mentioned in commit 5bb03048