MPI_Send MPI_ERR_COUNT: invalid count argument 140 million cells (bug)
Summary
I have encountered a problem with the MPI_Send routine while running a 140-million-cell case on the Virgo cluster at GSI Helmholtz Centre for Heavy Ion Research: https://hpc.gsi.de/virgo/
I'm using chtMultiRegionSimpleFoam to solve a heat transfer problem for a multilayer PCB with vias in great detail. The solver goes through a few regions without problems, but when it proceeds to a very big region with 141,211,296 cells, it crashes with the error below.
I have tried increasing the number of subdomains, e.g. 1024, 2048 and 4096 CPUs (all with the hierarchical method) and 1024 (with ptscotch), but the error persists. This makes me think that the error is caused by the absolute number of cells in the region and is independent of the decomposition. A few more things were tried as a solution, but with no success: https://www.cfd-online.com/Forums/openfoam-solving/253681-mpi_send-mpi_err_count-invalid-count-argument.html
Steps to reproduce
Basically, any solver running a case with a comparable mesh size should fail. Of course, a lot of computational resources are required to run such a case, which is why it is difficult to reproduce.
Feel free to contact me if you need more details on the Virgo cluster that I'm using. Meanwhile, I will ask the OpenFOAM community whether anybody could test it on a cluster of comparable size.
Example case
My case can be found at https://sf.gsi.de/f/4db522c9b39b4125855f/?dl=1 (24.2 MB)
Requirements: 1024 CPUs (with multithreading), 4 GB RAM per processor, the Slurm workload manager, and OpenFOAM built with WM_LABEL_SIZE=64
Simply run the ./Allrun script.
The case uses collated file format.
What is the current bug behaviour?
It seems that MPI_Send is called with a negative count argument. The count argument is a 32-bit signed int (https://www.open-mpi.org/doc/v4.1/man3/MPI_Send.3.php), so it is likely overflowing (MPI_Send with a count > INT_MAX).
What is the expected correct behavior?
Basically, the solver should proceed with the very big region, provided there are no hardware limitations, as is the case here.
Relevant logs and/or images
chtMultiRegionSimpleFoam fails with the following error message:
<...>
[lxbk1164:3797445] *** An error occurred in MPI_Send
[lxbk1164:3797445] *** reported by process [710282087,0]
[lxbk1164:3797445] *** on communicator MPI_COMM_WORLD
[lxbk1164:3797445] *** MPI_ERR_COUNT: invalid count argument
[lxbk1164:3797445] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[lxbk1164:3797445] *** and potentially your MPI job)
slurmstepd: error: *** STEP 19265579.0 ON lxbk1164 CANCELLED AT 2024-01-19T18:02:26 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
<...>
Full logs generated by the case can be downloaded at https://sf.gsi.de/f/66935ac60645422da948/?dl=1. The log.* files are the ordinary logs that OpenFOAM generates; the Slurm-*.out files are logs from the workload manager.
Environment information
- OpenFOAM version : v2306
- Operating system : CentOS-based
- Hardware info : https://hpc.gsi.de/virgo/user-guide/overview/hardware.html
- OpenMPI : 3.1.6, 4.1.2 (from ThirdParty-v2306), 5.0.0 (tried all three)
- Slurm: 21.08.8-2
- Compiler : gcc-toolset-13, gcc 10.2.0 (tried both)
Possible fixes
The PStream interface accepts a std::streamsize and implicitly casts it to the int argument of the MPI interfaces, performing a narrowing conversion: https://develop.openfoam.com/Development/openfoam/-/blob/master/src/Pstream/mpi/UOPstreamWrite.C#L56
<source>:3:27: error: static assertion failed
3 | static_assert(sizeof(int) == sizeof(std::streamsize));
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:3:27: note: the comparison reduces to '(4 == 8)'
(from https://godbolt.org/)
OpenFOAM would have plenty of options to deal with this situation, e.g. issuing multiple MPI_Send calls or choosing a larger MPI_Datatype.
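One possible sketch of the multiple-send approach (not OpenFOAM's actual code; the helper name and the calling pattern in the comment are hypothetical): split the std::streamsize byte count into chunks that each fit the signed 32-bit count parameter, then send one chunk at a time.

```cpp
#include <algorithm>
#include <climits>
#include <ios>       // std::streamsize
#include <vector>

// Hypothetical helper: split a possibly > INT_MAX byte count into
// chunks that each fit into MPI's signed 32-bit count argument.
std::vector<int> splitCount(std::streamsize total)
{
    std::vector<int> chunks;
    while (total > 0)
    {
        const std::streamsize n =
            std::min<std::streamsize>(total, INT_MAX);
        chunks.push_back(static_cast<int>(n));
        total -= n;
    }
    return chunks;
}

// A caller would then issue one MPI_Send per chunk, advancing the
// buffer pointer by the chunk size each time, e.g.:
//
//     const char* p = static_cast<const char*>(buf);
//     for (int n : splitCount(bufSize))
//     {
//         MPI_Send(p, n, MPI_BYTE, toProcNo, tag, comm);
//         p += n;
//     }
```

Alternatively, MPI 4.0 introduces large-count variants such as MPI_Send_c, which take an MPI_Count argument and would avoid the narrowing altogether where a recent MPI is available.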
P.S. It seems that there's a problem adding a bug label: /label bug