`residualControl` in `fvSolution` not working with multiple MPI processes
When the AmgX solver is used through petsc4Foam (https://develop.openfoam.com/modules/external-solver), the `residualControl` setting in `fvSolution` does not work correctly.
AmgX with petsc4Foam was built against OpenFOAM v2406 following the build instructions at https://blog.nextfoam.co.kr/2024/01/10/gpu-accelerated-openfoam-with-petsc4foam/.
The `residualControl` entry in `fvSolution` and the `amgxpOptions` file are:
- fvSolution
SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent       yes;

    residualControl
    {
        p            0.1;
    }
}
- amgxpOptions
{
    "config_version": 2,
    "determinism_flag": 1,
    "solver": {
        "preconditioner": {
            "print_grid_stats": 0,
            "algorithm": "AGGREGATION",
            "print_vis_data": 0,
            "solver": "AMG",
            "smoother": {
                "relaxation_factor": 0.8,
                "scope": "jacobi",
                "solver": "BLOCK_JACOBI",
                "monitor_residual": 0,
                "print_solve_stats": 0
            },
            "print_solve_stats": 0,
            "presweeps": 0,
            "interpolator": "D2",
            "selector": "SIZE_2",
            "coarse_solver": "NOSOLVER",
            "max_iters": 1,
            "monitor_residual": 1,
            "store_res_history": 1,
            "scope": "amg",
            "max_levels": 50,
            "postsweeps": 3,
            "cycle": "V"
        },
        "solver": "PCG",
        "print_solve_stats": 1,
        "obtain_timings": 1,
        "max_iters": 100,
        "monitor_residual": 1,
        "convergence": "RELATIVE_INI",
        "scope": "main",
        "tolerance": 0.01,
        "norm": "L2",
        "store_res_history": 1
    }
}
With this configuration, when simpleFoam is run on a single process, the solver finishes successfully. The logs of the first iteration and the last one are:
SIMPLE: convergence criteria
field p tolerance 0.1
Time = 1
smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 0.096874, No Iterations 14
smoothSolver: Solving for Uy, Initial residual = 1, Final residual = 0.0918301, No Iterations 12
smoothSolver: Solving for Uz, Initial residual = 1, Final residual = 0.0910265, No Iterations 13
Initializing PETSc
Number of GPU devices :: 1
AMGX version 2.5.0
Built on Nov 7 2024, 03:37:11
Compiled with CUDA Runtime 12.2, using CUDA driver 12.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Initializing AmgX-p
Initializing AmgX Linear Solver p
Offloaded LDU matrix arrays on CUDA device and converted to CSR
Using Normal MPI (Hostbuffer) communicator...
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.64447 5.995583e-01
0 0.64447 3.332180e-01 0.5558
1 0.6445 2.221825e-01 0.6668
2 0.6445 1.575016e-01 0.7089
Total Time: 0.0887627
setup: 0.02867 s
solve: 0.0600927 s
solve(per iteration): 0.00500773 s
PETSc-AMGx: Solving for p, Initial residual = 0.599558, Final residual = 0.0065079, No Iterations 11
time step continuity errors : sum local = 0.000743556, global = 5.61266e-05, cumulative = 5.61266e-05
smoothSolver: Solving for omega, Initial residual = 0.0212029, Final residual = 0.000901774, No Iterations 3
bounding omega, min: -219.754 max: 20199.5 average: 285.026
smoothSolver: Solving for k, Initial residual = 1, Final residual = 0.0546809, No Iterations 3
ExecutionTime = 2.71 s ClockTime = 3 s
Time = 74
smoothSolver: Solving for Ux, Initial residual = 0.000413349, Final residual = 3.5012e-05, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.00905741, Final residual = 0.000879962, No Iterations 6
smoothSolver: Solving for Uz, Initial residual = 0.00841489, Final residual = 0.000827842, No Iterations 6
Offloaded LDU matrix values (only) on CUDA device and converted to CSR
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.64447 9.953189e-02
0 0.64447 2.450995e-02 0.2463
1 0.6445 7.561778e-03 0.3085
2 0.6445 3.590985e-03 0.4749
3 0.6445 2.421009e-03 0.6742
4 0.6445 1.483251e-03 0.6127
5 0.6445 8.180997e-04 0.5516
----------------------------------------------------------------------
Total Iterations: 6
Avg Convergence Rate: 0.4492
Final Residual: 8.180997e-04
Total Reduction in Residual: 8.219474e-03
Maximum Memory Usage: 0.644 GB
----------------------------------------------------------------------
Total Time: 0.0473091
setup: 0.0226345 s
solve: 0.0246746 s
solve(per iteration): 0.00411244 s
PETSc-AMGx: Solving for p, Initial residual = 0.0995319, Final residual = 0.00148325, No Iterations 5
time step continuity errors : sum local = 8.2711e-05, global = 4.41344e-08, cumulative = -0.000409131
smoothSolver: Solving for omega, Initial residual = 5.56291e-05, Final residual = 3.78257e-06, No Iterations 3
smoothSolver: Solving for k, Initial residual = 0.0008333, Final residual = 6.97606e-05, No Iterations 3
ExecutionTime = 64.25 s ClockTime = 65 s
SIMPLE solution converged in 74 iterations
When the AmgX solver is used on a single process, the solution converges correctly once the initial residual of p drops below 0.1, as defined in `fvSolution`.
However, if multiple MPI processes are used, e.g. `mpirun -np 2 simpleFoam -parallel`, with the same `residualControl` in `fvSolution`, the solver stops progressing right after the start.
The simpleFoam log with 2 MPI processes is:
Time = 1
smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 0.0981494, No Iterations 14
smoothSolver: Solving for Uy, Initial residual = 1, Final residual = 0.0940637, No Iterations 12
smoothSolver: Solving for Uz, Initial residual = 1, Final residual = 0.0935349, No Iterations 13
Initializing PETSc
Number of GPU devices :: 1
AMGX version 2.5.0
Built on Nov 7 2024, 03:37:11
Compiled with CUDA Runtime 12.2, using CUDA driver 12.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Initializing AmgX-p
Initializing AmgX Linear Solver p
Offloaded LDU matrix arrays on CUDA device and converted to CSR
Using Normal MPI (Hostbuffer) communicator...
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.759766 6.004220e-01
0 0.759766 3.342920e-01 0.5568
1 0.7598 2.214959e-01 0.6626
2 0.7598 1.561108e-01 0.7048
11 0.7598 4.615039e-03 0.7512
----------------------------------------------------------------------
Total Iterations: 12
Avg Convergence Rate: 0.6665
Final Residual: 4.615039e-03
Total Reduction in Residual: 7.686326e-03
Maximum Memory Usage: 0.760 GB
----------------------------------------------------------------------
Total Time: 0.0936948
setup: 0.0317286 s
solve: 0.0619661 s
solve(per iteration): 0.00516385 s
PETSc-AMGx: Solving for p, Initial residual = 0.600422, Final residual = 0.00614342, No Iterations 11
time step continuity errors : sum local = 0.000836866, global = -1.98778e-07, cumulative = -1.98778e-07
smoothSolver: Solving for omega, Initial residual = 0.0208742, Final residual = 0.000914672, No Iterations 3
bounding omega, min: -60.0497 max: 22493.7 average: 281.463
smoothSolver: Solving for k, Initial residual = 1, Final residual = 0.0739663, No Iterations 3
ExecutionTime = 1.67 s ClockTime = 2 s
After the first iteration (Time = 1) finishes, the solver does not proceed. The processes are still running, as shown here:

However, the MPI process for rank 1 (processor1) has finished after one iteration, while the rank 0 process (processor0) does nothing.

The figure above shows the processor directories: after one iteration, a `1` time directory has been created in `processor1`, but there is nothing in `processor0`, and the solver does not proceed.
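This looks like the two MPI ranks disagreeing on the convergence decision: one rank decides the run has converged, writes its time directory and leaves the SIMPLE loop, while the other rank keeps iterating and blocks forever in the next collective MPI call. The toy program below is only a sketch of that failure mode, assuming each rank evaluates convergence from a rank-local residual value; it is not petsc4Foam or OpenFOAM code, and the `localResidual` values are made up for illustration.

// Illustration only (not petsc4Foam/OpenFOAM code): two ranks deciding
// convergence from a rank-local value deadlock in the next collective call.
// Build/run: mpicxx deadlock_demo.cpp -o deadlock_demo && mpirun -np 2 ./deadlock_demo
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int iter = 1; iter <= 100; ++iter)
    {
        // Hypothetical per-rank residual: rank 1 sees a small value, rank 0 a
        // large one (mimicking an initial residual that was never reduced).
        const double localResidual = (rank == 1) ? 1e-3 : 1.0;

        // Each rank decides convergence from its *local* residual only.
        if (localResidual < 0.1)
        {
            std::printf("rank %d: converged at iteration %d, leaving the loop\n",
                        rank, iter);
            break;  // rank 1 exits the loop and proceeds to MPI_Finalize()
        }

        // Rank 0 now waits here for a partner that has already left the loop.
        double globalResidual = 0.0;
        MPI_Allreduce(&localResidual, &globalResidual, 1,
                      MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);  // never completes
    }

    MPI_Finalize();
    return 0;
}

If the convergence decision were based on a value that is identical on all ranks (or the decision flag itself were reduced across ranks), every rank would leave the loop in the same iteration and the run would stop cleanly.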
However, if `U` is checked in the `residualControl` of `fvSolution` instead of `p`, the solver with multiple MPI processes converges successfully and stops gracefully, as expected.
- fvSolution
SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent       yes;

    residualControl
    {
        U            0.1;
    }
}
The log with the above `U` residualControl setting is:
Time = 20
smoothSolver: Solving for Ux, Initial residual = 0.00477802, Final residual = 0.000389022, No Iterations 8
smoothSolver: Solving for Uy, Initial residual = 0.0999076, Final residual = 0.00902775, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.0875462, Final residual = 0.0080112, No Iterations 7
Offloaded LDU matrix values (only) on CUDA device and converted to CSR
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.759766 4.504024e-01
0 0.759766 9.464080e-02 0.2101
1 0.7598 3.577178e-02 0.3780
2 0.7598 2.036319e-02 0.5693
3 0.7598 1.361054e-02 0.6684
4 0.7598 8.050442e-03 0.5915
5 0.7598 4.826851e-03 0.5996
6 0.7598 3.464516e-03 0.7178
----------------------------------------------------------------------
Total Iterations: 7
Avg Convergence Rate: 0.4989
Final Residual: 3.464516e-03
Total Reduction in Residual: 7.692046e-03
Maximum Memory Usage: 0.760 GB
----------------------------------------------------------------------
Total Time: 0.0500866
setup: 0.0228475 s
solve: 0.0272391 s
solve(per iteration): 0.00389131 s
PETSc-AMGx: Solving for p, Initial residual = 0.450402, Final residual = 0.00482685, No Iterations 6
time step continuity errors : sum local = 0.00062614, global = -5.14311e-05, cumulative = 9.34863e-05
smoothSolver: Solving for omega, Initial residual = 0.000877048, Final residual = 5.48736e-05, No Iterations 3
smoothSolver: Solving for k, Initial residual = 0.00585763, Final residual = 0.000513817, No Iterations 3
ExecutionTime = 11.83 s ClockTime = 12 s
SIMPLE solution converged in 20 iterations
Finalizing AmgX-p
The AMGX_finalize_plugins API call is deprecated and can be safely removed.
Finalizing PETSc
Finalising parallel run
The converged solution directory `20` is created in each `processorN` directory.
Comparison with the petsc solver
To compare the solver behaviour, the same case was run with the petsc solver. The `solvers` and `residualControl` entries in `fvSolution` for petsc are:
solvers
{
    p
    {
        solver       petsc;

        petsc
        {
            options
            {
                ksp_type    cg;
                mat_type    aijcusparse;
                pc_type     gamg;
            }
        }

        tolerance    0;
        relTol       0;
        maxIter      250;
    }
}

SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent       yes;

    residualControl
    {
        p            0.1;
    }
}
When the petsc solver is used instead of the AmgX solver, the run with multiple MPI processes works correctly:
Time = 4
smoothSolver: Solving for Ux, Initial residual = 0.0409269, Final residual = 0.00328384, No Iterations 6
smoothSolver: Solving for Uy, Initial residual = 0.424882, Final residual = 0.0350059, No Iterations 6
smoothSolver: Solving for Uz, Initial residual = 0.283826, Final residual = 0.0260643, No Iterations 5
PETSc-cg: Solving for p, Initial residual = 0.0805296, Final residual = 5.1084e-155, No Iterations 250
time step continuity errors : sum local = 7.44603e-15, global = -4.31089e-17, cumulative = -5.70596e-17
smoothSolver: Solving for omega, Initial residual = 0.00567144, Final residual = 0.000545388, No Iterations 2
smoothSolver: Solving for k, Initial residual = 0.0196211, Final residual = 0.0013103, No Iterations 3
ExecutionTime = 21.19 s ClockTime = 21 s
SIMPLE solution converged in 4 iterations
Finalizing PETSc
Finalising parallel run
I wonder whether this issue arises because the AmgX solver does not return the initial residual consistently on every MPI rank. The first MPI process (processor0) does not write the time `1` result, whereas the other process does write a `1` directory, so the residualControl check appears to be evaluated incorrectly with multiple MPI processes.
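For reference, the SIMPLE residualControl check works roughly like the simplified sketch below; this is a paraphrase of the idea, not the actual simpleControl::criteriaSatisfied() source. After each outer iteration, the initial residual reported by each controlled field is compared with the threshold from fvSolution. For the built-in OpenFOAM solvers that initial residual is a globally reduced value and therefore identical on every rank, so no extra synchronisation is needed; if the AmgX wrapper stores a rank-local (or missing) initial residual instead, the ranks can reach different convergence decisions, which would explain processor1 stopping at Time = 1 while processor0 keeps waiting.

// Simplified sketch of the residualControl logic (paraphrase, not the
// OpenFOAM source in src/finiteVolume/cfdTools/general/solutionControl).
#include <map>
#include <string>

struct FieldPerformance
{
    double initialResidual;  // first initial residual of this outer iteration
};

bool criteriaSatisfied
(
    const std::map<std::string, FieldPerformance>& solverPerformance, // per field
    const std::map<std::string, double>& residualControl              // fvSolution
)
{
    bool achieved = true;
    bool checked  = false;

    for (const auto& [fieldName, tolerance] : residualControl)
    {
        const auto perf = solverPerformance.find(fieldName);
        if (perf == solverPerformance.end())
        {
            continue;  // field not solved in this outer iteration
        }
        checked = true;

        // The decision uses the *initial* residual.  If this value differs
        // between MPI ranks, each rank reaches its own verdict and the ranks
        // fall out of step, as observed above.
        achieved = achieved && (perf->second.initialResidual < tolerance);
    }

    return checked && achieved;
}

If this is indeed the cause, a possible check would be to print the initial residual of p seen by each rank at Time = 1; I would expect the values to differ between processor0 and processor1.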
