`residualControl` in `fvSolution` not working with multiple MPI processes
When the AmgX solver is used with petsc4Foam (https://develop.openfoam.com/modules/external-solver), the `residualControl` in `fvSolution` does not work correctly.
AmgX with petsc4Foam was built against OpenFOAM v2406 following the build instructions at https://blog.nextfoam.co.kr/2024/01/10/gpu-accelerated-openfoam-with-petsc4foam/.
The `residualControl` in `fvSolution` and the `amgxpOptions` configuration are the following:
- fvSolution
SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent yes;

    residualControl
    {
        p 0.1;
    }
}
- amgxpOptions
{
    "config_version": 2,
    "determinism_flag": 1,
    "solver": {
        "preconditioner": {
            "print_grid_stats": 0,
            "algorithm": "AGGREGATION",
            "print_vis_data": 0,
            "solver": "AMG",
            "smoother": {
                "relaxation_factor": 0.8,
                "scope": "jacobi",
                "solver": "BLOCK_JACOBI",
                "monitor_residual": 0,
                "print_solve_stats": 0
            },
            "print_solve_stats": 0,
            "presweeps": 0,
            "interpolator": "D2",
            "selector": "SIZE_2",
            "coarse_solver": "NOSOLVER",
            "max_iters": 1,
            "monitor_residual": 1,
            "store_res_history": 1,
            "scope": "amg",
            "max_levels": 50,
            "postsweeps": 3,
            "cycle": "V"
        },
        "solver": "PCG",
        "print_solve_stats": 1,
        "obtain_timings": 1,
        "max_iters": 100,
        "monitor_residual": 1,
        "convergence": "RELATIVE_INI",
        "scope": "main",
        "tolerance": 0.01,
        "norm": "L2",
        "store_res_history": 1
    }
}
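For context, the `residualControl` check in SIMPLE is evaluated against the initial residual that the linear solver reports for the controlled field in the current outer iteration, not against the final residual. Below is a minimal standalone sketch of that decision logic (simplified and hypothetical, not the actual `simpleControl`/`solutionControl` implementation in OpenFOAM; the map-based interface is purely illustrative):

```cpp
#include <iostream>
#include <map>
#include <string>

// Simplified sketch: SIMPLE declares convergence when, for every field listed
// in residualControl, the initial residual reported by that field's linear
// solver in the current outer iteration is below the requested threshold.
bool criteriaSatisfied
(
    const std::map<std::string, double>& residualControl,  // e.g. {"p", 0.1}
    const std::map<std::string, double>& initialResidual   // reported this iteration
)
{
    bool checked = false;
    bool achieved = true;

    for (const auto& [fieldName, tol] : residualControl)
    {
        const auto it = initialResidual.find(fieldName);
        if (it == initialResidual.end()) continue;  // field not solved this iteration

        checked = true;
        achieved = achieved && (it->second < tol);
    }

    return checked && achieved;
}

int main()
{
    const std::map<std::string, double> control{{"p", 0.1}};

    // Values taken from the single-process log below: at Time = 1 the AmgX
    // wrapper reports an initial residual of ~0.6 for p (not 1), and the run
    // only stops once the reported initial residual drops below 0.1.
    std::cout << std::boolalpha
              << criteriaSatisfied(control, {{"p", 0.599558}}) << '\n'    // false
              << criteriaSatisfied(control, {{"p", 0.0995319}}) << '\n';  // true
}
```

Separately, in the `amgxpOptions` above, `"convergence": "RELATIVE_INI"` with `"tolerance": 0.01` should (as I understand the AmgX options) stop the inner PCG solve once the L2 residual has dropped to 1% of its initial value; for example, in the Time = 74 single-process log below, 8.180997e-04 / 9.953189e-02 is about 8.2e-03, which is below 0.01. That criterion only governs the linear solve itself; the outer stopping decision still comes from the `residualControl` check sketched above.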
With this configuration, when simpleFoam is started as a single process, the solver finishes successfully. The first and last iteration logs are the following:
SIMPLE: convergence criteria
field p tolerance 0.1
Time = 1
smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 0.096874, No Iterations 14
smoothSolver: Solving for Uy, Initial residual = 1, Final residual = 0.0918301, No Iterations 12
smoothSolver: Solving for Uz, Initial residual = 1, Final residual = 0.0910265, No Iterations 13
Initializing PETSc
Number of GPU devices :: 1
AMGX version 2.5.0
Built on Nov 7 2024, 03:37:11
Compiled with CUDA Runtime 12.2, using CUDA driver 12.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Initializing AmgX-p
Initializing AmgX Linear Solver p
Offloaded LDU matrix arrays on CUDA device and converted to CSR
Using Normal MPI (Hostbuffer) communicator...
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.64447 5.995583e-01
0 0.64447 3.332180e-01 0.5558
1 0.6445 2.221825e-01 0.6668
2 0.6445 1.575016e-01 0.7089
Total Time: 0.0887627
setup: 0.02867 s
solve: 0.0600927 s
solve(per iteration): 0.00500773 s
PETSc-AMGx: Solving for p, Initial residual = 0.599558, Final residual = 0.0065079, No Iterations 11
time step continuity errors : sum local = 0.000743556, global = 5.61266e-05, cumulative = 5.61266e-05
smoothSolver: Solving for omega, Initial residual = 0.0212029, Final residual = 0.000901774, No Iterations 3
bounding omega, min: -219.754 max: 20199.5 average: 285.026
smoothSolver: Solving for k, Initial residual = 1, Final residual = 0.0546809, No Iterations 3
ExecutionTime = 2.71 s ClockTime = 3 s
Time = 74
smoothSolver: Solving for Ux, Initial residual = 0.000413349, Final residual = 3.5012e-05, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.00905741, Final residual = 0.000879962, No Iterations 6
smoothSolver: Solving for Uz, Initial residual = 0.00841489, Final residual = 0.000827842, No Iterations 6
Offloaded LDU matrix values (only) on CUDA device and converted to CSR
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.64447 9.953189e-02
0 0.64447 2.450995e-02 0.2463
1 0.6445 7.561778e-03 0.3085
2 0.6445 3.590985e-03 0.4749
3 0.6445 2.421009e-03 0.6742
4 0.6445 1.483251e-03 0.6127
5 0.6445 8.180997e-04 0.5516
----------------------------------------------------------------------
Total Iterations: 6
Avg Convergence Rate: 0.4492
Final Residual: 8.180997e-04
Total Reduction in Residual: 8.219474e-03
Maximum Memory Usage: 0.644 GB
----------------------------------------------------------------------
Total Time: 0.0473091
setup: 0.0226345 s
solve: 0.0246746 s
solve(per iteration): 0.00411244 s
PETSc-AMGx: Solving for p, Initial residual = 0.0995319, Final residual = 0.00148325, No Iterations 5
time step continuity errors : sum local = 8.2711e-05, global = 4.41344e-08, cumulative = -0.000409131
smoothSolver: Solving for omega, Initial residual = 5.56291e-05, Final residual = 3.78257e-06, No Iterations 3
smoothSolver: Solving for k, Initial residual = 0.0008333, Final residual = 6.97606e-05, No Iterations 3
ExecutionTime = 64.25 s ClockTime = 65 s
SIMPLE solution converged in 74 iterations
When the AmgX solver is used with a single process, the solution converges correctly once the `p` residual drops below 0.1, as defined in `fvSolution`. However, if multiple MPI processes are used, e.g. `mpirun -np 2 simpleFoam -parallel`, with the same `residualControl` in `fvSolution`, the solver does not progress after the start. The simpleFoam log with 2 MPI processes is:
Time = 1
smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 0.0981494, No Iterations 14
smoothSolver: Solving for Uy, Initial residual = 1, Final residual = 0.0940637, No Iterations 12
smoothSolver: Solving for Uz, Initial residual = 1, Final residual = 0.0935349, No Iterations 13
Initializing PETSc
Number of GPU devices :: 1
AMGX version 2.5.0
Built on Nov 7 2024, 03:37:11
Compiled with CUDA Runtime 12.2, using CUDA driver 12.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Initializing AmgX-p
Initializing AmgX Linear Solver p
Offloaded LDU matrix arrays on CUDA device and converted to CSR
Using Normal MPI (Hostbuffer) communicator...
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.759766 6.004220e-01
0 0.759766 3.342920e-01 0.5568
1 0.7598 2.214959e-01 0.6626
2 0.7598 1.561108e-01 0.7048
11 0.7598 4.615039e-03 0.7512
----------------------------------------------------------------------
Total Iterations: 12
Avg Convergence Rate: 0.6665
Final Residual: 4.615039e-03
Total Reduction in Residual: 7.686326e-03
Maximum Memory Usage: 0.760 GB
----------------------------------------------------------------------
Total Time: 0.0936948
setup: 0.0317286 s
solve: 0.0619661 s
solve(per iteration): 0.00516385 s
PETSc-AMGx: Solving for p, Initial residual = 0.600422, Final residual = 0.00614342, No Iterations 11
time step continuity errors : sum local = 0.000836866, global = -1.98778e-07, cumulative = -1.98778e-07
smoothSolver: Solving for omega, Initial residual = 0.0208742, Final residual = 0.000914672, No Iterations 3
bounding omega, min: -60.0497 max: 22493.7 average: 281.463
smoothSolver: Solving for k, Initial residual = 1, Final residual = 0.0739663, No Iterations 3
ExecutionTime = 1.67 s ClockTime = 2 s
After the first iteration (Time = 1) is finished, the solver does not proceed, although the processes are still running (see the attached screenshot). MPI rank 1 finishes its first iteration, while rank 0 does nothing. The attached figure shows the `processor` directories: after one iteration, a `1` directory is created in `processor1`, but there is nothing in the `processor0` directory, and the solver does not proceed further.
However, if `U` is checked in the `residualControl` of `fvSolution` instead of `p`, the solver with multiple MPI processes converges successfully and stops gracefully, as expected.
- fvSolution
SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent yes;

    residualControl
    {
        U 0.1;
    }
}
The log with the above `U` `residualControl` option is:
Time = 20
smoothSolver: Solving for Ux, Initial residual = 0.00477802, Final residual = 0.000389022, No Iterations 8
smoothSolver: Solving for Uy, Initial residual = 0.0999076, Final residual = 0.00902775, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.0875462, Final residual = 0.0080112, No Iterations 7
Offloaded LDU matrix values (only) on CUDA device and converted to CSR
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 0.759766 4.504024e-01
0 0.759766 9.464080e-02 0.2101
1 0.7598 3.577178e-02 0.3780
2 0.7598 2.036319e-02 0.5693
3 0.7598 1.361054e-02 0.6684
4 0.7598 8.050442e-03 0.5915
5 0.7598 4.826851e-03 0.5996
6 0.7598 3.464516e-03 0.7178
----------------------------------------------------------------------
Total Iterations: 7
Avg Convergence Rate: 0.4989
Final Residual: 3.464516e-03
Total Reduction in Residual: 7.692046e-03
Maximum Memory Usage: 0.760 GB
----------------------------------------------------------------------
Total Time: 0.0500866
setup: 0.0228475 s
solve: 0.0272391 s
solve(per iteration): 0.00389131 s
PETSc-AMGx: Solving for p, Initial residual = 0.450402, Final residual = 0.00482685, No Iterations 6
time step continuity errors : sum local = 0.00062614, global = -5.14311e-05, cumulative = 9.34863e-05
smoothSolver: Solving for omega, Initial residual = 0.000877048, Final residual = 5.48736e-05, No Iterations 3
smoothSolver: Solving for k, Initial residual = 0.00585763, Final residual = 0.000513817, No Iterations 3
ExecutionTime = 11.83 s ClockTime = 12 s
SIMPLE solution converged in 20 iterations
Finalizing AmgX-p
The AMGX_finalize_plugins API call is deprecated and can be safely removed.
Finalizing PETSc
Finalising parallel run
The converged solution `20` directory is created in each `processorN` directory.
Comparison with the `petsc` solver

To compare the solver behavior with the `petsc` solver, the `solvers` and `residualControl` entries in `fvSolution` for the `petsc` case are the following:
solvers
{
    p
    {
        solver petsc;

        petsc
        {
            options
            {
                ksp_type cg;
                mat_type aijcusparse;
                pc_type  gamg;
            }
        }

        tolerance 0;
        relTol    0;
        maxIter   250;
    }
}

SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent yes;

    residualControl
    {
        p 0.1;
    }
}
When the `petsc` solver is used instead of the `amgx` solver, the run with multiple MPI processes works correctly (with `tolerance 0;` and `relTol 0;` the inner PETSc solve always runs the full 250 iterations, so stopping is governed entirely by the `residualControl` check):
Time = 4
smoothSolver: Solving for Ux, Initial residual = 0.0409269, Final residual = 0.00328384, No Iterations 6
smoothSolver: Solving for Uy, Initial residual = 0.424882, Final residual = 0.0350059, No Iterations 6
smoothSolver: Solving for Uz, Initial residual = 0.283826, Final residual = 0.0260643, No Iterations 5
PETSc-cg: Solving for p, Initial residual = 0.0805296, Final residual = 5.1084e-155, No Iterations 250
time step continuity errors : sum local = 7.44603e-15, global = -4.31089e-17, cumulative = -5.70596e-17
smoothSolver: Solving for omega, Initial residual = 0.00567144, Final residual = 0.000545388, No Iterations 2
smoothSolver: Solving for k, Initial residual = 0.0196211, Final residual = 0.0013103, No Iterations 3
ExecutionTime = 21.19 s ClockTime = 21 s
SIMPLE solution converged in 4 iterations
Finalizing PETSc
Finalising parallel run
I wonder whether this issue arises because the AmgX solver does not return the initial residual as 1. The first MPI process (`processor0`) does not write the `1` result directory, whereas the other process does write a `1` directory. It seems the `residualControl` check behaves incorrectly with multiple MPI processes.
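If that is what happens, one possible mechanism (only a guess, not a confirmed diagnosis) is that the convergence decision becomes different on different ranks: if the residual that the AmgX wrapper reports back to OpenFOAM is not identical on every rank, each rank can take a different branch in the `residualControl` check, so one rank leaves the time loop and writes its time directory while the other stays blocked in a collective MPI call. The following minimal MPI sketch (hypothetical, not petsc4Foam or OpenFOAM code; the `converged` flag and the loop are purely illustrative) reproduces exactly that symptom:

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int iter = 1; iter <= 100; ++iter)
    {
        // Stand-in for the linear solve: a collective that all ranks must enter.
        double local = 1.0, global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        // Pretend each rank evaluates residualControl from its own, differing
        // residual value instead of one globally consistent number.
        const bool converged = (rank == 1);  // rank 1 thinks "p < 0.1", rank 0 does not

        if (converged)
        {
            std::printf("rank %d: converged, writing time directory %d\n", rank, iter);
            break;  // rank 1 leaves the loop after iteration 1 ...
        }
        // ... while rank 0 comes back for iteration 2 and blocks forever in the
        // MPI_Allreduce above, waiting for a partner that is no longer there.
    }

    MPI_Finalize();  // never reached by rank 0 on this intentionally deadlocked run
    return 0;
}
```

Built with `mpicxx` and run with `mpirun -np 2`, rank 1 prints its message after the first pass while rank 0 hangs inside the reduction, which matches the observed behavior of `processor1` getting a `1` directory while `processor0` gets nothing.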