
`residualControl` in `fvSolution` not working with multiple MPI processes

When the AmgX solver is used with petsc4Foam (https://develop.openfoam.com/modules/external-solver), the `residualControl` in `fvSolution` does not work correctly.

AmgX with petsc4Foam was built against OpenFOAM v2406 following the build instructions at https://blog.nextfoam.co.kr/2024/01/10/gpu-accelerated-openfoam-with-petsc4foam/

The `residualControl` in `fvSolution` and the `amgxpOptions` are:

  • fvSolution
SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent yes;
    residualControl {
        p 0.1;
    }
}
  • amgxpOptions
{
"config_version": 2,
"determinism_flag": 1,
"solver": {
    "preconditioner": {
        "print_grid_stats": 0,
        "algorithm": "AGGREGATION",
        "print_vis_data": 0,
        "solver": "AMG",
        "smoother": {
            "relaxation_factor": 0.8,
            "scope": "jacobi",
            "solver": "BLOCK_JACOBI",
            "monitor_residual": 0,
            "print_solve_stats": 0
        },
        "print_solve_stats": 0,
        "presweeps": 0,
        "interpolator": "D2",
        "selector": "SIZE_2",
        "coarse_solver": "NOSOLVER",
        "max_iters": 1,
        "monitor_residual": 1,
        "store_res_history": 1,
        "scope": "amg",
        "max_levels": 50,
        "postsweeps": 3,
        "cycle": "V"
    },
    "solver": "PCG",
    "print_solve_stats": 1,
    "obtain_timings": 1,
    "max_iters": 100,
    "monitor_residual": 1,
    "convergence": "RELATIVE_INI",
    "scope": "main",
    "tolerance" : 0.01,
    "norm": "L2",
    "store_res_history": 1
    }
}

With this configuration, when simpleFoam is started as a single process, the solver finishes successfully. The first and last iteration logs are as follows:

SIMPLE: convergence criteria
field p      tolerance 0.1

Time = 1

smoothSolver:  Solving for Ux, Initial residual = 1, Final residual = 0.096874, No Iterations 14
smoothSolver:  Solving for Uy, Initial residual = 1, Final residual = 0.0918301, No Iterations 12
smoothSolver:  Solving for Uz, Initial residual = 1, Final residual = 0.0910265, No Iterations 13
Initializing PETSc
Number of GPU devices :: 1
AMGX version 2.5.0
Built on Nov  7 2024, 03:37:11
Compiled with CUDA Runtime 12.2, using CUDA driver 12.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Initializing AmgX-p
Initializing AmgX Linear Solver p
Offloaded LDU matrix arrays on CUDA device and converted to CSR
Using Normal MPI (Hostbuffer) communicator...
        iter      Mem Usage (GB)       residual           rate
        ----------------------------------------------------------------------
            Ini             0.64447   5.995583e-01
            0             0.64447   3.332180e-01         0.5558
            1              0.6445   2.221825e-01         0.6668
            2              0.6445   1.575016e-01         0.7089

Total Time: 0.0887627
setup: 0.02867 s
solve: 0.0600927 s
solve(per iteration): 0.00500773 s

PETSc-AMGx:  Solving for p, Initial residual = 0.599558, Final residual = 0.0065079, No Iterations 11
time step continuity errors : sum local = 0.000743556, global = 5.61266e-05, cumulative = 5.61266e-05
smoothSolver:  Solving for omega, Initial residual = 0.0212029, Final residual = 0.000901774, No Iterations 3
bounding omega, min: -219.754 max: 20199.5 average: 285.026
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0546809, No Iterations 3
ExecutionTime = 2.71 s  ClockTime = 3 s

Time = 74

smoothSolver:  Solving for Ux, Initial residual = 0.000413349, Final residual = 3.5012e-05, No Iterations 7
smoothSolver:  Solving for Uy, Initial residual = 0.00905741, Final residual = 0.000879962, No Iterations 6
smoothSolver:  Solving for Uz, Initial residual = 0.00841489, Final residual = 0.000827842, No Iterations 6
Offloaded LDU matrix values (only) on CUDA device and converted to CSR
        iter      Mem Usage (GB)       residual           rate
        ----------------------------------------------------------------------
            Ini             0.64447   9.953189e-02
            0             0.64447   2.450995e-02         0.2463
            1              0.6445   7.561778e-03         0.3085
            2              0.6445   3.590985e-03         0.4749
            3              0.6445   2.421009e-03         0.6742
            4              0.6445   1.483251e-03         0.6127
            5              0.6445   8.180997e-04         0.5516
        ----------------------------------------------------------------------
        Total Iterations: 6
        Avg Convergence Rate:                   0.4492
        Final Residual:                   8.180997e-04
        Total Reduction in Residual:      8.219474e-03
        Maximum Memory Usage:                    0.644 GB
        ----------------------------------------------------------------------
Total Time: 0.0473091
    setup: 0.0226345 s
    solve: 0.0246746 s
    solve(per iteration): 0.00411244 s
PETSc-AMGx:  Solving for p, Initial residual = 0.0995319, Final residual = 0.00148325, No Iterations 5
time step continuity errors : sum local = 8.2711e-05, global = 4.41344e-08, cumulative = -0.000409131
smoothSolver:  Solving for omega, Initial residual = 5.56291e-05, Final residual = 3.78257e-06, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.0008333, Final residual = 6.97606e-05, No Iterations 3
ExecutionTime = 64.25 s  ClockTime = 65 s

SIMPLE solution converged in 74 iterations

When the AmgX solver is used on a single processor, the solution converges correctly once the initial residual of p falls below 0.1, as defined in fvSolution.
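
For reference, the two convergence checks involved here are independent: AmgX stops its own iterations once the residual has dropped below its `tolerance` of 0.01 relative to the initial residual (`"convergence": "RELATIVE_INI"` in amgxpOptions), while the SIMPLE residualControl compares the initial residual reported back to OpenFOAM against the 0.1 threshold in fvSolution. A small standalone check (not OpenFOAM or petsc4Foam source) of the Time = 74 numbers above:

// Standalone sanity check of the two convergence criteria at Time = 74,
// using the numbers from the log above (not OpenFOAM/petsc4Foam source).
#include <cstdio>

int main()
{
    // AmgX side: "convergence": "RELATIVE_INI" with "tolerance": 0.01,
    // i.e. stop once residual / initial residual < 0.01.
    const double amgxInitial = 9.953189e-02;   // "Ini" residual in the log
    const double amgxFinal   = 8.180997e-04;   // residual at iteration 5
    std::printf("AmgX relative reduction = %.6e (tolerance 0.01)\n",
                amgxFinal/amgxInitial);        // ~8.22e-03 -> AmgX stops

    // OpenFOAM side: residualControl compares the initial residual reported
    // back for p against the 0.1 threshold in fvSolution.
    const double pInitialResidual = 0.0995319;
    const double pResidualControl = 0.1;
    std::printf("residualControl satisfied: %s\n",
                pInitialResidual < pResidualControl ? "yes" : "no");
    return 0;
}

The relative reduction of about 8.22e-03 matches the "Total Reduction in Residual" line in the log and is below 0.01, and the reported initial residual 0.0995319 is below 0.1, which is why the single-process run stops at iteration 74.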

However, if multiple MPI processes are used (mpirun -np 2 simpleFoam -parallel) with the same residualControl in fvSolution, the solver does not progress after starting.

The simpleFoam log with 2 MPI processes is:

Time = 1

smoothSolver:  Solving for Ux, Initial residual = 1, Final residual = 0.0981494, No Iterations 14
smoothSolver:  Solving for Uy, Initial residual = 1, Final residual = 0.0940637, No Iterations 12
smoothSolver:  Solving for Uz, Initial residual = 1, Final residual = 0.0935349, No Iterations 13
Initializing PETSc
Number of GPU devices :: 1
AMGX version 2.5.0
Built on Nov  7 2024, 03:37:11
Compiled with CUDA Runtime 12.2, using CUDA driver 12.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Initializing AmgX-p
Initializing AmgX Linear Solver p
Offloaded LDU matrix arrays on CUDA device and converted to CSR
Using Normal MPI (Hostbuffer) communicator...
        iter      Mem Usage (GB)       residual           rate
        ----------------------------------------------------------------------
            Ini            0.759766   6.004220e-01
            0            0.759766   3.342920e-01         0.5568
            1              0.7598   2.214959e-01         0.6626
            2              0.7598   1.561108e-01         0.7048

            11              0.7598   4.615039e-03         0.7512
        ----------------------------------------------------------------------
        Total Iterations: 12
        Avg Convergence Rate:                   0.6665
        Final Residual:                   4.615039e-03
        Total Reduction in Residual:      7.686326e-03
        Maximum Memory Usage:                    0.760 GB
        ----------------------------------------------------------------------
Total Time: 0.0936948
    setup: 0.0317286 s
    solve: 0.0619661 s
    solve(per iteration): 0.00516385 s
PETSc-AMGx:  Solving for p, Initial residual = 0.600422, Final residual = 0.00614342, No Iterations 11
time step continuity errors : sum local = 0.000836866, global = -1.98778e-07, cumulative = -1.98778e-07
smoothSolver:  Solving for omega, Initial residual = 0.0208742, Final residual = 0.000914672, No Iterations 3
bounding omega, min: -60.0497 max: 22493.7 average: 281.463
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0739663, No Iterations 3
ExecutionTime = 1.67 s  ClockTime = 2 s    

After the first iteration (Time = 1) finishes, the solver does not proceed, although the processes are still running (see attached image).

However, the MPI process for processor1 finishes after one iteration, while the process for processor0 does nothing (see attached image).

The figure above shows the processor directories: after one iteration, the `1` directory has been created in processor1, but there is nothing in the processor0 directory, and the solver does not proceed.

However, if U is checked in the residualControl of fvSolution instead of p, the run with multiple MPI processes converges successfully and stops gracefully as expected.

  • fvSolution
SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent yes;
    residualControl {
        U 0.1;
    }
}

The log with the above U residualControl option is:

Time = 20

smoothSolver:  Solving for Ux, Initial residual = 0.00477802, Final residual = 0.000389022, No Iterations 8
smoothSolver:  Solving for Uy, Initial residual = 0.0999076, Final residual = 0.00902775, No Iterations 7
smoothSolver:  Solving for Uz, Initial residual = 0.0875462, Final residual = 0.0080112, No Iterations 7
Offloaded LDU matrix values (only) on CUDA device and converted to CSR
           iter      Mem Usage (GB)       residual           rate
         ----------------------------------------------------------------------
            Ini            0.759766   4.504024e-01
              0            0.759766   9.464080e-02         0.2101
              1              0.7598   3.577178e-02         0.3780
              2              0.7598   2.036319e-02         0.5693
              3              0.7598   1.361054e-02         0.6684
              4              0.7598   8.050442e-03         0.5915
              5              0.7598   4.826851e-03         0.5996
              6              0.7598   3.464516e-03         0.7178
         ----------------------------------------------------------------------
         Total Iterations: 7
         Avg Convergence Rate:                   0.4989
         Final Residual:                   3.464516e-03
         Total Reduction in Residual:      7.692046e-03
         Maximum Memory Usage:                    0.760 GB
         ----------------------------------------------------------------------
Total Time: 0.0500866
    setup: 0.0228475 s
    solve: 0.0272391 s
    solve(per iteration): 0.00389131 s
PETSc-AMGx:  Solving for p, Initial residual = 0.450402, Final residual = 0.00482685, No Iterations 6
time step continuity errors : sum local = 0.00062614, global = -5.14311e-05, cumulative = 9.34863e-05
smoothSolver:  Solving for omega, Initial residual = 0.000877048, Final residual = 5.48736e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00585763, Final residual = 0.000513817, No Iterations 3
ExecutionTime = 11.83 s  ClockTime = 12 s

SIMPLE solution converged in 20 iterations

Finalizing AmgX-p
The AMGX_finalize_plugins API call is deprecated and can be safely removed.
Finalizing PETSc
Finalising parallel run

The converged solution directory `20` is created in each processorN directory (see attached image).

Comparison with petsc solver

To compare the behavior with the petsc solver, the solver and residualControl settings in fvSolution used with petsc are as follows:

solvers
{
    p
    {
        solver petsc;
        petsc
        {
            options
            {
                ksp_type cg;
                mat_type aijcusparse;
                pc_type gamg;
            }
        }
        tolerance       0;
        relTol          0;
        maxIter         250;
    }
}

SIMPLE
{
    nNonOrthogonalCorrectors 0;
    consistent yes;
    residualControl {
        p 0.1;
    }
}

When the petsc solver is used instead of the amgx solver, the run with multiple MPI processes works correctly.

Time = 4

smoothSolver:  Solving for Ux, Initial residual = 0.0409269, Final residual = 0.00328384, No Iterations 6
smoothSolver:  Solving for Uy, Initial residual = 0.424882, Final residual = 0.0350059, No Iterations 6
smoothSolver:  Solving for Uz, Initial residual = 0.283826, Final residual = 0.0260643, No Iterations 5
PETSc-cg:  Solving for p, Initial residual = 0.0805296, Final residual = 5.1084e-155, No Iterations 250
time step continuity errors : sum local = 7.44603e-15, global = -4.31089e-17, cumulative = -5.70596e-17
smoothSolver:  Solving for omega, Initial residual = 0.00567144, Final residual = 0.000545388, No Iterations 2
smoothSolver:  Solving for k, Initial residual = 0.0196211, Final residual = 0.0013103, No Iterations 3
ExecutionTime = 21.19 s  ClockTime = 21 s


SIMPLE solution converged in 4 iterations

Finalizing PETSc
Finalising parallel run

I wonder whether this issue arises because the AmgX solver does not return the initial residual as 1. The first MPI process (processor0) does not write the result for time 1, while the other processes do write the `1` directory, so the residualControl check appears to be evaluated incorrectly with multiple MPI processes.
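
If that is the case, a plausible mechanism for the hang is that the convergence decision is no longer identical on all ranks. The following is a minimal sketch in plain MPI (not OpenFOAM or petsc4Foam source; the residual values are made up) showing how a per-rank decision produces exactly this symptom, and how reducing the decision restores consistency:

// Minimal MPI sketch of the suspected failure mode (hypothetical values).
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Suppose the initial residual for p reported on each rank differs:
    // rank 0 sits above the residualControl threshold, rank 1 below it.
    const double initialResidual = (rank == 0) ? 0.12 : 0.08;
    const double residualControl = 0.1;

    // Per-rank decision: the ranks disagree. One rank would leave the SIMPLE
    // loop and write its time directory, while the other expects another
    // iteration and blocks in the next collective call.
    int localConverged = (initialResidual < residualControl) ? 1 : 0;
    std::printf("rank %d: local decision  = %d\n", rank, localConverged);

    // Consistent decision: combine the per-rank results with a logical AND
    // so that every rank takes the same branch.
    int globalConverged = 0;
    MPI_Allreduce(&localConverged, &globalConverged, 1, MPI_INT, MPI_LAND,
                  MPI_COMM_WORLD);
    std::printf("rank %d: global decision = %d\n", rank, globalConverged);

    MPI_Finalize();
    return 0;
}

Compiled with mpicxx and run with mpirun -np 2, the local decisions differ between the ranks while the reduced decision is the same on both. In the broken case, the rank that believes it has converged would write its time directory and stop iterating, while the other rank blocks waiting for it, which matches the behavior of processor1 and processor0 described above.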
