Linear algebra profiling in standard output
Hi,
after a discussion with @mark and Ivan Spisso @ivanspisso (chairman of the HPC technical committee), we think it would be useful for a user to see more information in the output about the execution time of each preconditioner/solver pair in the PISO/PIMPLE/SIMPLE loop, and not only the total execution time. This would show whether there is room for improvement by changing the solver/preconditioner or tuning its parameters.
What I propose is something like this:
Time = 0.000625
Courant Number mean: 0.0135897 max: 0.241523
DILUPBiCGStab: Solving for Ux, Initial residual = 0.012631, Final residual = 0.00040317, No Iterations 5, Preconditioner time = 0.02 s, Solver time = 1.23 s
DILUPBiCGStab: Solving for Uy, Initial residual = 0.0295468, Final residual = 0.00096396, No Iterations 5, Preconditioner time = 0.02 s, Solver time = 1.25 s
DILUPBiCGStab: Solving for Uz, Initial residual = 0.109576, Final residual = 0.00700867, No Iterations 5, Preconditioner time = 0.02 s, Solver time = 1.24 s
petsc-cg: Solving for p, Initial residual = 0.538072, Final residual = 9.9628e-05, No Iterations 1164, Preconditioner time = 0.14 s, Solver time = 1.76 s
time step continuity errors : sum local = 8.47465e-10, global = -8.77385e-24, cumulative = -2.99506e-21
petsc-cg: Solving for p, Initial residual = 0.49318, Final residual = 9.97012e-05, No Iterations 1161, Preconditioner time = 0.13 s, Solver time = 2.24 s
time step continuity errors : sum local = 8.44322e-10, global = -2.7257e-22, cumulative = -3.26763e-21
ExecutionTime = 455.77 s ClockTime = 470 s
The idea is to append to each solver line, after the number of iterations, the time spent in the preconditioner and in the solver. Since this adds overhead and changes the standard output log, it should be optional and only activated by the user. I think that the profiling already implemented in OpenFOAM is better suited to a developer than to a user, and it does not show how much time is spent in each linear algebra solver. From the implementation point of view, Mark suggested adding two ClockTime values to the SolverPerformance class.
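To make the idea a bit more concrete, here is a minimal standalone sketch in plain C++ of how two wall-clock values could be collected and appended to the solver line. It does not use OpenFOAM at all: `SolverTimings`, `ScopedTimer` and `timingSuffix` are hypothetical names invented for this illustration, and `std::chrono` only stands in for whatever clock class would actually be used inside SolverPerformance.

```cpp
#include <chrono>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for the two timing values that could be stored
// alongside the residuals and iteration count. Not an OpenFOAM type.
struct SolverTimings
{
    double preconditionerTime = 0.0;  // wall-clock seconds
    double solverTime = 0.0;          // wall-clock seconds
};

// Adds the wall-clock time elapsed during its lifetime to the referenced
// accumulator when it goes out of scope.
class ScopedTimer
{
public:
    explicit ScopedTimer(double& target)
    :
        target_(target),
        start_(std::chrono::steady_clock::now())
    {}

    ~ScopedTimer()
    {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        target_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& target_;
    std::chrono::steady_clock::time_point start_;
};

// Builds the text appended after "No Iterations N". Returns an empty
// string when the (assumed) user switch is off, so the log is unchanged.
std::string timingSuffix(const SolverTimings& t, bool enabled)
{
    if (!enabled)
    {
        return std::string();
    }

    std::ostringstream os;
    os  << std::fixed << std::setprecision(2)
        << ", Preconditioner time = " << t.preconditionerTime << " s"
        << ", Solver time = " << t.solverTime << " s";
    return os.str();
}
```

In a solver, one would wrap the preconditioner call in a block such as `{ ScopedTimer timer(timings.preconditionerTime); /* apply preconditioner */ }` and do the same around the whole solve with `timings.solverTime`. With the switch off, `timingSuffix` returns nothing and the existing output stays as it is today.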