parProfiling for large core counts
Functionality to add/problem to solve
parProfiling
Setup:
- have a large number of cores
- have a large number of masters in masterCoarsest
Questions:
- where is the bottleneck?
- is it due to the 'coarsest level' solution?
- is it due to reductions or halo-swaps?
Ideas on how to get this information out of parProfiling
FO
- have a per-processor output (or only min/max processor?)
- split off all-to-all from wait times
- add number of invocations. Why?
  1) The number of reductions is synchronised; the only difference is between master and non-master processors.
  2) The number of halo swaps is determined by the number of processor boundaries, which can be obtained from the boundary file.
  3) The number of linear-solver sweeps, though, is interesting.
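
As a rough illustration of point 2), the number of processor boundaries can be counted directly from the text of a boundary file. This is only a sketch in Python, not part of parProfiling: the helper name count_processor_patches is hypothetical, and it assumes processor patches appear as "type processor;" (or "type processorCyclic;") entries and that the file contains no commented-out patch entries.

```python
import re
import sys

# Assumption: a processor patch is declared by a "type processor;" or
# "type processorCyclic;" entry. Entries such as "inGroups 1(processor);"
# are not matched because they are not preceded by the keyword "type".
PROC_TYPE = re.compile(r"\btype\s+(processor|processorCyclic)\s*;")


def count_processor_patches(boundary_text: str) -> int:
    """Return the number of processor-type patches in the boundary file text."""
    return len(PROC_TYPE.findall(boundary_text))


if __name__ == "__main__":
    # Usage (hypothetical): pass one boundary file per processor directory.
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, count_processor_patches(f.read()))
```

Run over the boundary file of each processor directory (for example processor0/constant/polyMesh/boundary in a decomposed case), this gives a per-processor count of neighbours to compare against the measured halo-swap times.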