parProfiling for large core counts

Functionality to add/problem to solve

parProfiling

Setup:

  • have a large number of cores
  • have a large number of masters in masterCoarsest

Questions:

  • where is the bottleneck?
  • is it due to the 'coarsest level' solution?
  • is it due to reductions or halo-swaps?

Ideas on how to get this information out of the parProfiling function object (FO)

  • have a per-processor output (or only min/max processor?)
  • split off all-to-all from wait times
  • add number of invocations. Why? 1) The number of reductions is synchronised; the only difference is between master and non-master processors. 2) The number of halo swaps is determined by the number of processor boundaries, which can be obtained from the boundary file. 3) The number of linear-solver sweeps is the interesting one, though.
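
As a rough illustration of point 2), the number of processor boundaries can be counted directly from the text of a decomposed case's boundary file, without touching parProfiling. The sketch below is Python and only an assumption about how one might do this outside the function object: the function name `count_processor_patches` and the sample text are hypothetical, and it assumes processor patches appear as `type processor;` or `type processorCyclic;` entries.

```python
import re


def count_processor_patches(boundary_text: str) -> int:
    """Count patches of type processor or processorCyclic in boundary-file text.

    Hypothetical helper: it scans the text with regular expressions rather
    than parsing the dictionary syntax properly.
    """
    # Remove block and line comments so commented-out entries are not counted.
    text = re.sub(r"/\*.*?\*/", "", boundary_text, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    # Match 'type processor;' and 'type processorCyclic;' only, not
    # 'inGroups 1(processor);' or other types.
    return len(re.findall(r"\btype\s+processor(?:Cyclic)?\s*;", text))


# Made-up boundary-file fragment for one processor directory.
sample = """
3
(
    walls
    {
        type            wall;
        nFaces          120;
        startFace       4000;
    }
    procBoundary0to1
    {
        type            processor;
        inGroups        1(processor);
        nFaces          40;
        startFace       4120;
        myProcNo        0;
        neighbProcNo    1;
    }
    procBoundary0to2
    {
        type            processor;  // neighbour on the other side
        inGroups        1(processor);
        nFaces          35;
        startFace       4160;
        myProcNo        0;
        neighbProcNo    2;
    }
)
"""

if __name__ == "__main__":
    print(count_processor_patches(sample))
```

Running this over each processor directory's boundary file would give a per-processor count of processor boundaries, which leaves the number of linear-solver sweeps as the quantity that has to come from the profiling itself.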

Target audience

Proposal

What does success look like, and how can we measure that?

Links / references

Funding