Parallel contiguous data synchronisation not scaling
Functionality to add/problem to solve
Parallel (face/edge/point) synchronisation uses the syncTools helper functions. These assume non-contiguous data, so the data is streamed (serialised) into per-processor buffers and the buffer sizes are exchanged before the actual data is sent. It is this exchange of sizes (an all-to-all) that causes the scaling issues. For contiguous data with a one-to-one mapping the amount of data to receive is known in advance, so this step can be skipped.
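For reference, the streaming exchange follows the standard PstreamBuffers usage pattern, sketched below. This is an illustrative sketch, not the actual syncTools source; neighbourProcs, sendData and recvData are placeholder names.

```cpp
// Illustrative sketch of the current pattern: data for each neighbour is
// streamed into per-processor buffers, then finishedSends() exchanges the
// buffer sizes (all-to-all) before the receives can be sized and read.

PstreamBuffers pBufs(Pstream::commsTypes::nonBlocking);

// Stream (serialise) the data destined for each neighbour processor
forAll(neighbourProcs, i)
{
    UOPstream toNbr(neighbourProcs[i], pBufs);
    toNbr << sendData[i];   // streaming; works for any (non-contiguous) type
}

// All-to-all exchange of the send sizes -- the step that limits scaling
pBufs.finishedSends();

// Receive sizes are now known; read the data back per neighbour
forAll(neighbourProcs, i)
{
    UIPstream fromNbr(neighbourProcs[i], pBufs);
    fromNbr >> recvData[i];
}
```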
Target audience
- parallel runs on larger numbers of processors
Proposal
- check for contiguous data in the templated functions using PstreamBuffers and skip the size exchange when the amount to receive is known (see the sketch below)
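A minimal sketch of what such a fast path could look like, assuming a one-to-one mapping and the contiguous<Type>() trait (is_contiguous in later versions). The names nbrProc, nbrPatchSize, sendFld and recvFld are illustrative only, not an existing API; the exact calls may differ per OpenFOAM version.

```cpp
// Hedged sketch of the proposed fast path inside a templated sync function.
if (contiguous<Type>())
{
    // One-to-one mapping: the receive size is known from the mesh
    // (e.g. the shared patch size), so no size exchange is needed.
    recvFld.setSize(nbrPatchSize);

    UIPstream::read
    (
        Pstream::commsTypes::nonBlocking,
        nbrProc,
        reinterpret_cast<char*>(recvFld.data()),
        recvFld.size()*sizeof(Type)
    );
    UOPstream::write
    (
        Pstream::commsTypes::nonBlocking,
        nbrProc,
        reinterpret_cast<const char*>(sendFld.cdata()),
        sendFld.size()*sizeof(Type)
    );

    Pstream::waitRequests();
}
else
{
    // Fall back to the existing streaming exchange via PstreamBuffers
    // (with its all-to-all size exchange).
}
```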
Links / references
This work is based on
"Communication Optimization for Multiphase Flow Solver in the Library of OpenFOAM"
Zhipeng Lin, Wenjing Yang, Houcun Zhou, Xinhai Xu, Liaoyuan Sun, Yongjun Zhang and Yuhua Tang
(MDPI Water journal, October 2018)
It identifies two bottlenecks:
- the linear solver using a blocking allreduce
- MULES finding out the sizes to receive on every sweep
The first item was tackled in the v2006 pipelined CG solver implementation. The current issue is a more general fix of the second item.
Funding
(Does the functionality already exist/is sponsorship available?)