Activity
- Author Maintainer
Hi @gregorweiss
In your nonBlockingConsensus you have
MPI_Barrier(MPI_COMM_WORLD)
both at the start and at the end. Did you have synchronization problems that required the barrier? My guess is that one of the problems may be intercepting pending messages not actually meant for the algorithm, in which case using some presumably unique tag (e.g. tag = 314159) might be sufficient.
The barrier on exit looks like overkill, since an Ibarrier was already imposed just prior.
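For reference, a rough sketch of the non-blocking consensus (NBX) loop as I understand it, with a dedicated tag and the Ibarrier providing the termination detection. The function and buffer names (nonBlockingConsensusSketch, sendBufs, recvBufs) and the plain char buffers are only illustrative, not the code in this merge request:

```cpp
// Rough sketch only (not the merge-request code): an NBX-style
// non-blocking consensus exchange using a dedicated tag and an
// MPI_Ibarrier for termination detection.

#include <mpi.h>

#include <map>
#include <vector>

void nonBlockingConsensusSketch
(
    const std::map<int, std::vector<char>>& sendBufs,  // dest rank -> payload
    std::map<int, std::vector<char>>& recvBufs,        // source rank -> payload
    MPI_Comm comm
)
{
    // Dedicated tag so the probes below cannot intercept pending
    // messages that are not meant for this algorithm.
    const int nbxTag = 314159;

    // Start the synchronous sends; Issend completion implies the
    // matching receive has started on the destination rank.
    std::vector<MPI_Request> sendReqs;
    sendReqs.reserve(sendBufs.size());

    for (const auto& [proci, buf] : sendBufs)
    {
        sendReqs.push_back(MPI_REQUEST_NULL);
        MPI_Issend
        (
            buf.data(), static_cast<int>(buf.size()), MPI_BYTE,
            proci, nbxTag, comm, &sendReqs.back()
        );
    }

    MPI_Request barrierReq = MPI_REQUEST_NULL;
    bool barrierActive = false;
    int done = 0;

    while (!done)
    {
        // Receive anything addressed to us that carries the consensus tag
        int flag = 0;
        MPI_Status status;
        MPI_Iprobe(MPI_ANY_SOURCE, nbxTag, comm, &flag, &status);

        if (flag)
        {
            int count = 0;
            MPI_Get_count(&status, MPI_BYTE, &count);

            std::vector<char>& buf = recvBufs[status.MPI_SOURCE];
            buf.resize(count);

            MPI_Recv
            (
                buf.data(), count, MPI_BYTE,
                status.MPI_SOURCE, nbxTag, comm, MPI_STATUS_IGNORE
            );
        }

        if (!barrierActive)
        {
            // Have all of our own sends been matched?
            int allSent = 0;
            MPI_Testall
            (
                static_cast<int>(sendReqs.size()), sendReqs.data(),
                &allSent, MPI_STATUSES_IGNORE
            );

            if (allSent)
            {
                MPI_Ibarrier(comm, &barrierReq);
                barrierActive = true;
            }
        }
        else
        {
            // When the Ibarrier completes, every rank's sends have been
            // matched and received: consensus reached, nothing left to do.
            MPI_Test(&barrierReq, &done, MPI_STATUS_IGNORE);
        }
    }
}
```

Once MPI_Test reports the Ibarrier complete, every Issend has been matched and its payload received, which is why a further blocking barrier on exit should not buy anything.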
- Author Maintainer
From @gregorweiss (by email):
Yes, there actually were synchronization problems. I will test it with a tag and without the barrier.
- Developer
Hi @mark
thank you for pointing this out. Indeed, the blocking barrier on function exit was added because of synchronization problems. I remember having checkMesh issues when calling nonblockConsensus in the processor patch construction for the sliceable mesh translation.
After removing the blocking barrier on exit, I could no longer reproduce the issues I had back then, so yes, that barrier is overkill by now. Still, to prevent intercepting pending messages, I changed to the more specific (and hardcoded) tag = 314159.
The blocking barrier at the start is the race condition fix taken from here: https://scorec.rpi.edu/REPORTS/2015-9.pdf
- Author Maintainer
Interesting read. It will be interesting to see whether their "rare" race condition still occurs once tags separate the messages. It would also be nice to get a view of their graph colouring with non-blocking consensus code - could be useful.
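On the race itself (my reading of the report, so treat this as an assumption rather than a quote): if the problem is a lingering probe from one consensus call matching a message that already belongs to the next call on the same communicator and tag, then a single shared tag will not prevent it on its own; the blocking barrier at the start does, and so would a tag that changes per call. A rough sketch of the latter idea, with arbitrary names and cycle length:

```cpp
// Illustrative only: give each consensus call its own tag so that a
// probe left over from call k cannot match a message sent in call k+1,
// without needing a blocking MPI_Barrier at the start.
static int nbxRound = 0;

int nextNbxTag()
{
    // Small window above the hardcoded base tag 314159; the window
    // size of 16 is an arbitrary choice for this sketch.
    return 314159 + (nbxRound++ % 16);
}
```

Whether that is worth the extra bookkeeping compared with the simple barrier is debatable, of course.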