Reduce IO when reading processor directories (on startup)
As highlighted during the 26-Nov meeting (JPN), they observed a severe lag when starting up with a large number of processors (e.g., 10k+).
Activity
- Mark OLESEN changed milestone to %v2012
- Mark OLESEN added confirmed enhancement labels
- Mark OLESEN (Author, Maintainer)
They could patch around the problem by essentially bypassing fileOperation::lookupAndCacheProcessorsPath entirely. I discussed and analyzed the problem with @Mattijs. The initial patch does solve the immediate problem, but it would prevent things like restarting from collated -> uncollated format.
We concluded that the real culprit is the system readdir() (wrapped as Foam::readDir()) being called on all processors when analyzing the names in lookupAndCacheProcessorsPath. As long as we are running without distributed roots (i.e., on a shared filesystem), we can call readdir on the master only and redistribute the names to the sub-procs. The parsing for processor directories:
^processor(?:(\d+)|(?:s(\d+)(?:_(\d+)-(\d+))?))$
is a simple string check and can be applied on all processors.
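Because this check needs no filesystem access, every rank can apply it to a list of names that only the master read from disk. The following is a minimal C++ sketch using std::regex, assuming the names have already been sent from the master (the broadcast itself is not shown). ProcDirInfo, parseProcessorDir and findProcessorDirs are hypothetical names, not the OpenFOAM API, and treating the collated groups as "total count, then rank range" is an assumption read off the pattern.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for one parsed directory name (not OpenFOAM API).
struct ProcDirInfo
{
    enum Kind { Uncollated, Collated };

    Kind kind;
    int procIndex;   // "processorN": rank index, else -1
    int nProcs;      // "processorsN...": total processor count, else -1
    int rangeStart;  // "processorsN_a-b": assumed rank range start, else -1
    int rangeEnd;    // "processorsN_a-b": assumed rank range end, else -1
};

// Pure string check: no filesystem access, so it is cheap to run on every rank.
std::optional<ProcDirInfo> parseProcessorDir(const std::string& name)
{
    // Same pattern as in the issue text (ECMAScript grammar, the std::regex default)
    static const std::regex re(
        R"(^processor(?:(\d+)|(?:s(\d+)(?:_(\d+)-(\d+))?))$)");

    std::smatch m;
    if (!std::regex_match(name, m, re))
    {
        return std::nullopt;
    }

    ProcDirInfo info{ProcDirInfo::Uncollated, -1, -1, -1, -1};
    if (m[1].matched)
    {
        info.procIndex = std::stoi(m[1].str());
    }
    else
    {
        // If group 1 did not match, the pattern guarantees that group 2 did
        assert(m[2].matched);
        info.kind = ProcDirInfo::Collated;
        info.nProcs = std::stoi(m[2].str());
        if (m[3].matched)
        {
            info.rangeStart = std::stoi(m[3].str());
            info.rangeEnd = std::stoi(m[4].str());
        }
    }
    return info;
}

// namesFromMaster: directory entries read once on the master (e.g., via
// readDir) and then distributed; the communication step is not shown here.
std::vector<ProcDirInfo> findProcessorDirs(
    const std::vector<std::string>& namesFromMaster)
{
    std::vector<ProcDirInfo> found;
    for (const std::string& name : namesFromMaster)
    {
        if (auto info = parseProcessorDir(name))
        {
            found.push_back(*info);
        }
    }
    return found;
}
```

With this split, only the master touches the filesystem and each sub-proc does one regex match per name. As noted above, this only holds without distributed roots, where each rank may see a different directory.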
Edited by Mark OLESEN (Author, Maintainer)
Updates for @Azami
- Mark OLESEN made the issue visible to everyone
- Mark OLESEN made the issue confidential
- Mark OLESEN mentioned in commit 627d79db
- Mark OLESEN mentioned in commit 249314c0
- Mark OLESEN made the issue visible to everyone
- Mark OLESEN (Author, Maintainer)
Hi @Azami, I have finalized changes that I believe will have the same effect as your suggestions, but they are fixed in a different place. As mentioned in the comment above, we wish to retain the ability to change between different output formats (e.g., when converting results or restarting from a different method).
To allow quick testing and confirmation on your side, I have made an additional commit (249314c0, or branch issue-1946-fileOperations-v1812) that backports just that particular change to v1812. You only need the fileOperation.C file.
Please let us know if our changes also resolve the issues, or if we need to adjust them.
- Mark OLESEN (Author, Maintainer)
See note on #1953 (closed)
- Mark OLESEN mentioned in commit 3f55626c59799d222081dc848fcdc250f3013761
- Mark OLESEN mentioned in commit 4f50e605
- Mark OLESEN mentioned in issue #2027 (closed)
- Mark OLESEN mentioned in commit 201f117f
- Mark OLESEN (Author, Maintainer)
Closing out with updated startup times. Thank you @Azami!
- Case: channelflow 24M, PCG-DIC (fixed {p} maxIter=300)
- initTime (startup time, elapsed seconds) of rank 0

| config | 3072 procs | 6144 procs | 12288 procs | 24576 procs |
|---|---|---|---|---|
| 1812 | 212 | 729 | 2441 | - |
| 1812 (RIST patch) | 7 | 27 | 41 | - |
| v2012 (uncollated) | - | - | 33 | 80 |

- Mark OLESEN closed