reducing communication at startup (argList)
While profiling, @josep.pocurull noticed a number of seemingly spurious MPI communications occurring in argList (more so in the "extend" variant). I initially didn't think much could actually be going on there, but there are indeed a few bits of communication happening after MPI_Init():
- all-to-one pattern: collecting the build/hostname/pid
- one-to-many pattern: distributing the args/options and other flags.
If we abandoned collecting host names entirely (e.g., info-switch writeHost = 0), it might be possible to push the build-number consistency check onto the sub-ranks: broadcast the master's build value, test it on each sub-rank, and signal failure from there. But since we generally still want some information about which hosts are being used, we might as well gather everything and process it on the master.
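Below is a minimal, single-process Python sketch of the gather-and-check-on-master approach. It simulates ranks as list entries rather than calling MPI. `RankInfo`, `gather_to_master`, `check_build_consistency` and `summarize_hosts` are hypothetical names used for illustration; they are not argList's actual code. The message count assumes a plain gather in which each sub-rank sends exactly one message to the master.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RankInfo:
    # Hypothetical per-rank record: what each rank reports at startup
    build: str
    host: str
    pid: int


def gather_to_master(per_rank):
    """Simulated all-to-one: every sub-rank sends its RankInfo to rank 0."""
    # The master already holds its own entry, so only nprocs - 1 messages are needed
    messages = len(per_rank) - 1
    return list(per_rank), messages


def check_build_consistency(gathered):
    """On the master: return the ranks whose build differs from the master's."""
    master_build = gathered[0].build
    return [rank for rank, info in enumerate(gathered) if info.build != master_build]


def summarize_hosts(gathered):
    """On the master: map each host name to the list of ranks running on it."""
    hosts = defaultdict(list)
    for rank, info in enumerate(gathered):
        hosts[info.host].append(rank)
    return dict(hosts)
```

The master needs the gathered list for the host summary anyway. Under this model, the build check therefore adds no extra communication. Moving the check to the sub-ranks would only pay off if the gather itself could be dropped.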
The second set of communication (the one-to-many pattern) could quite reasonably be replaced by a broadcast, possibly followed by a one-to-many. With non-distributed roots, the one-to-many is not needed at all; a simple broadcast is enough. Since this presumably covers the large majority of setups, I'd propose making that change.
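The difference can be illustrated with a rough cost model, again simulated in plain Python rather than MPI. The model assumes the broadcast uses a binomial tree, one common MPI_Bcast algorithm, although real MPI implementations may choose others. In a binomial tree, the number of ranks holding the data doubles each round. A one-to-many, by contrast, has the master send to each sub-rank in turn. `distribute_options` and its per-rank `roots` list are hypothetical stand-ins for the argList options and the decomposeParDict roots.

```python
def bcast_rounds(nprocs: int) -> int:
    """Rounds for a binomial-tree broadcast (assumed algorithm)."""
    # Each round, every rank that already holds the data forwards it to one
    # more rank, so the number of holders doubles per round.
    rounds, holders = 0, 1
    while holders < nprocs:
        holders *= 2
        rounds += 1
    return rounds


def one_to_many_rounds(nprocs: int) -> int:
    """The master sends to each sub-rank in turn: nprocs - 1 sequential sends."""
    return max(nprocs - 1, 0)


def distribute_options(common, roots, nprocs):
    """Return (per-rank options, rounds) for broadcast + optional one-to-many.

    roots is None for non-distributed cases; otherwise it holds one
    (hypothetical) root entry per sub-rank, i.e. nprocs - 1 entries.
    """
    # Step 1: every rank receives the same common options via a broadcast
    per_rank = [dict(common) for _ in range(nprocs)]
    rounds = bcast_rounds(nprocs)
    # Step 2: only distributed roots need a per-rank value from the master
    if roots is not None:
        for rank in range(1, nprocs):
            per_rank[rank]["root"] = roots[rank - 1]
        rounds += one_to_many_rounds(nprocs)
    return per_rank, rounds
```

In this model, a non-distributed case with 1024 ranks needs 10 broadcast rounds instead of 1023 sequential sends from the master. The per-rank sends are only paid when the roots are actually distributed.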
Off-topic (perhaps): is JobInfo even still relevant, or is it just a leftover?