additional topology-aware handling for Pstream

Merged: Mark OLESEN requested to merge pstream-topo-aware into develop
  • extend base MPI routines to include a two-stage handling from inter-node to local-node
  • adjust manual algorithms to include two-stage inter-node/local-node handling

Activity

  • Mark OLESEN changed milestone to %v2506

  • assigned to @Mattijs

  • Mark OLESEN (Author, Maintainer) commented:

    I would like several extra pairs of eyes on this before releasing it into the wild. It is not particularly well tested at the moment, and some implementation aspects need to be discussed first (hence the draft merge request).

  • Mark OLESEN added 1 commit

    • a7272d0e - CONFIG: add named topoControls

  • Mark OLESEN (Author, Maintainer) commented:

    For testing on a single machine, you can specify -opt-switch nodeComms=INT on the command line (or set it in etc/controlDict). On startup with everything selected, this type of information is reported:

    Pstream initialized with:
        node communication : on [type=4] (64 ranks, 16 nodes)
        topology controls  : (broadcast reduce gather combine mapGather gatherList)

    With UPstream debugging on, you will see this type of output:

    [0] [mpi_allreduce] : op:8 type:5 count:1 comm:0 topo:1
    [0] [mpi_allreduce] : op:8 type:5 count:1 comm:3 stage-1:reduce
    [0] [mpi_reduce] : (inplace) op:8 type:5 count:1 comm:3 topo:0
    [0] [mpi_allreduce] : op:8 type:5 count:1 comm:2 stage-2:allreduce
    [0] [mpi_allreduce] : op:8 type:5 count:1 comm:3 stage-3:broadcast

    This indicates that the multi-stage (topology-aware) Allreduce handling is being called: a reduce on comm=3, an allreduce on comm=2, then a broadcast on comm=3.
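    The three stages in that trace can be illustrated without MPI. The following is only a minimal sketch, not the Pstream implementation: it simulates a sum-allreduce in plain C++, and the function name stagedAllreduceSum, the contiguous rank-to-node layout and the choice of the first rank on each node as leader are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Simulated three-stage sum-allreduce; values[i] is the contribution of rank i.
// Assumptions (illustration only): ranks are laid out contiguously with
// ranksPerNode ranks per node, and one leader slot represents each node.
std::vector<long> stagedAllreduceSum
(
    const std::vector<long>& values,
    std::size_t ranksPerNode
)
{
    assert(ranksPerNode > 0);

    const std::size_t nRanks = values.size();
    const std::size_t nNodes = (nRanks + ranksPerNode - 1) / ranksPerNode;

    // Stage 1 (local-node): reduce the values of each node onto its leader
    std::vector<long> leader(nNodes, 0);
    for (std::size_t rank = 0; rank < nRanks; ++rank)
    {
        leader[rank / ranksPerNode] += values[rank];
    }

    // Stage 2 (inter-node): allreduce between the node leaders only
    const long total = std::accumulate(leader.begin(), leader.end(), 0L);
    for (long& val : leader)
    {
        val = total;
    }

    // Stage 3 (local-node): each leader broadcasts the result on its node
    std::vector<long> result(nRanks);
    for (std::size_t rank = 0; rank < nRanks; ++rank)
    {
        result[rank] = leader[rank / ranksPerNode];
    }

    return result;
}
```

    With 64 ranks and 4 ranks per node (16 nodes, as in the startup report above), only the 16 leader values take part in the inter-node stage.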

    Similarly, for a broadcast you will see this type of output:

    [0] [mpi_broadcast] : type:0 count:1052 comm:0 topo:1
    [0] [mpi_broadcast] : type:0 count:1052 comm:2 substage
    [0] [mpi_broadcast] : type:0 count:1052 comm:3 substage

    Here we can see that the top-level is using comm=0 (world) and the substages are using comm=2 (inter-node) and comm=3 (local-node).
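    The same two-level idea for a broadcast can be sketched as a plain C++ simulation (no MPI). This is an illustration under stated assumptions rather than the code in this merge request: the names StagedBroadcast and stagedBroadcast are hypothetical, rank 0 is taken as the root, ranks are assumed to be contiguous per node, and the first rank on each node is assumed to be its leader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the simulated broadcast (hypothetical helper type)
struct StagedBroadcast
{
    std::vector<int> received;        // value held by each rank afterwards
    std::size_t interNodeCopies = 0;  // copies made in the inter-node substage
    std::size_t localNodeCopies = 0;  // copies made in the local-node substage
};

// Simulated two-stage broadcast of rootValue from rank 0 to all ranks
StagedBroadcast stagedBroadcast
(
    int rootValue,
    std::size_t nRanks,
    std::size_t ranksPerNode
)
{
    assert(nRanks > 0 && ranksPerNode > 0);

    StagedBroadcast out;
    out.received.assign(nRanks, 0);
    out.received[0] = rootValue;

    // Substage 1 (inter-node): the root passes the value to every other leader
    for (std::size_t lead = ranksPerNode; lead < nRanks; lead += ranksPerNode)
    {
        out.received[lead] = out.received[0];
        ++out.interNodeCopies;
    }

    // Substage 2 (local-node): each leader passes the value to its own node
    for (std::size_t rank = 0; rank < nRanks; ++rank)
    {
        const std::size_t lead = (rank / ranksPerNode) * ranksPerNode;
        if (rank != lead)
        {
            out.received[rank] = out.received[lead];
            ++out.localNodeCopies;
        }
    }

    return out;
}
```

    The copy counts make the split visible: the inter-node substage touches one rank per node, and everything else stays within a node.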

    Edited by Mark OLESEN
  • Mark OLESEN added 15 commits

    • a7272d0e...a77aaa75 - 11 commits from branch develop
    • 94611ea5 - ENH: add node-based broadcasting and reduction
    • 8716ca57 - ENH: add node-based gather(), listGather(), mapGather()
    • 4a707fe0 - ENH: add node-based gatherList()
    • 1b8cca18 - CONFIG: add named topoControls

  • Mark OLESEN added 5 commits

    • 6dd8804a - 1 commit from branch develop
    • bee32526 - ENH: add node-based broadcasting and reduction
    • b10d47a4 - ENH: add node-based gather(), listGather(), mapGather()
    • d1f44c93 - ENH: add node-based gatherList()
    • 7fd9e1b2 - CONFIG: add named topoControls

  • Mark OLESEN marked this merge request as ready

  • Mark OLESEN (Author, Maintainer) commented:

    Although the limited tests didn't show any performance gains, I would still like to have the changes included. They provide an extra option for the future and add only a minimal code footprint.

  • Mark OLESEN added 7 commits

    • 7fd9e1b2...b9b0d1b3 - 3 commits from branch develop
    • 7b0ab0db - ENH: add node-based broadcasting and reduction
    • c4b261c6 - ENH: add node-based gather(), listGather(), mapGather()
    • a01f3ed8 - ENH: add node-based gatherList()
    • db871856 - CONFIG: add named topoControls

  • Mark OLESEN approved this merge request

  • Mattijs Janssens mentioned in commit 4de0b84c