
ENH: GAMG: processor agglomeration extended for all interfaces

Support for cyclicAMI in processorAgglomeration (inside GAMG)

Problem description:

When running in parallel on many cores the coarsest-level solver (e.g. PCG) might become the scaling bottleneck. One way around this is to agglomerate the coarse-level matrices onto fewer processors, or even a single one. This collects the processors' matrices into a single, larger matrix, with all the inter-processor boundaries replaced by internal faces. This was not supported for other boundary types, e.g. cyclicAMI. When such boundaries are used in combination with processor agglomeration, serialising the boundary fails with an error message, e.g.

[2] --> FOAM FATAL ERROR:
[2] Not implemented
[2]
[2]     From virtual void Foam::cyclicAMIGAMGInterface::write(Foam::Ostream&) const
[2]     in file AMIInterpolation/GAMG/interfaces/cyclicAMIGAMGInterface/cyclicAMIGAMGInterface.H at line 160.
[2]
FOAM parallel run aborting

Solution

The handling of boundaries was generalised to all coupled boundaries (e.g. cyclic, cyclicAMI). All coupled boundaries now implement the following (a minimal sketch of the pattern follows the list):

  • writing to / constructing from a stream (i.e. serialisation), so that all parts of a coupled boundary can be collected on the agglomerating processor.
  • cloning (on the agglomerating processor) from the received parts. For cyclicAMI this involves assembling the local face-to-cell addressing from the individual parts and adapting the stencils accordingly (see below).
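
The serialise/clone pattern can be summarised as follows; the class and method names below are illustrative stand-ins, not the actual OpenFOAM GAMGInterface declarations:

// Minimal sketch of the serialise/clone pattern described above. The class
// and method names are illustrative, not the actual OpenFOAM declarations.
#include <memory>
#include <ostream>
#include <vector>

class coupledGAMGInterface
{
public:
    virtual ~coupledGAMGInterface() = default;

    // Serialisation: write everything needed to rebuild this part of the
    // interface (face-cell addressing, weights, ...) so it can be sent to
    // the agglomerating processor
    virtual void write(std::ostream& os) const = 0;

    // Cloning: on the agglomerating processor, assemble a single interface
    // from the parts received from the contributing processors
    virtual std::unique_ptr<coupledGAMGInterface> clone
    (
        const std::vector<const coupledGAMGInterface*>& receivedParts
    ) const = 0;
};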

Effect

A case with two 20x10x1 blocks coupled through cyclicAMI (decomposed onto 4 processors) was compared to a single 40x10x1 block (i.e. using processor boundaries), both using the masterCoarsest processor agglomeration (output from running with the -debug-switch GAMGAgglomeration command line option):

  • processor boundaries:
                              nCells       nInterfaces
   Level  nProcs         avg     max       avg     max
   -----  ------         ---     ---       ---     ---
       0       4         100     100       1.5       2
       1       4          50      50       1.5       2
       2       1         100     100         0       0
       3       1          48      48         0       0

The number of boundaries ('nInterfaces') becomes 0 as all processor faces become internal.

  • cyclicAMI boundaries:
                              nCells         nInterfaces
   Level  nProcs         avg     max         avg     max
   -----  ------         ---     ---         ---     ---
       0       4         100     100           3       3
       1       4          50      50           3       3
       2       1         100     100           2       2
       3       1          48      48           2       2

Here the number of boundaries goes from 3 to 2 since only the two cyclicAMI boundaries are preserved.
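
For reference, a minimal sketch of how a GAMG solver with masterCoarsest processor agglomeration might be configured in fvSolution; the keyword names (in particular processorAgglomerator) are assumed here and may differ between OpenFOAM versions:

p
{
    solver                  GAMG;
    smoother                GaussSeidel;
    tolerance               1e-6;
    relTol                  0.01;

    // Collect coarse-level matrices onto the master processor
    processorAgglomerator   masterCoarsest;
}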

Distributed cyclicAMI

A big benefit of cyclicAMI is that the source and target faces do not have to reside on the same processor. This is handled internally using a distribution map:

  • AMI.srcMap() : transfers the source-side data to the target side.
  • AMI.tgtMap() : transfers the target-side data to the source side.

When assembling the cyclicAMI interface from the various contributing processors, a large part of the work is assembling the src and tgt maps. Each map consists of local data (from myProcNo) followed by data from the various remote processors (proci != myProcNo):

   index                    contents
   -----                    --------
   0 .. localSize-1         local data
   localSize ..             remote data from proc 0
     ..                     remote data from proc 1
     ..                     ..
   .. constructSize-1       remote data from proc n

The data is constructed by (see the sketch after this list):

  • starting from the local data
  • using the subMap to indicate which elements of this local data need to go where
  • making additional space to receive remote data
  • using the constructMap to indicate where the received data slots in
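
A single-process sketch of the layout and construction described above, using plain std::vector stand-ins rather than OpenFOAM's mapDistribute (the numbers are made up for illustration):

// Single-process illustration of the local + remote layout and of the
// constructMap mechanism. Plain C++, not the OpenFOAM mapDistribute API.
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    // Local data on this rank (slots 0 .. localSize-1 of the assembled array)
    std::vector<double> localData{10.0, 11.0, 12.0};

    // Data that other ranks would send us; simulated in-process here.
    // The remote side used its subMap to select which of *its* local
    // elements to send; we only see the result of that selection.
    std::vector<std::vector<double>> received
    {
        {20.0, 21.0},   // from rank 0
        {30.0}          // from rank 1
    };

    // constructMap[proci][i] : slot in the assembled array for the i-th
    // element received from proci. Local data occupies slots 0..localSize-1,
    // remote data fills the remaining slots up to constructSize.
    std::vector<std::vector<int>> constructMap
    {
        {3, 4},         // from rank 0
        {5}             // from rank 1
    };
    const int constructSize = 6;

    // Assemble: local data first, then slot in the received remote data
    std::vector<double> assembled(constructSize, 0.0);
    for (std::size_t i = 0; i < localData.size(); ++i)
    {
        assembled[i] = localData[i];
    }
    for (std::size_t proci = 0; proci < received.size(); ++proci)
    {
        for (std::size_t i = 0; i < received[proci].size(); ++i)
        {
            assembled[constructMap[proci][i]] = received[proci][i];
        }
    }

    for (double v : assembled) std::cout << v << ' ';
    std::cout << '\n';   // prints: 10 11 12 20 21 30
}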

If we start from four processors and combine processors 0,1 into new processor 0 and processors 2,3 into new processor 1, the assembled layout is agglomerated as follows:

  • the local data is the concatenation of the individual local data, so the local data on new0 is that of old0 followed by old1
  • the remote data is sorted according to the originating 'new' processor (so new0 agglomerates the data sent to old procs 0,1 from old procs 2,3)
  • any remote data from the assembled processors is removed (since it now sits in the assembled local slots)

The two maps indexing the data are renumbered accordingly. In general most maps will have lots of local data and only a little remote data (note that this might not be optimal for cyclicAMI, since the two sides quite likely get decomposed onto separate processors). The new numbering is described by (see the sketch after this list):

  • startOfLocal[mapi] : gives, for map mapi (assumed to originate from rank mapi), the offset of its local data in the assembled data
  • compactMaps[mapi][index] : gives, for map mapi, the new index for every old index
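
A minimal sketch of this renumbering for the local parts of the merged maps; illustrative only (the actual code also renumbers and prunes the remote slots, as described above):

// Compute startOfLocal and compactMaps for the local data of the maps
// that are agglomerated onto one new rank. Plain C++ illustration.
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    // Local sizes of the maps contributed by the old ranks agglomerated
    // onto this new rank (e.g. old0 and old1 -> new0)
    const std::vector<int> localSizes{3, 2};

    // startOfLocal[mapi] : offset of map mapi's local data in the assembled data
    std::vector<int> startOfLocal(localSizes.size(), 0);
    for (std::size_t mapi = 1; mapi < localSizes.size(); ++mapi)
    {
        startOfLocal[mapi] = startOfLocal[mapi - 1] + localSizes[mapi - 1];
    }

    // compactMaps[mapi][index] : new index in the assembled data for every
    // old (per-map) local index
    std::vector<std::vector<int>> compactMaps(localSizes.size());
    for (std::size_t mapi = 0; mapi < localSizes.size(); ++mapi)
    {
        compactMaps[mapi].resize(localSizes[mapi]);
        for (int i = 0; i < localSizes[mapi]; ++i)
        {
            compactMaps[mapi][i] = startOfLocal[mapi] + i;
        }
    }

    // Old local index 1 of the map from old rank 1 now sits at assembled
    // index 3 + 1 = 4
    std::cout << compactMaps[1][1] << '\n';
}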

Notes

  • cyclicAMI with all faces becoming local will be reset to become non-distributed, i.e. it operates directly on the provided fields without any additional copying.
  • cyclicAMI with a rotational transformation is not yet supported. This is not a fundamental limitation but requires additional rewriting of the stencils to take the transformations into account.
  • processorCyclic (a cyclic with owner and neighbour cells on different processors) is not yet supported. It is treated as a normal processor boundary and so will lose any transformation. Note that processorCyclic can be avoided by using the patches constraint in decomposeParDict, e.g.
constraints
{
    patches
    {
        //- Keep owner and neighbour on same processor for faces in patches
        //  (only makes sense for cyclic patches and cyclicAMI)
        type    preservePatches;
        patches (cyclic);
    }
}
  • only masterCoarsest has been tested but the code should support any other processor-agglomeration method.
