parallel construct finiteArea with arbitrary connections
Replace the old patch/patch matching style with a more general edge-based synchronisation and matching that appears to handle the corner cases inherently. The internal communication overhead is essentially unchanged, and the logic is simpler.
Merge request reports
Activity
Hi @vaggelisp (and @Sergio). I think that these changes address the issues observed with various decompositions. Please test at the earliest opportunity.
- Resolved by Mark OLESEN
Hi @mark
I am re-attaching a modified version of the test case I used in #2152 (closed)
Before the code update, makeFaMesh failed in parallel with the error reported in #2152 (closed) for the following decompositions:
hierarchical with coeffs (4 2 2) and (5 4 2), and scotch and kahip with 40 domains.
After the code update, makeFaMesh completes successfully with
hierarchical coeffs (4 2 2), and with scotch and kahip with 40 domains,
so this part of the problem seems to be fixed. Well done :)
However, makeFaMesh still fails for hierarchical coeffs (5 4 2), with a different error this time (a floating point exception in calcPointAreaNormals):
```
[23] #1  Foam::sigFpe::sigHandler(int) at ??:?
[23] #2  ? in /lib64/libpthread.so.0
[23] #3  Foam::faMesh::calcPointAreaNormals() const at ??:?
[23] #4  Foam::faMesh::pointAreaNormals() const at ??:?
[23] #5  Foam::faBoundaryMesh::calcGeometry() at ??:?
[23] #6  Foam::faMesh::faMesh(Foam::polyMesh const&) at ??:?
[23] #7  ? at ??:?
[23] #8  __libc_start_main in /lib64/libc.so.6
[23] #9  ? at ??:?
[sat1:97465] *** Process received signal ***
[sat1:97465] Signal: Floating point exception (8)
[sat1:97465] Signal code:  (-6)
[sat1:97465] Failing at address: 0x1f500017cb9
[sat1:97465] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x7fddb7f706d0]
[sat1:97465] [ 1] /lib64/libpthread.so.0(raise+0x2b)[0x7fddb7f7059b]
[sat1:97465] [ 2] /lib64/libpthread.so.0(+0xf6d0)[0x7fddb7f706d0]
[sat1:97465] [ 3] /home/foam/OpenFOAM/OpenFOAM-com/platforms/linux64GccDPInt32Opt/lib/libfiniteArea.so(_ZNK4Foam6faMesh20calcPointAreaNormalsEv+0xafe)[0x7fddbc9a263e]
[sat1:97465] [ 4] /home/foam/OpenFOAM/OpenFOAM-com/platforms/linux64GccDPInt32Opt/lib/libfiniteArea.so(_ZNK4Foam6faMesh16pointAreaNormalsEv+0x1d)[0x7fddbc992e4d]
[sat1:97465] [ 5] /home/foam/OpenFOAM/OpenFOAM-com/platforms/linux64GccDPInt32Opt/lib/libfiniteArea.so(_ZN4Foam14faBoundaryMesh12calcGeometryEv+0x23)[0x7fddbc9fe653]
[sat1:97465] [ 6] /home/foam/OpenFOAM/OpenFOAM-com/platforms/linux64GccDPInt32Opt/lib/libfiniteArea.so(_ZN4Foam6faMeshC2ERKNS_8polyMeshE+0x6ac)[0x7fddbc9956dc]
[sat1:97465] [ 7] checkFaMesh[0x40309d]
[sat1:97465] [ 8] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fddb7bb6445]
[sat1:97465] [ 9] checkFaMesh[0x403883]
[sat1:97465] *** End of error message ***
```
I came across this error with a couple more geometries, which unfortunately I cannot share, so it occurs occasionally rather than being specific to this case.
I tried to chase down the issue, and it appears to happen in the computation of the normal vector, at boundary points that appear twice in the edgeLoop constructed from their pointFaces (see the left part of this sketch for such a case).
In faMeshDemandDrivenData.C, if curPoint appears twice in agglomFacePoints, then slList will also contain curPoint itself, so one of d1/d2 below becomes a zero vector and mag(d1)*mag(d2) vanishes, which eventually leads to a division by zero in lines 988 and 990.
```cpp
955    labelList agglomFacePoints = curPatch.edgeLoops()[0];
956
957    SLList<label> slList;
958
959    label curPointLabel = -1;
960
961    for (label i=0; i<agglomFacePoints.size(); ++i)
962    {
963        if (curPatch.meshPoints()[agglomFacePoints[i]] == curPoint)
964        {
965            curPointLabel = i;
966        }
967        else if ( curPointLabel != -1 )
968        {
969            slList.append(curPatch.meshPoints()[agglomFacePoints[i]]);
970        }
971    }
972
973    for (label i=0; i<curPointLabel; ++i)
974    {
975        slList.append(curPatch.meshPoints()[agglomFacePoints[i]]);
976    }
977
978    labelList curPointPoints(slList);
979
980    for (label i=0; i < (curPointPoints.size() - 1); ++i)
981    {
982        vector d1 = points[curPointPoints[i]] - points[curPoint];
983
984        vector d2 = points[curPointPoints[i + 1]] - points[curPoint];
985
986        vector n = (d1 ^ d2)/(mag(d1 ^ d2) + SMALL);
987
988        scalar sinAlpha = mag(d1 ^ d2)/(mag(d1)*mag(d2));
989
990        scalar w = sinAlpha/(mag(d1)*mag(d2));
```
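To illustrate the failure mode, here is a library-free sketch (plain C++; `vec3`, `pointWeight` and the tolerance are my own stand-ins, not the OpenFOAM code) of the weight computation above, with a guard that rejects the degenerate pair that arises when curPoint leaks into its own neighbour list:

```cpp
#include <cmath>

struct vec3 { double x, y, z; };

vec3 operator-(const vec3& a, const vec3& b)
{
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

double mag(const vec3& a)
{
    return std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z);
}

// Stand-in for the point-normal weight: w = sinAlpha/(|d1||d2|).
// Returns false instead of dividing by zero when either leg has zero
// length, which is exactly what happens when curPoint appears in its
// own neighbour list curPointPoints.
bool pointWeight(const vec3& curPoint, const vec3& p1, const vec3& p2, double& w)
{
    const vec3 d1 = p1 - curPoint;
    const vec3 d2 = p2 - curPoint;
    const double m1 = mag(d1);
    const double m2 = mag(d2);

    if (m1 < 1e-12 || m2 < 1e-12)
    {
        return false;  // degenerate pair: skip rather than divide by zero
    }

    // |d1 ^ d2| = |d1| |d2| sin(alpha)
    const vec3 c
    {
        d1.y*d2.z - d1.z*d2.y,
        d1.z*d2.x - d1.x*d2.z,
        d1.x*d2.y - d1.y*d2.x
    };

    const double sinAlpha = mag(c)/(m1*m2);
    w = sinAlpha/(m1*m2);
    return true;
}
```

With a guard like this, the degenerate contribution is simply skipped; whether skipping (as opposed to de-duplicating the loop up front) is the right repair for all configurations is exactly the open question in the attached fix.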
Additionally, depending on the ordering of the points, the point might belong to two edge loops, and the second loop might be lost since only the first one is taken into consideration (see the right part of the previous sketch). I am not sure whether this is possible given how edgeLoops works; I am just bringing it to your attention.
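If taking only `edgeLoops()[0]` can indeed lose a loop, one possible direction would be to gather every loop that actually contains the point. A hypothetical helper sketched in plain C++, with `std::vector` standing in for the OpenFOAM list types (this is not existing API):

```cpp
#include <vector>

// Hypothetical helper: instead of unconditionally using loop 0, collect
// the indices of all edge loops that contain the given point label, so a
// point shared by two loops can contribute from both of them.
std::vector<int> loopsContainingPoint
(
    const std::vector<std::vector<int>>& edgeLoops,
    int pointLabel
)
{
    std::vector<int> loopIds;

    for (int loopi = 0; loopi < int(edgeLoops.size()); ++loopi)
    {
        for (int p : edgeLoops[loopi])
        {
            if (p == pointLabel)
            {
                loopIds.push_back(loopi);
                break;  // a loop is counted at most once
            }
        }
    }

    return loopIds;
}
```

The caller would then accumulate the normal contributions over all returned loops rather than only the first, assuming such multi-loop points can in fact occur.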
I have attempted a quick (and potentially dirty) fix to the problem, attached herein (I didn't commit it since I am potentially missing a case in which this does not work or a more elegant way for fixing the problem exists).
faMeshDemandDrivenData_potentialFix.C
Additionally, I believe the calls to the various faMesh statistics in checkFaMesh need to be replaced with their parallel counterparts (returnReduce, gMin, gMax, etc.); otherwise we only get the master's local statistics (see also the logs produced by the attached case).
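To show why master-only numbers can mislead, a small library-free sketch (plain C++; `localMin`/`globalMin` are my own stand-ins, with each inner vector playing the role of one processor's data, while in OpenFOAM the actual MPI reduction is done by `Foam::gMin`/`returnReduce`):

```cpp
#include <algorithm>
#include <vector>

// Minimum over one "processor's" local data.
double localMin(const std::vector<double>& v)
{
    return *std::min_element(v.begin(), v.end());
}

// Global minimum must reduce across all ranks, not just rank 0
// (the "master"); this mimics what gMin does via MPI.
double globalMin(const std::vector<std::vector<double>>& perRank)
{
    double result = localMin(perRank[0]);
    for (const auto& v : perRank)
    {
        result = std::min(result, localMin(v));
    }
    return result;
}
```

If the true minimum lives on a non-master rank, printing only the master's local value reports the wrong statistic, which matches what the attached logs show.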
Let me know what you think. If you prefer, I can open a new issue and continue the discussion there.
mentioned in issue #2233 (closed)
mentioned in commit 6730c727
added 17 commits
- f3a1137d...674a9a87 - 11 commits from branch develop
- 6a3f9188 - ENH: collective for boundary connections, makes lduAddressing const
- 5014398c - ENH: adjust documentation for face edges, remove unused method
- fc13031a - BUG: incorrect finite-area faceProcAddressing (fixes #2155 (closed))
- 6730c727 - BUG: parallel construct finiteArea fails with arbitrary connections (#2152 (closed))
- 66dab7f6 - ENH: add boundary halo handling to faMesh
- 96a562e1 - ENH: improve handling of area calculations in faMesh (#2233 (closed))
mentioned in commit 145cad56
added 7 commits
- 7ad75fa1 - 1 commit from branch develop
- 76c16434 - BUG: incorrect finite-area faceProcAddressing (fixes #2155 (closed))
- 9b14d4de - BUG: parallel blocking with faFieldDecomposer, faMeshReconstructor (fixes #2237 (closed))
- 145cad56 - BUG: parallel construct finiteArea fails with arbitrary connections (#2152 (closed))
- ee661de4 - ENH: add boundary halo handling to faMesh
- f7fac796 - ENH: improve handling of area calculations in faMesh (#2233 (closed))
- 9c87ddb0 - ENH: improvements for makeFaMesh, checkFaMesh
requested review from @andy
Additional test cases:
- tutorials/finiteArea/surfactantFoam/planeTransport with random decomposition
- motorBike with finiteArea test-motorBike_1.tar.xz
- drivaer with finiteArea (too big)
Ready for more testing @Prashant @swapnilsalokhe
added 2 commits
- 8b64987e - ENH: improve handling of area calculations in faMesh (#2233 (closed))
- 9075af8d - ENH: improvements for makeFaMesh, checkFaMesh
Hi, I've been testing the branch on a case in parallel. The faMesh was created, but at the end I got the following error:
```
... Write finite area mesh.
[0]
[0]
[0] --> FOAM FATAL ERROR: (openfoam-2107)
[0] cannot find file "/testcase_faMesh-21.06/processor0/constant/polyMesh/faceProcAddressing"
[0]
[0]     From virtual Foam::autoPtr<Foam::ISstream> Foam::fileOperations::uncollatedFileOperation::readStream(Foam::regIOobject&, const Foam::fileName&, const Foam::word&, bool) const
[0]     in file global/fileOperations/uncollatedFileOperation/uncollatedFileOperation.C at line 542.
[0]
FOAM parallel run exiting
[0]
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
```
The workflow is as follows:
```
runApplication blockMesh
runParallel redistributePar -constant -overwrite -decompose
runParallel snappyHexMesh -overwrite
runParallel makeFaMesh
```
The error messages above can be prevented by skipping the automatic decomposition of the faMesh. With the proposed change from @mark (see attachment), i.e. avoiding the creation of procAddressing and the field decomposition, the faMesh was created in parallel directly after snappyHexMesh without issues. The workflow has been modified accordingly:
```
runParallel makeFaMesh -no-decompose
```
- Resolved by Andrew Heather
Changes made locally to include the -no-decompose option, but still waiting for feedback from @swapnilsalokhe on whether his test cases look OK and whether we can proceed with merging soon.
Edited by Mark OLESEN
added 65 commits
- 9075af8d...38bf3016 - 59 commits from branch develop
- a8092654 - BUG: incorrect finite-area faceProcAddressing (fixes #2155 (closed))
- 9c1f94d4 - BUG: parallel blocking with faFieldDecomposer, faMeshReconstructor (fixes #2237 (closed))
- 8e451089 - BUG: parallel construct finiteArea fails with arbitrary connections (#2152 (closed))
- ea92cb82 - ENH: add boundary halo handling to faMesh
- a95427c2 - ENH: improve handling of area calculations in faMesh (#2233 (closed))
- 7fc943c1 - ENH: improvements for makeFaMesh, checkFaMesh
mentioned in commit 8e451089
mentioned in commit 6102b763
For @Chiara and @swapnilsalokhe
Please don't worry about the missing "-no-decompose" option in the merged code. It is still staged locally. I was also examining whether a '-no-fields' option would make sense, and whether we can/should have something similar in decomposePar too.