petsc4foam & foam2csr - Issue with AmgXCSRMatrix::setValuesLDU
Hi!
I am trying to install petsc4foam using the amgxwrapper branch [git clone --branch amgxwrapper https://develop.openfoam.com/modules/external-solver.git petsc4foam]. I have installed AMGX-2.2.0, OpenFOAM v2112, PETSc 3.15.5, and foam2csr. (I also tried PETSc 3.16.6 and 3.16.2.)
I changed solvers/petscSolver.C and solvers/petscSolver.H (downloaded from here) after facing the initialization error for `List& lowNonZero = ctx.lowNonZero;`.
When I run the ./Allwmake command in the petsc4foam folder, I encounter the following error.
I could not find any changes made to amgxSolver.C after commit #29 (closed) on the petsc4foam main branch. I have tried changing the variable types in solvers/amgxSolver.C [lines 269-404], but I am not sure that is the right approach.
Would really appreciate some guidance from the developers. @sbna @szampini @mmartineau
```
solvers/amgxSolver.C:404:5: error: no matching function for call to ‘AmgXCSRMatrix::setValuesLDU(const label&, const label&, Foam::label&, Foam::label&, Foam::label&, const long int*, const long int*, Foam::label&, long int*, long int*, const double*, const double*, const double*, double*)’
     );
     ^
In file included from solvers/amgxSolver.H:46:0,
                 from solvers/amgxSolver.C:38:
foam2csr/src/AmgXCSRMatrix.H:54:14: note: candidate: void AmgXCSRMatrix::setValuesLDU(int, int, int, int, int, const int*, const int*, int, const int*, const int*, const float*, const float*, const float*, const float*)
 void setValuesLDU
      ^~~~~~~~~~~~
foam2csr/src/AmgXCSRMatrix.H:54:14: note: no known conversion for argument 6 from ‘const long int*’ to ‘const int*’
foam2csr/src/AmgXCSRMatrix.H:74:14: note: candidate: void AmgXCSRMatrix::setValuesLDU(int, int, int, int, int, const int*, const int*, int, const int*, const int*, const double*, const double*, const double*, const double*)
 void setValuesLDU
      ^~~~~~~~~~~~
foam2csr/src/AmgXCSRMatrix.H:74:14: note: no known conversion for argument 6 from ‘const long int*’ to ‘const int*’
```
- Vansh Sharma mentioned this in issue #25
Hi @vansh, I have tried petsc4foam with foam2csr using the gcc compiler, but I could not reproduce this error.

I checked the caller at solvers/amgxSolver.C:404, which is

```
Amat.setValuesLDU
(
    nrows_, nIntFaces_, diagIndexGlobal, lowOffGlobal, uppOffGlobal,
    &upp[0], &low[0], nProcValues, &procRows[0], &procCols[0],
    &diagVal[0], &uppVal[0], &lowVal[0], &procVals[0]
);
```

The 6th argument of the caller is `&upp[0]`, which is of type `const labelUList&`. As the error indicates, the given type is `const long int*`, which is incompatible with the `const int*` expected by `setValuesLDU`.

I think this is probably because you have built with `WM_LABEL_SIZE=64`. You can try `WM_LABEL_SIZE=32`, where the `const labelUList&` will map to `const int*`.
- Author
Thanks for the reply @li12242. I will retry this over the weekend. I was thinking it was an architecture issue because of the data type mismatch, but there was no explicit mention of switching the default settings in OpenFOAM.
Also, did you face the initialization error for `List& lowNonZero = ctx.lowNonZero;`? The one changed in commit #29 (closed).
I downloaded the amgxwrapper branch of petsc4Foam and ran into this issue, so I went into the code and made the changes myself; I am not sure whether that caused this error. Could you share your petsc4foam and foam2csr files?
Hi @vansh. Did you manage to configure petsc4foam and foam2csr? I'm trying, but I'm not succeeding.
- Author
Hi @diegoalexmayer, unfortunately not yet. I am using the main branch with PETSc - v3.17.4 for now. I will look into it again in a few days. I tried changing the architecture as @li12242 suggested but it did not work. Let me know if you make any progress...
Hi @vansh, thanks for your feedback. So far I've only managed to configure PETSc and compile the external solver module (PETSc4FOAM). I'm currently trying to compile foam2csr, but I'm getting the following error:
```
Starting compile of foam2csr (amgx) with OpenFOAM-v2012 Gcc63 ThirdParty compiler linux64Gcc63DPInt32Opt, with OPENMPI openmpi-4.0.3 prefix = default(user)

nvcc -std=c++14 --compiler-options='-fPIC' -arch=sm_70 -O3 -Isrc -I. -I -I -I -I/include -c src/AmgXCSRMatrix.cu -o Make/linux64Gcc63DPInt32Opt/src/AmgXCSRMatrix.o
In file included from src/AmgXCSRMatrix.cu:23:0:
src/AmgXCSRMatrix.H:26:17: fatal error: mpi.h: No such file or directory
 #include <mpi.h>
                 ^
finished compilation.
make: *** [Make/linux64Gcc63DPInt32Opt/../nvcc:14: Make/linux64Gcc63DPInt32Opt/src/AmgXCSRMatrix.o] Error 1
```
I believe I'm compiling foam2csr in the wrong directory, but I'm not sure where it should be compiled. Did you succeed in compiling foam2csr? In which directory did you compile it?
- Author
@diegoalexmayer yes, I was able to compile foam2csr. You need to make sure all the library paths are set correctly in Make/options; it looks like you need to set the correct MPI path. Also, make sure you have sourced the OpenFOAM etc/bashrc file.
```
EXE_INC = \
    -I. \
    -I$(CUBROOT) \
    -I${PETSC_INC} \
    -I${AMGX_INC} \
    -I${MY_MPI_HOME}/include   // <<- set this
```
Hi @diegoalexmayer, you can try using the following Make/options, replacing the paths with the correct ones for your system.
I've used this setup successfully for quite a while.
```
sinclude $(GENERAL_RULES)/mplib$(WM_MPLIB)
sinclude $(RULES)/mplib$(WM_MPLIB)
include $(OBJECTS_DIR)/../nvcc

sinclude $(GENERAL_RULES)/module-path-user

/* Failsafe - default to user location */
ifeq (,$(strip $(FOAM_MODULE_LIBBIN)))
    FOAM_MODULE_LIBBIN = $(FOAM_USER_LIBBIN)
endif

EXE_INC = \
    -I. \
    -I/media/simugpu/raid1/Linux/Software/AMGX/include \
    -I/home/simugpu/Software/openmpi-4.1.0/include \
    $(foreach dir,$(PETSC_INC_DIR),-I$(dir))

LIB_LIBS = \
    -lfiniteVolume \
    -lmeshTools \
    $(foreach dir,$(PETSC_LIB_DIR),-L$(dir)) -lpetsc \
    -L/media/simugpu/raid1/Linux/Software/AMGX/build -lamgxsh
```
Using `mpicc -showme` should give the path to the MPI include directory. This directory should contain the `mpi.h` file that the compiler is looking for.
- Author
@WinstonMechanics That's a detailed answer.
On a side note, do you know if there is a flag that can enable CUDA support in MPI ("CUDA-aware")? Something like `export MPI_USE_CUDA=1`.
In general, the AMGX lib error still remains; it's quite hard to pin down the architecture issue.
Hi @vansh,
If you are using Open MPI, you can toggle CUDA awareness on via
`export OMPI_MCA_opal_cuda_support=true`
However, you first need to check whether your MPI build supports CUDA awareness:
`ompi_info --parsable --all | grep mpi_built_with_cuda_support:value`
This should return
`mca:mpi:base:param:mpi_built_with_cuda_support:value:true`
if it is possible to turn on CUDA support.

I wonder if that lib error is caused by an incompatible PETSc configuration. Can you share the configure.log that is generated during PETSc compilation?
Hi @vansh and @WinstonMechanics. I believe I have managed to correctly compile AMGX, foam2csr, and petsc4foam. However, when I run the pipeOneD case from https://develop.openfoam.com/modules/external-solver/-/tree/develop/tutorials/basic/laplacianFoam, the following error occurs:
```
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

Create time
Create mesh for time = 0

SIMPLE: no convergence criteria found. Calculations will run for 1 steps.

Reading field T
Reading diffusivity DT
No finite volume options present

Calculating temperature distribution

Time = 0.005

Invalid MIT-MAGIC-COOKIE-1 key
Initializing PETSc
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Unknown type. Check for miss-spelling or missing package: https://petsc.org/release/install/install/#external-packages
[0]PETSC ERROR: Unknown Mat type given: aijcusparse
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.16.2, Dec 07, 2021
[0]PETSC ERROR: Unknown Name on a named mayer-System-Product-Name by mayer Fri Nov 4 08:17:04 2022
[0]PETSC ERROR: Configure options --prefix=/home/mayer/OpenFOAM/ThirdParty-v2112/platforms/linux64GccDPInt32/petsc-3.16.2 --PETSC_DIR=/home/mayer/OpenFOAM/ThirdParty-v2112/petsc-3.16.2 --with-petsc-arch=DPInt32 --with-clanguage=C --with-fc=0 --with-x=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-shared-libraries --with-64-bit-indices=0 --with-precision=double --download-hypre
[0]PETSC ERROR: #1 MatSetType() at /home/mayer/OpenFOAM/ThirdParty-v2112/petsc-3.16.2/src/mat/interface/matreg.c:99
[0]PETSC ERROR: #2 MatSetFromOptions() at /home/mayer/OpenFOAM/ThirdParty-v2112/petsc-3.16.2/src/mat/utils/gcreate.c:223
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: #3 User provided function() at unknown file:0
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
```
I believe I did not configure PETSc correctly for GPUs. When I configured PETSc, I used the command: `./configure --with-cc=gcc --with-cxx=0 --with-fc=0 --download-f2cblaslapack --download-mpich`
What is the correct command to configure PETSc to use GPUs? Can you help me?
Thanks.
Diego
Hi @diegoalexmayer, you can try configuring PETSc with the `--with-cuda=1` option. This should enable you to use PETSc with GPU features. You can also add the `--force` option to ensure that PETSc is reconfigured.

Based on the reported error message, it looks like the current configuration is still the default made by the ThirdParty `makePETSC` installation script:

```
--with-petsc-arch=DPInt32 --with-clanguage=C --with-fc=0 --with-x=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-shared-libraries --with-64-bit-indices=0 --with-precision=double --download-hypre
```

which doesn't have GPU features toggled on by default.

You can try editing the default `makePETSC` script and sneaking `--with-cuda=1 \` in at a convenient location, such as:

```
./configure \
    ${PETSC_PREFIX:+--prefix="$PETSC_PREFIX"} \
    --PETSC_DIR="$PETSC_SOURCE" \
    --with-petsc-arch="$archOpt" \
    --with-clanguage=C \
    --with-fc=0 \
    --with-x=0 \
    --with-cuda=1 \
    $configOpt \
```
This should be a reasonable starting point for GPU computations.
Hi @WinstonMechanics, thanks for your help. I believe I have now configured PETSc correctly for GPUs, because I ran the pitzDaily case (#25 (comment 53568)) without error. However, when I run the tutorials/basic/laplacianFoam/pipeOneD test case, the following error occurs:
```
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

Create time
Create mesh for time = 0

SIMPLE: no convergence criteria found. Calculations will run for 1 steps.

Reading field T
Reading diffusivity DT
No finite volume options present

Calculating temperature distribution

Time = 0.005

Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Initializing PETSc
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: The EXACT line numbers in the error traceback are not available.
[0]PETSC ERROR: instead the line number of the start of the function is given.
[0]PETSC ERROR: #1 jac->setup() at /home/labsin/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:418
[0]PETSC ERROR: #2 PCSetUp_HYPRE() at /home/labsin/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:235
[0]PETSC ERROR: #3 PCSetUp() at /home/labsin/Software/petsc/src/ksp/pc/interface/precon.c:975
[0]PETSC ERROR: #4 KSPSetUp() at /home/labsin/Software/petsc/src/ksp/ksp/interface/itfunc.c:321
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022
[0]PETSC ERROR: Unknown Name on a named labsin-XPS-8500 by labsin Mon Nov 21 11:42:37 2022
[0]PETSC ERROR: Configure options --prefix=/home/labsin/Software/petsc/build --with-petsc-arch=DPInt32 --with-fc=0 -with-x=0 --with-debugging=1 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-mpi-dir=/home/labsin/Software/openmpi-4.1.3 --with-shared-libraries --with-64-bit-indices=0 --with-precision=double --download-hypre -with-cuda=1 --CUDAOPTFLAGS=-O3 --with-cuda-arch=61 --download-f2cblaslapack=1 --force
[0]PETSC ERROR: #1 User provided function() at unknown file:0
[0]PETSC ERROR: Checking the memory for corruption. The EXACT line numbers in the error traceback are not available. Instead the line number of the start of the function is given.
[0] #1 PetscAbortFindSourceFile_Private() at /home/labsin/Software/petsc/src/sys/error/err.c:35
[0] #2 PCSetUp_HYPRE() at /home/labsin/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:235
[0] #3 PCSetUp() at /home/labsin/Software/petsc/src/ksp/pc/interface/precon.c:975
[0] #4 KSPSetUp() at /home/labsin/Software/petsc/src/ksp/ksp/interface/itfunc.c:321

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

// * * * * * * * * * * * * * * * * * * * * * * * * * //
```
Is there any solution for this? I saw that in issue #34 (closed) this was also reported by @nanami.
Thanks.
Diego
Hi @diegoalexmayer,
Good to hear you got PETSc working on the GPU.
I did some testing with Hypre and was able to reproduce the error you reported and the one reported in issue #34 (closed).
To make progress, you could try re-configuring and compiling PETSc with the `--download-hypre-configure-arguments=--enable-unified-memory` option. This got rid of the `11 SEGV: Segmentation Violation` error for me. Adding `mat_type aijcusparse;` to `fvSolution` in the pipeOneD test case was also required to make it work. The latest PETSc 3.18.1 did not seem to require these, but earlier versions like 3.16.6 needed both.

Apparently BoomerAMG GPU support is still somewhat undocumented, see e.g. https://lists.mcs.anl.gov/pipermail/petsc-dev/2022-February/028144.html
To get rid of the error reported in issue #34 (closed), I had to comment out line 337,
`PCSetCoordinates(pc, sdim, n, ccPoints.data());`
in the `petscSolver.C` file and recompile petsc4Foam. I am not completely sure that BoomerAMG supports geometric coarsening; this call was probably added as an experimental feature to boost PETSc GAMG performance via geometric multigrid support.
- stefano zampini added AMGX label