Issue #1379

Closed
Open
Created Jul 18, 2019 by Admin (@OpenFOAM-admin, Maintainer)

OpenFOAM v1812 over InfiniBand

Hello everyone,

Summary

I am trying to run OpenFOAM v1812 over InfiniBand (Mellanox) with OpenMPI 4, but it crashes at launch. I wonder whether this is a compatibility issue between OpenFOAM and OpenMPI. With a simple C program I can use OpenMPI over InfiniBand, although that program does not exchange any data.

Steps to reproduce

I have an OpenFOAM case and run it with the following command:

`foamJob -p -s snappyHexMesh`

I also added the following to my `.bashrc`:

`export OMPI_MCA_btl_openib_allow_ib=1`

`export OMPI_MCA_btl_openib_if_include="mlx5_1:1"`
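For reference, a minimal sketch of how the two settings can be checked in the shell that launches the job. Open MPI reads any `OMPI_MCA_<name>` environment variable as the MCA parameter `<name>`, so these exports are equivalent to passing `--mca btl_openib_allow_ib 1 --mca btl_openib_if_include mlx5_1:1` to `mpirun`. The `grep`/`sort` listing is only an illustration and not part of the original setup.

```shell
# Same exports as in .bashrc above.
export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx5_1:1"

# List every MCA parameter that Open MPI would pick up from the environment.
env | grep '^OMPI_MCA_' | sort
```

If the two variables do not show up in the listing, the shell that starts `foamJob` did not source the `.bashrc` that defines them.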

Environment information

- OpenFOAM version : v1812
- Operating system : Ubuntu 18.04
- Hardware info : InfiniBand (Mellanox)
- Compiler : gcc

Possible fixes

I am using OpenMPI 4, but OpenFOAM does not accept it because it looks for libmpi.so.20 (OpenMPI 2). As a workaround I created a link named libmpi.so.20 that points to libmpi.so.40, so OpenFOAM actually loads OpenMPI 4 through the link. For a calculation on a single server this works perfectly (v4 is faster than v2).
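A minimal sketch of the link described above. The real library directory is not given in this report, so a scratch directory and an empty stand-in file are used here as assumptions; on an actual system `libdir` would be the directory that holds `libmpi.so.40`.

```shell
# Assumption: a scratch directory stands in for the real Open MPI library directory.
libdir="$(mktemp -d)"

# Empty stand-in for the Open MPI 4 shared library.
touch "$libdir/libmpi.so.40"

# Create the name OpenFOAM looks for and point it at the v4 library.
ln -s libmpi.so.40 "$libdir/libmpi.so.20"

# Show where the v2 name resolves to.
readlink "$libdir/libmpi.so.20"
```

Such a link only satisfies the dynamic loader's file-name lookup. It does not make the two library versions binary compatible, which is one reason to suspect it in the crash below.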

I tried to switch back to OpenMPI 2, but once it is installed I get:

[maui:16468] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../../../../../opal/mca/pmix/pmix3x/pmix/src/mca/bfrops/v12/unpack.c at line 206

What is the current bug behaviour?

I get a segmentation fault from snappyHexMesh:

```
cws@maui:~/Molokai/bench/run_32$ foamJob -p -s snappyHexMesh
Parallel processing using SYSTEMOPENMPI with 2 processors
Executing: /opt/openfoam1812/OpenFOAM-v1812/bin/mpirun -np 2 -hostfile hostfile -x FOAM_SETTINGS /opt/openfoam1812/OpenFOAM-v1812/bin/foamExec snappyHexMesh -parallel | tee  log
[maui:22883] Warning: could not find environment variable "FOAM_SETTINGS"
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: oahu
--------------------------------------------------------------------------
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  v1812                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : v1812 OPENFOAM=1812
Arch   : "LSB;label=32;scalar=64"
Exec   : snappyHexMesh -parallel
Date   : Jul 18 2019
Time   : 10:43:48
Host   : maui
PID    : 22891
I/O    : uncollated
[maui:22891:0:22891] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
    0  /usr/lib/libucs.so.0(+0x1ec4c) [0x7fc7ab995c4c]
    1  /usr/lib/libucs.so.0(+0x1eec4) [0x7fc7ab995ec4]
===================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node maui exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
```
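The Open MPI warning in the log reports no active ports on host oahu. One way to check this is to query the port state on each host in the hostfile. The sketch below assumes the `ibv_devinfo` utility from the verbs user-space tools is installed. The helper name `check_ib_port` is hypothetical, and the device and port default to the `mlx5_1:1` value from `btl_openib_if_include`.

```shell
# Hypothetical helper: report whether one InfiniBand port is active.
# Usage: check_ib_port [device] [port]
check_ib_port() {
    dev="${1:-mlx5_1}"
    port="${2:-1}"

    # Without ibv_devinfo there is nothing to query.
    if ! command -v ibv_devinfo >/dev/null 2>&1; then
        echo "ibv_devinfo not found; cannot query $dev port $port"
        return 2
    fi

    # A usable port is listed with "state: PORT_ACTIVE".
    if ibv_devinfo -d "$dev" -i "$port" 2>/dev/null | grep -q 'PORT_ACTIVE'; then
        echo "$dev port $port is active"
    else
        echo "$dev port $port is not active"
        return 1
    fi
}

# Run on every host listed in the hostfile (here: maui and oahu).
check_ib_port mlx5_1 1 || true
```

If the port on oahu is reported as not active, that would match the warning above. The segmentation fault inside `libucs.so.0` would then be a separate problem.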