Nested parallelism?

It seems that whenever I run psi4 it is running with nested parallelism…

When I run a calculation with OMP_NUM_THREADS=n, the job runs at 100*n^2 % CPU. Putting OMP_NESTED=FALSE in the submission script does nothing, and OMP_THREAD_LIMIT=n drops it to 1 thread whenever it tries to spawn nested threads. I was wondering if there is an easy fix.

Please use the psi4.set_num_threads(n) interface to change the parallelism of Psi4.
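
A minimal sketch of what that can look like through the Python API (the thread count, water geometry, and method below are placeholder choices, not recommendations):

```python
import psi4

# Ask Psi4 itself for n threads rather than relying on OMP_NUM_THREADS.
psi4.set_num_threads(4)

# Placeholder system and method, just so there is something to run.
psi4.geometry("""
O
H 1 0.96
H 1 0.96 2 104.5
""")

psi4.energy("scf/cc-pvdz")
```

If I recall the input-file syntax correctly, the same call is available in a psithon input file as plain set_num_threads(4), without the psi4. prefix.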

Using this method, without declaring any OMP_NUM_THREADS, still gets me %CPU values that max out at 100*n^2 % as opposed to 100*n %.

I should also mention that in the output file it does say “OpenMP threads: 2” and “Integrals threads: 2”

Hmm, did you compile Psi4 yourself and do you have any other lingering OMP or KMP environmental variables?

I compiled it myself. I also have the precompiled psi4conda, which doesn't have this issue (but is still considerably slower). I can't imagine there are any OMP environment variables I am unaware of…

For my gcc+mkl compilation and work environment I need to use:

export MKL_THREADING_LAYER=GNU
export MKL_INTERFACE_LAYER=GNU
export OMP_NESTED="FALSE"  

to avoid nested threading in certain code-parts.
I had the feeling it makes DFT calculations slower, but I never investigated properly.

Doesn't work. I do have to include KMP_DUPLICATE_LIB_OK=TRUE in my submission script, though… I'm not sure if that's relevant to this.

Ah, that's a problem. It looks like you are loading two different OMP runtimes. Nested threading goes very wrong in these cases… it might be related to the above problem.

@hokru Do you also load two different OMP runtimes via gomp and libiomp?

I see no indications of it. I certainly don’t force anything.

I assume my problem is gcc and the “new” MKL threading layer not playing nicely.
Maybe one day I'll install clang…

edit: OMP_NESTED=“FALSE” is supposed to be the default if undefined, but it seems it isn't. So there is also that.

Right… the nested-off-by-default behaviour is confusing me a bit as well. Natively, OMP should only thread the outer loops unless particular directives are used within the code (which we do not use).

Do you have this problem if you do ICC/MKL?

OK… the problem is that without the KMP_DUPLICATE_LIB_OK=TRUE setting I get the error message:

“OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library.”

It sounds like I need to recompile the code. My friend also said this issue may have arisen because I first downloaded the psi4conda binary before compiling Psi4 from GitHub. With this in mind, do you have any suggestions for recompiling?

My intel v15 is too old for the developer’s C++11 enthusiasm :wink:

@bge6: To start, it is perhaps best to build everything freshly from scratch, i.e. no Psi4 libraries from conda (MKL is OK), an empty install PREFIX, etc. Study the initial cmake output to check that everything is built from source.

Recompiled and fixed.

I'm also having this issue with a compile-from-source of release 1.2. I've installed MKL using the academic license in the customary place for Linux/Ubuntu distributions (it is installed through the .deb Intel distributes, if I recall correctly).

Recompiling doesn't fix my issue on its own; I think that somewhere in the compilation process cmake is finding different libiomp5.so files and somehow linking both of them? I'm not sure how to hide the version that isn't coming from my MKL install. Here's my cmake script:

cmake -H. -Bobjdir \
    -DENABLE_ambit=ON \
    -DENABLE_CheMPS2=OFF \
    -DENABLE_dkh=ON \
    -DENABLE_libefp=ON \
    -DENABLE_erd=OFF \
    -DENABLE_simint=OFF \
    -DENABLE_gdma=ON \
    -DENABLE_PCMSolver=ON \
    -DLAPACK_LIBRARIES="/opt/intel/mkl/lib/intel64/libmkl_rt.so" \
    -DLAPACK_INCLUDE_DIRS="/opt/intel/mkl/include" \
    -DBLAS_TYPE=MKL \
    -DPYTHON_EXECUTABLE="/home/louis/anaconda3/bin/python" \
    -DCMAKE_INSTALL_PREFIX=/home/louis/bin/psi4

Following is a list of the ‘smoke’ tests my build fails:

The following tests FAILED:
7 - casscf-sp (Failed)
20 - cc1 (Failed)
129 - dfmp2-1 (Failed)
191 - fcidump (Failed)
315 - sapt1 (Failed)
339 - scf-property (Failed)
353 - tu1-h2o-energy (Failed)
389 - python-energy (Failed)
407 - dkh-molpro-2order (Failed)
409 - libefp-qchem-qmefp-sp (Failed)
413 - gdma-gdma1 (Failed)
416 - pcmsolver-scf (Failed)
417 - pcmsolver-opt-fd (Failed)

Peculiarly, some tests succeed. Based on the names of the tests, I suspect these are the tests that do not need compiled parallelism to execute, for example ones using psi4-numpy. Here’s an example output from a test that failed:

20/418 Testing: cc1
20/418 Test: cc1
Command: “/home/louis/anaconda3/bin/python” “/home/louis/psi4/tests/runtest.py” “/home/louis/psi4/tests/cc1/input.dat” “/home/louis/psi4/objdir/testresults.log” “false” “/home/louis/psi4” “false” “/home/louis/psi4/objdir/tests/cc1/output.dat” “/home/louis/psi4/objdir/stage/home/louis/bin/psi4/bin/psi4” “/home/louis/psi4/objdir/stage/home/louis/bin/psi4/share/psi4” “/home/louis/psi4/objdir/stage/home/louis/bin/psi4/lib/”
Directory: /home/louis/psi4/objdir/tests/cc1
“cc1” start time: Dec 28 17:15 EST
Output:

OMP: Error #15: Initializing libiomp5.so, but found libomp.so.5 already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Exit Status: infile ( -6 ); autotest ( None ); sowreap ( None ); overall ( 1 )

Test time = 1.00 sec

Test Failed.
“cc1” end time: Dec 28 17:15 EST
“cc1” time elapsed: 00:00:00

I’d be happy to provide any additional information, but without knowing what might be useful it’s hard for me to know what else I should include. I have the entire compile process output logged, if that seems useful (it’s a lot of text so not copying it here).

Is your NumPy from conda? It's safest to have all the LAPACK and all the OpenMP requirements linking to the same library (Psi4, CheMPS2, libefp, and NumPy all need LAPACK, and all of those plus LAPACK itself need the OpenMP library). Fortunately, this is easy to do in conda.
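
As a quick sanity check on the NumPy side (just a sketch, nothing Psi4-specific), you can ask NumPy which BLAS/LAPACK build it is linked against:

```python
import numpy as np

# Prints the BLAS/LAPACK configuration NumPy was built with; a conda
# NumPy should report MKL here, matching what Psi4 is linked against.
np.show_config()
```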

Are you using the conda gcc compilers? If so, go to http://vergil.chemistry.gatech.edu/nu-psicode/install-v1.2.1.html, set the options at linux/source/3.6/nightly, and follow the directions. (Paste me any errors, as the attached deps are a little out of date.)

If you’re using your own compilers or you really want to compile all the dependencies yourself, you will need a conda env, e.g., conda create -n p4dev numpy intel-openmp mkl-devel, conda activate p4dev. Then, on your cmake line, add

-DLAPACK_LIBRARIES="${CONDA_PREFIX}/lib/libmkl_rt.so" \
-DLAPACK_INCLUDE_DIRS="${CONDA_PREFIX}/include" \
-DOpenMP_LIBRARY_DIRS="${CONDA_PREFIX}/lib"

(above are from https://github.com/psi4/psi4meta/blob/master/conda-recipes/psi4-dev/src/psi4DepsMKLCache.cmake)

That should get everyone using the same mkl_rt and iomp5.
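
If you want to verify that after rebuilding, here is a Linux-only sketch (it assumes /proc/self/maps is available, so it won't work elsewhere) that lists every OpenMP runtime mapped into a process importing both Psi4 and NumPy; a healthy build should show exactly one:

```python
import psi4    # noqa: F401  (imported only to pull in its OpenMP runtime)
import numpy   # noqa: F401

# /proc/self/maps lists every shared library mapped into this process;
# you want to see a single OpenMP runtime (e.g. libiomp5.so), not
# libiomp5.so together with libgomp or libomp.
runtimes = set()
with open("/proc/self/maps") as maps:
    for line in maps:
        path = line.split()[-1]
        if any(name in path for name in ("libiomp5", "libgomp", "libomp.so")):
            runtimes.add(path)

for path in sorted(runtimes):
    print(path)
```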

Hi Lori,

I did as you recommended and installed using the conda compiler. I guess in my reading the build from source instructions I’d somehow missed this.

I did not get any errors, and the build passed all its smoke tests, so I’m going to assume it worked.

Thanks for your help!

Great!

A warning is lurking here (search “Because of how link loaders work”). There are so many seemingly innocent ways this trouble can be introduced that controlling the build environment through the conda recommendation is the best sure-fire way to get the OpenMP linking right.