Psi4 linked to gomp and iomp5

Acisagic · September 26, 2019, 10:18am

Hi,

I compiled psi4 with gcc and mkl without any errors but ran into very strange calculation results when using more than one thread.

WARNING: Atomic UHF is not converging! Try casting from a smaller basis or call Rob at CCMST.
WARNING: Atomic UHF is not converging! Try casting from a smaller basis or call Rob at CCMST.
WARNING: Atomic UHF is not converging! Try casting from a smaller basis or call Rob at CCMST.
  ==> Iterations <==

                           Total Energy        Delta E     RMS |[F,P]|

   @DF-RKS iter   1: 11195678258397.54296875000000    1.11957e+13   1.52187e+10 DIIS
   @DF-RKS iter   2: 10775794413621.77929687500000   -4.19884e+11   5.16664e+09 DIIS
   @DF-RKS iter   3: 41638185149742.94531250000000    3.08624e+13   8.32414e+09 DIIS
   @DF-RKS iter   4: 24970036998641.09375000000000   -1.66681e+13   8.10810e+09 DIIS
   @DF-RKS iter   5: 11717907640880.83593750000000   -1.32521e+13   9.51800e+09 DIIS

When running this job on one thread it converges normally.
So I looked into it and found out that core.cpython-36m-x86_64-linux-gnu.so is linked to both libiomp5.so and libgomp.so.1.
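A check like the one described above can be scripted. The following is a minimal sketch, assuming a Linux system where `ldd` is available; the function names `openmp_runtimes_in` and `check_library` are hypothetical helpers, not part of psi4.

```python
import re
import subprocess

# Map a short runtime name to a pattern matching its shared-library file name.
OPENMP_RUNTIMES = {
    "gomp": re.compile(r"\blibgomp\.so"),    # GNU OpenMP runtime
    "iomp5": re.compile(r"\blibiomp5\.so"),  # Intel OpenMP runtime
    "omp": re.compile(r"\blibomp\.so"),      # LLVM OpenMP runtime
}


def openmp_runtimes_in(ldd_output):
    """Return the set of OpenMP runtime names mentioned in `ldd` output."""
    return {name for name, pat in OPENMP_RUNTIMES.items() if pat.search(ldd_output)}


def check_library(path):
    """Run `ldd` on a shared object and report which OpenMP runtimes it links to."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return openmp_runtimes_in(result.stdout)
```

Calling `check_library("core.cpython-36m-x86_64-linux-gnu.so")` from the directory that holds the file should return a set with more than one entry when the mixed-linking problem is present, and a single entry for a clean build.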

I managed to link psi4 only to gomp, but this led to very strange calculation-time/thread ratios.
On my 12-core/24-thread machine I get the following calculation times (in minutes) depending on the number of threads:

n1 6.50
n2 2.95
n3 1.90
n4 1.60
n5 1.40
n6 1.55
n7 1.57
n8 1.67
n9 1.70
n10 1.80
n11 1.95
n12 2.10
n13 2.17
n14 2.35
n15 2.53
n16 2.82
n17 3.00

So for some reason I get the best performance for this job with 5 threads.
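To make the scaling easier to read, the timings above can be turned into speedup and parallel efficiency relative to the single-thread run. This is only a small sketch over the numbers posted here; `scaling_table` is a hypothetical helper name.

```python
# Wall times in minutes, keyed by thread count (copied from the list above).
timings_min = {
    1: 6.50, 2: 2.95, 3: 1.90, 4: 1.60, 5: 1.40, 6: 1.55,
    7: 1.57, 8: 1.67, 9: 1.70, 10: 1.80, 11: 1.95, 12: 2.10,
    13: 2.17, 14: 2.35, 15: 2.53, 16: 2.82, 17: 3.00,
}


def scaling_table(timings):
    """Return (threads, time, speedup, efficiency) rows relative to 1 thread."""
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]
        rows.append((n, timings[n], speedup, speedup / n))
    return rows


# Thread count with the shortest wall time.
best_threads = min(timings_min, key=timings_min.get)

for n, t, speedup, eff in scaling_table(timings_min):
    print(f"n{n:<3d} {t:5.2f} min  speedup {speedup:4.2f}  efficiency {eff:4.0%}")
print("fastest run:", best_threads, "threads")
```

With these numbers the speedup peaks at about 4.6x on 5 threads (roughly 93% efficiency) and drops to about 3.1x on 12 threads, so adding threads beyond 5 slows the job down.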

I read about strange behavior when using mkl and gomp together, so I want to compile psi4 only with iomp5. Up until now, though, all my attempts have failed and always ended in a combination of gomp and iomp5.

In FindMathOpenMP.cmake it is stated that it tries “telling-gcc-to-not-link-libgomp-so-it-links-libiomp5-instead” but this seems not to work in my case.

Is there any manual way to remove libgomp when compiling psi4?

jmisiewicz · September 30, 2019, 4:25pm

This sounds like a question for @loriab.

loriab · September 30, 2019, 5:55pm

You’re quite right that linking Psi4 to both iomp5 and gomp is Very Bad Indeed and can result in violations of linear algebra. A corollary is that Psi4 --> mkl --> iomp5 and numpy --> mkl --> iomp5 must also have the same threading library (iomp5) and lapack (mkl). Since one usually doesn’t build numpy from scratch, Psi4 is obliged to follow numpy’s library choices. Are you building with the psi4-dev conda package from https://admiring-tesla-08529a.netlify.com/installs/v132/ with choices (linux, source, nightly)? That’s set up to be fully mkl and iomp5. It’s behind a bit on gau2grid, but I’ll get that updated soon.
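Because the constraint applies to everything loaded into one Python process, it can help to look at which OpenMP runtimes are actually mapped after importing numpy (and psi4). A minimal sketch, assuming Linux and its `/proc/self/maps` file; the helper names are hypothetical, and some libraries are only loaded on first use, so running a small computation before checking may be needed.

```python
import os
import re

# Matches the file names of the GNU, Intel, and LLVM OpenMP runtimes.
_RUNTIME_RE = re.compile(r"lib(gomp|iomp5|omp)\.so")


def loaded_openmp_runtimes(maps_text):
    """Map runtime name -> library path for OpenMP runtimes in /proc/<pid>/maps text."""
    found = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5]
        match = _RUNTIME_RE.search(os.path.basename(path))
        if match:
            found.setdefault(match.group(1), path)
    return found


def current_process_runtimes():
    """Inspect the running interpreter (Linux only)."""
    with open("/proc/self/maps") as fh:
        return loaded_openmp_runtimes(fh.read())


if os.path.exists("/proc/self/maps"):
    # import numpy  # import the packages under test first, in the environment being checked
    print(current_process_runtimes())
```

If the printed dictionary contains both `gomp` and `iomp5`, two threading runtimes are live in the same process, which is the situation described above.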

Acisagic · October 9, 2019, 11:49am

Hi,
Thank you for your insight.
I tried to compile it manually using the Python libraries on the machine (pip3), mkl (/opt/intel/…), and gcc 9.2.1 because I wanted to test march=znver2 on this machine.
I recompiled numpy with mkl (with edited .numpy-site.cfg and “pip install numpy --no-binary numpy --force-reinstall”) but still get the linking to libgomp.
Are there any other dependencies that I have to eliminate?

Thank you in advance

P.S. I just tested the conda package and there the linking is indeed right.

loriab · October 9, 2019, 11:12pm

You’ve ldd'd one of the compiled libs in numpy to make sure it does link to mkl and doesn’t link to gomp?

The only other psi4 ecosystem items that use lapack (ambit, chemps2, and libefp) are off by default, so check that they’re off and can’t interfere.

The next thing to look at is what lapack psi4 is detecting. Could you post the cmake configure output? That is, the sizeable amount of text from cmake -H. -B<build> -D.... Maybe uncomment https://github.com/psi4/psi4/blob/master/external/common/lapack/FindTargetOpenMP.cmake#L159-L162 beforehand (may have to delete the Fortran targets from that list).