Hi,
I compiled Psi4 with GCC and MKL without any errors, but I get very strange calculation results when using more than one thread:
WARNING: Atomic UHF is not converging! Try casting from a smaller basis or call Rob at CCMST.
WARNING: Atomic UHF is not converging! Try casting from a smaller basis or call Rob at CCMST.
WARNING: Atomic UHF is not converging! Try casting from a smaller basis or call Rob at CCMST.
==> Iterations <==
Total Energy Delta E RMS |[F,P]|
@DF-RKS iter 1: 11195678258397.54296875000000 1.11957e+13 1.52187e+10 DIIS
@DF-RKS iter 2: 10775794413621.77929687500000 -4.19884e+11 5.16664e+09 DIIS
@DF-RKS iter 3: 41638185149742.94531250000000 3.08624e+13 8.32414e+09 DIIS
@DF-RKS iter 4: 24970036998641.09375000000000 -1.66681e+13 8.10810e+09 DIIS
@DF-RKS iter 5: 11717907640880.83593750000000 -1.32521e+13 9.51800e+09 DIIS
When running this job on one thread it converges normally.
So I looked into it and found out that core.cpython-36m-x86_64-linux-gnu.so is linked to both libiomp5.so and libgomp.so.1.
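A check like this can be scripted. Below is a minimal sketch, assuming `ldd` is on the PATH. The helper names `openmp_runtimes` and `linked_openmp_runtimes` are made up for illustration and are not part of Psi4.

```python
import subprocess

# OpenMP runtimes of interest: GNU (libgomp) and Intel (libiomp5).
OPENMP_RUNTIMES = ("libgomp", "libiomp5")


def openmp_runtimes(ldd_output):
    """Return the OpenMP runtime names that appear in `ldd` output text."""
    found = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        library = fields[0]  # first field of each ldd line is the library name
        for runtime in OPENMP_RUNTIMES:
            if library.startswith(runtime) and runtime not in found:
                found.append(runtime)
    return found


def linked_openmp_runtimes(shared_object):
    """Run `ldd` on a shared object and list the OpenMP runtimes it pulls in."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return openmp_runtimes(result.stdout)
```

Calling `linked_openmp_runtimes("core.cpython-36m-x86_64-linux-gnu.so")` from the directory containing the module and getting both names back would correspond to the mixed linkage described above.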
I managed to link Psi4 against gomp only, but this led to very strange calculation-time/thread ratios.
On my 12-core/24-thread machine I get the following calculation times (in minutes) as a function of the number of threads:
n1 6.50
n2 2.95
n3 1.90
n4 1.60
n5 1.40
n6 1.55
n7 1.57
n8 1.67
n9 1.70
n10 1.80
n11 1.95
n12 2.10
n13 2.17
n14 2.35
n15 2.53
n16 2.82
n17 3.00
So for some reason this job runs fastest with 5 threads and gets slower with every thread added beyond that.
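To make the scaling easier to judge, the timings can be converted to speedup and parallel efficiency relative to the single-thread run. This is a small sketch that uses only the numbers listed above; the function name `scaling_table` is made up for illustration.

```python
# Wall times in minutes from the list above, keyed by thread count.
timings = {
    1: 6.50, 2: 2.95, 3: 1.90, 4: 1.60, 5: 1.40, 6: 1.55,
    7: 1.57, 8: 1.67, 9: 1.70, 10: 1.80, 11: 1.95, 12: 2.10,
    13: 2.17, 14: 2.35, 15: 2.53, 16: 2.82, 17: 3.00,
}


def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread time."""
    serial = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = serial / timings[threads]   # how much faster than one thread
        efficiency = speedup / threads        # fraction of ideal linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


rows = scaling_table(timings)
best_threads = min(timings, key=timings.get)  # thread count with the lowest time

for threads, speedup, efficiency in rows:
    print(f"n{threads:<3d} speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
print("fastest:", best_threads, "threads")
```

Efficiency is speedup divided by thread count, so a value that drops steadily as threads are added shows the extra threads costing more than they contribute.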
I have read about strange behavior when MKL and gomp are used together, so I want to compile Psi4 with iomp5 only. So far all my attempts have failed and ended up linking both gomp and iomp5.
FindMathOpenMP.cmake states that it tries “telling-gcc-to-not-link-libgomp-so-it-links-libiomp5-instead”, but this does not seem to work in my case.
Is there any manual way to remove libgomp when compiling psi4?