PSI4 on CentOS cluster

Dear PSI4 users!

I've encountered a problem that I cannot solve. I compiled PSI4 from git on our university's cluster (CentOS 7). The CMake settings are:

cmake3 -H. -Bbuild \
  -DCMAKE_INSTALL_PREFIX="$HOME/psi" \
  -DBUILD_SHARED_LIBS=ON \
  -DENABLE_CheMPS2=OFF \
  -DENABLE_gdma=ON \
  -DENABLE_dkh=OFF \
  -DENABLE_libefp=OFF \
  -DENABLE_erd=OFF \
  -DENABLE_simint=OFF \
  -DENABLE_PCMSolver=OFF \
  -DMAX_AM_ERI=8 \
  -DPYTHON_EXECUTABLE="/usr/bin/python2.7" \
  -DPYTHON_LIBRARY="/usr/lib64/libpython2.7.so" \
  -DPYTHON_INCLUDE_DIR="/usr/include/python/python2.7/" \
  -DCMAKE_C_COMPILER=icc \
  -DCMAKE_CXX_COMPILER=icpc \
  -DCMAKE_Fortran_COMPILER=ifort \
  -DENABLE_MPI=OFF \
  -DENABLE_MP=ON

Configuration and compilation finish correctly, and ctest passes with everything PASSED. I installed PSI4 into my home directory. These are the environment variables I added to .bashrc:

 export PSIPATH="$HOME/psi"
 export PSIDATADIR="$PSIPATH/share/psi4"
 export PSI_SCRATCH="$HOME/PSI4"
 export PYTHONPATH="$PYTHONPATH:$PSIPATH/lib"

It's OK when I launch the test job on the public server, but when I try to use the cluster nodes via SLURM, I get the following message:

Traceback (most recent call last):
  File "/home/ovsyannikov_d/psi/bin/psi4", line 158, in <module>
    import psi4
  File "/home/ovsyannikov_d/psi/lib//psi4/__init__.py", line 80, in <module>
    from .driver import endorsed_plugins
  File "/home/ovsyannikov_d/psi/lib//psi4/driver/__init__.py", line 30, in <module>
    from . import dependency_check
  File "/home/ovsyannikov_d/psi/lib//psi4/driver/dependency_check.py", line 44, in <module>
    raise ImportError(msg)
ImportError:
NumPy is a runtime requirement for Psi4. Please install NumPy to proceed.

And if I ssh to a cluster node and launch the job, I get this message:

Traceback (most recent call last):
  File "/home/ovsyannikov_d/psi/bin/psi4", line 158, in <module>
    import psi4
  File "/home/ovsyannikov_d/psi/lib//psi4/__init__.py", line 58, in <module>
    raise ImportError("{0}\nLikely cause: GCC >= 4.9 not in [DY]LD_LIBRARY_PATH".format(err))
ImportError: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/ovsyannikov_d/psi/lib//psi4/core.so)

But on the compute nodes there is no numpy, no modules, and no *-devel files. I get the same error if I ssh to a node and run import psi4 in a Python shell.
Are there any CMake options I missed, or did I do something incorrectly? Or is PSI4 simply not designed for clusters? Any suggestions appreciated. Thank you in advance.

First guess: /usr/ usually points to the local file system, so while the Python on your head node has NumPy, the Python on your compute nodes does not. Is there some sort of Python module you can load so that Python comes from a shared resource rather than the local disk?

Another thing to try is which python while on a compute node to check where the local Python is coming from. If it's pointing somewhere besides /usr/, you can try python psi4 input.dat, which will launch Python directly rather than through the /usr/.../python shebang.

There is no numpy module on the compute nodes, so I tried copying it from /usr to $HOME, with no success. NumPy itself works, but PSI4 doesn't, failing with the same GCC >= 4.9 error.

which python on a compute node returns /usr/bin/python, so that looks OK.

Maybe I have to provide libstdc++ from my $HOME dir?

I trust you’re on Psi4 1.1 (late May 2017 or later)?

Some oddities about the compilation (a cleaned-up environment sketch follows the list):

  • there is no ENABLE_MPI option, because there is no MPI
  • OpenMP is triggered by ENABLE_OPENMP, but never mind, because it's on by default
  • PSIPATH doesn't do anything for a default installation. Were you thinking of adding $HOME/psi/bin to the PATH envvar?
  • don't set PSIDATADIR unless you're an expert developer; this is admittedly a contrast from 1.0
  • PSI_SCRATCH you want pointing to a local hard disk or wherever your cluster wants scratch written
Now more generally, I agree with Daniel that your cluster really must provide a way to make support software like numpy and the gcc libraries available on the compute nodes. One thing to do is run ldd $HOME/psi/lib/psi4/core.so on the login node and then again on a compute node to see what's not being found.
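Something along these lines would do it (a sketch; some-compute-node is a placeholder for an actual node name, or run the second command from inside an interactive job):

ldd $HOME/psi/lib/psi4/core.so > ldd_login.txt                     # on the login node
ssh some-compute-node "ldd $HOME/psi/lib/psi4/core.so" > ldd_node.txt
diff ldd_login.txt ldd_node.txt                                    # libraries that resolve differently
grep "not found" ldd_node.txt                                      # what the compute node is missing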

If your university cluster is as peculiar and software-sparse as you describe, you may want to use the conda binary (http://psicode.org/psi4manual/master/conda.html#how-to-install-a-psi4-binary-into-an-ana-miniconda-distribution). It brings numpy along and doesn't need the system libstdc++.
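For reference, the conda route is roughly the following (a sketch of the commands on the linked page; the environment name p4env is just an example):

conda create -n p4env psi4 -c psi4   # pulls psi4 plus its numpy/python/libgcc dependencies
source activate p4env
psi4 --version                       # quick check that the binary runs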

Thank you for your reply!

Yes, I cloned PSI4 from GitHub just last Friday, so I believe it's version 1.1.

I ran ldd on PSI4's core.so from a compute node and got the following output:

./core.so: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ./core.so)
./core.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./core.so)
./core.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./core.so)
linux-vdso.so.1 => (0x00007fffe5e37000)
libxc.so => /common/home/ovsyannikov_d/psi/lib/psi4/./…/libxc.so (0x00002b908bc35000)
libgdma.so => /common/home/ovsyannikov_d/psi/lib/psi4/./…/libgdma.so (0x00002b908bf52000)
libderiv.so => /common/home/ovsyannikov_d/psi/lib/psi4/./…/libderiv.so (0x00002b908c202000)
libint.so => /common/home/ovsyannikov_d/psi/lib/psi4/./…/libint.so (0x00002b909a4d1000)
libmkl_rt.so => /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64/libmkl_rt.so (0x00002b909d511000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00002b909db01000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00002b909dd1d000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00002b909e01f000)
libimf.so => not found
libsvml.so => not found
libirng.so => not found
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b909e223000)
libiomp5.so => not found
libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002b909e52b000)
libintlc.so.5 => not found
libc.so.6 => /usr/lib64/libc.so.6 (0x00002b909e741000)
/lib64/ld-linux-x86-64.so.2 (0x00002b90897e7000)
libimf.so => not found
libsvml.so => not found
libirng.so => not found
libintlc.so.5 => not found
libifport.so.5 => not found
libifcoremt.so.5 => not found
libimf.so => not found
libsvml.so => not found
libiomp5.so => not found
libintlc.so.5 => not found
libimf.so => not found
libsvml.so => not found
libirng.so => not found
libintlc.so.5 => not found
libimf.so => not found
libsvml.so => not found
libirng.so => not found
libintlc.so.5 => not found

And on the login node PSI4 no longer works, giving this error:

Traceback (most recent call last):
  File "/home/ovsyannikov_d/psi/bin/psi4", line 158, in <module>
    import psi4
  File "/home/ovsyannikov_d/psi/lib//psi4/__init__.py", line 80, in <module>
    from .driver import endorsed_plugins
  File "/home/ovsyannikov_d/psi/lib//psi4/driver/__init__.py", line 31, in <module>
    from psi4.driver.molutil import *
  File "/home/ovsyannikov_d/psi/lib//psi4/driver/molutil.py", line 36, in <module>
    from psi4.driver.inputparser import process_pubchem_command, pubchemre
  File "/home/ovsyannikov_d/psi/lib//psi4/driver/inputparser.py", line 44, in <module>
    from psi4.driver import pubchem
  File "/home/ovsyannikov_d/psi/lib//psi4/driver/pubchem.py", line 53, in <module>
    from urllib.request import urlopen, Request
ImportError: No module named request

The conda binary works well… but why? :confused: Maybe I can link conda's libraries against the psi4 binary I compiled myself? I am so confused…

If you want to get the psi4 you compiled yourself working on the remote nodes, you'll have to find out how your cluster wants to deliver "system" software to the nodes. Modules (e.g., module list) are a common way, in which case you'll have to load enough modules to satisfy all those ldd "not found"s. You may also have to request through PBS (or whatever your queue system is) that your jobs be routed to nodes where modules are available.
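For example, on a compute node you might try something like the following (the module names are guesses; module avail will show what your site actually calls them):

module avail                     # see what the cluster provides
module load intel/2017           # Intel runtimes: libimf, libsvml, libiomp5, libintlc, ...
module load gcc/5.4              # a libstdc++ new enough for CXXABI_1.3.8 / GLIBCXX_3.4.21
ldd $HOME/psi/lib/psi4/core.so | grep "not found"   # should now print nothing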

I thought you said earlier that jobs were working just fine on the cluster head node? The urllib.request import error suggests it's now using a different Python than the one it was compiled with. You can see the one it was compiled with as the top line of $HOME/psi/bin/psi4.
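That is, compare the interpreter named in the launcher's shebang with whatever python your shell currently resolves:

head -n 1 $HOME/psi/bin/psi4   # shebang: the python psi4 was built against
which python                   # the python your environment would use instead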

The conda binary works because it brings its dependencies along with it, including Python and libgcc_s. It only looks to / for very basic libraries, so it isn't hitting the bleak software landscape of your cluster's compute nodes.

Thank you for the explanation.

I've run into another problem: the conda binary doesn't run in parallel. I tried exporting OMP_NUM_THREADS=16, using the command psi4 -n 16, and putting set_num_threads(16) in the input, all with no effect. Psi4 uses only one thread, both on the login node and on the compute nodes. If I ssh to a compute node and start psi4 there, the effect is the same: only one process named 'psi4'. Running ldd on conda's core.so shows all libraries are present. Is this some kind of OpenMP error? And what if I send a psi4 task to more than one node? Would it use all the nodes?

Negative on sending a psi4 task to more than one node. We're strictly intra-node parallel.
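So under SLURM you want a single-node, multi-threaded submission, something like this sketch (the core count and scratch path are placeholders for your cluster's actual settings):

#!/bin/bash
#SBATCH --nodes=1                  # Psi4 cannot span nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16         # threads for the single psi4 process
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export PSI_SCRATCH=/scratch/$USER  # node-local scratch (placeholder path)
psi4 -n $SLURM_CPUS_PER_TASK input.dat output.dat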

We've had a few issues with math libs and parallelism lately, here and here. What's the version of your binary (psi4 --version)? Does running this show a speedup? And what's the output of python thread.py and psi4 thread.py below?

thread.py

import os
import time

# none for psithon

# good psiapi
import numpy as np
import psi4

# bad psiapi
#import psi4
#import numpy as np

def test_threaded_blas():
    threads = 6

    times = {}

    size = [200, 500, 2000, 4000]
    threads = [1, threads]

    for th in threads:
        psi4.set_num_threads(th)

        for sz in size:
            nruns = max(1, int(1.e10 / (sz ** 3)))

            a = psi4.core.Matrix(sz, sz)
            b = psi4.core.Matrix(sz, sz)
            c = psi4.core.Matrix(sz, sz)

            tp4 = time.time()
            for n in range(nruns):
                c.gemm(False, False, 1.0, a, b, 0.0)

            retp4 = (time.time() - tp4) / nruns

            tnp = time.time()
            for n in range(nruns):
                np.dot(a, b, out=np.asarray(c))

            retnp = (time.time() - tnp) / nruns
            #retnp = 1.0
            print("Time for threads %2d, size %5d: Psi4: %12.6f  NumPy: %12.6f" % (th, sz, retp4, retnp))
            if sz == 4000:
                times["p4-n{}".format(th)] = retp4
                times["np-n{}".format(th)] = retnp
                assert psi4.get_num_threads() == th

    rat1 = times["np-n" + str(threads[-1])] / times["p4-n" + str(threads[-1])]
    rat2 = times["p4-n" + str(threads[0])] / times["p4-n" + str(threads[-1])]
    print("  NumPy@n%d : Psi4@n%d ratio (want ~1): %.2f" % (threads[-1], threads[-1], rat1))
    print("   Psi4@n%d : Psi4@n%d ratio (want ~%d): %.2f" % (threads[0], threads[-1], threads[-1], rat2))

    os.system('grep mkl /proc/%d/maps' % os.getpid())


if __name__ == '__main__':
    test_threaded_blas()

thread.py gives:

Threads set to 1 by Python driver.
Time for threads 1, size 200: Psi4: 0.000932 NumPy: 0.001225
Time for threads 1, size 500: Psi4: 0.013115 NumPy: 0.009239
Time for threads 1, size 2000: Psi4: 1.074149 NumPy: 0.544688
Time for threads 1, size 4000: Psi4: 6.042241 NumPy: 3.203193
Threads set to 6 by Python driver.
Time for threads 6, size 200: Psi4: 0.000172 NumPy: 0.000897
Time for threads 6, size 500: Psi4: 0.002217 NumPy: 0.007174
Time for threads 6, size 2000: Psi4: 0.153210 NumPy: 0.416190
Time for threads 6, size 4000: Psi4: 1.145244 NumPy: 3.102587
NumPy@n6 : Psi4@n6 ratio (want ~1): 2.71
Psi4@n1 : Psi4@n6 ratio (want ~6): 5.28
7f868b4ee000-7f868d982000 r-xp 00000000 4f9:2c566 144117035951996534 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_avx.so
7f868d982000-7f868db82000 ---p 02494000 4f9:2c566 144117035951996534 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_avx.so
7f868db82000-7f868db88000 r--p 02494000 4f9:2c566 144117035951996534 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_avx.so
7f868db88000-7f868db91000 rw-p 0249a000 4f9:2c566 144117035951996534 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_avx.so
7f868db91000-7f868e494000 r-xp 00000000 4f9:2c566 144117035951996541 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f868e494000-7f868e694000 ---p 00903000 4f9:2c566 144117035951996541 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f868e694000-7f868e695000 r--p 00903000 4f9:2c566 144117035951996541 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f868e695000-7f868e6a9000 rw-p 00904000 4f9:2c566 144117035951996541 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f868e6af000-7f868fd3d000 r-xp 00000000 4f9:2c566 144117035951996542 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f868fd3d000-7f868ff3c000 ---p 0168e000 4f9:2c566 144117035951996542 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f868ff3c000-7f868ff3f000 r--p 0168d000 4f9:2c566 144117035951996542 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f868ff3f000-7f8690115000 rw-p 01690000 4f9:2c566 144117035951996542 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f869011b000-7f869198e000 r-xp 00000000 4f9:2c566 144117035951996538 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_core.so
7f869198e000-7f8691b8e000 ---p 01873000 4f9:2c566 144117035951996538 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_core.so
7f8691b8e000-7f8691b96000 r--p 01873000 4f9:2c566 144117035951996538 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_core.so
7f8691b96000-7f8691bb7000 rw-p 0187b000 4f9:2c566 144117035951996538 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_core.so
7f8698ad9000-7f8698eb2000 r-xp 00000000 4f9:2c566 144117035951996545 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f8698eb2000-7f86990b1000 ---p 003d9000 4f9:2c566 144117035951996545 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f86990b1000-7f86990b7000 r--p 003d8000 4f9:2c566 144117035951996545 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f86990b7000-7f86990b8000 rw-p 003de000 4f9:2c566 144117035951996545 /common/intel_2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_rt.so

The psi4 version is 1.1. The tui1.py script gives the following results:
time psi4 tui1.py
real 0m10.852s
user 0m8.706s
sys 0m0.702s

and with parallelization, time psi4 tui.py -n 16:
real 0m7.898s
user 0m58.963s
sys 0m2.168s

It seems like the real time has sped up. Thank you for the help!

The 1.1 binary uses a baked-in MKL that we're not sure adjusts to processor instruction sets. You might get a little better performance out of the nightly build (-c psi4/label/dev), which links dynamically to the runtime lib. But it looks like you're up and running.
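For example, from within the conda environment (a sketch; the channel label is the one mentioned above):

conda update psi4 -c psi4/label/dev   # pull the nightly development build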

Typically, we run 4–8 threads, if you’re considering putting more than one job on a node.