Keeping in line with the good advice from Jonathon on a different question of mine, I’ve run some tests to try and pin down the issue.
I cannot publicly disclose specifics about the molecules themselves (test is a placeholder), but they have between 30 and 100 atoms, with NBF < 2000 and NAUX < 4000 in def2-QZVPP. There is nothing particularly unusual about the molecules; I can discuss a specific example privately if it comes down to it.
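For a sense of scale, the upper bounds above can be turned into a rough size estimate. This is only a sketch: it assumes a dense, double-precision three-index tensor of shape NAUX x NBF x NBF with no symmetry or screening, which is not necessarily what Psi4 allocates at any step, and the function name is hypothetical.

```python
def dense_three_index_gb(nbf, naux, bytes_per_element=8):
    """Size in GB (10**9 bytes) of a dense NAUX x NBF x NBF tensor.

    Assumption: no permutational symmetry, no sparsity, 8-byte doubles.
    """
    return naux * nbf * nbf * bytes_per_element / 1e9


# Upper bounds quoted in the post: NBF < 2000, NAUX < 4000.
upper_bound_gb = dense_three_index_gb(nbf=2000, naux=4000)
print(f"Dense three-index tensor upper bound: {upper_bound_gb:.0f} GB")
```

Intermediates restricted to occupied-virtual pairs would be smaller than this bound, so it should be read as an order-of-magnitude figure only.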
Using the following input (all other examples below only have the amount of memory changed):
with open('test.revpbe.xyz') as f:
    test_xyz = f.read()
test = psi4.core.Molecule.from_string(test_xyz, dtype='xyz')
activate(test)
memory 64 GB # 4 GB x 16 cores
set reference rks # Happens with open-shell cases, too.
set basis def2-qzvpp
E_dubhyb, wfn = energy('dsd-blyp', return_wfn=True)
The job fails with an OUT_OF_MEMORY error even though seff reports that it used 0% of the memory. Is this a Slurm error, or is it indicative of a bug in Psi4?
$ seff 4759647
Job ID: 4759647
Cluster: cedar
User/Group: gibacic/gibacic
State: OUT_OF_MEMORY (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 00:20:36
CPU Efficiency: 42.68% of 00:48:16 core-walltime
Job Wall-clock time: 00:03:01
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 64.00 GB (4.00 GB/core)
However, with more memory and fewer cores, it uses all the memory efficiently before running out at the DF-MP2 step.
[...]
memory 120 GB # 60 GB x 2 cores
[...]
has the following efficiency:
$ seff 4765160
Job ID: 4765160
Cluster: cedar
User/Group: gibacic/gibacic
State: OUT_OF_MEMORY (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:29:50
CPU Efficiency: 93.35% of 01:36:14 core-walltime
Job Wall-clock time: 00:48:07
Memory Utilized: 119.30 GB
Memory Efficiency: 99.42% of 120.00 GB
Giving it even more (i.e., just enough) memory allows the job to complete normally.
[...]
memory 160 GB # 80 GB x 2 cores
[...]
$ seff 4760349
Job ID: 4760349
Cluster: cedar
User/Group: gibacic/gibacic
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 03:02:16
CPU Efficiency: 96.44% of 03:09:00 core-walltime
Job Wall-clock time: 01:34:30
Memory Utilized: 151.40 GB
Memory Efficiency: 94.62% of 160.00 GB
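To compare runs like these side by side, the seff reports can be parsed with a few lines of Python. This is a minimal sketch, not part of the original jobs: the helper names are hypothetical, it assumes the "Key: value" layout shown above, and it assumes a factor of 1024 between units.

```python
import re

# Assumed conversion factors between the units seff prints.
UNIT_TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_seff(text):
    """Split each 'Key: value' line of a seff report into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the '$ seff ...' prompt
            fields[key.strip()] = value.strip()
    return fields


def size_in_gb(text):
    """Return the first '<number> <unit>' size found in text, in GB."""
    match = re.search(r"([\d.]+)\s*(KB|MB|GB|TB)", text)
    if match is None:
        raise ValueError(f"no size found in {text!r}")
    return float(match.group(1)) * UNIT_TO_GB[match.group(2)]


def memory_summary(text):
    """Collect job state, used and requested memory, and their ratio."""
    fields = parse_seff(text)
    used = size_in_gb(fields["Memory Utilized"])
    requested = size_in_gb(fields["Memory Efficiency"])
    return {
        "state": fields["State"].split()[0],
        "used_gb": used,
        "requested_gb": requested,
        "percent": 100 * used / requested,
    }


# Lines copied from the 120 GB run above.
report = """State: OUT_OF_MEMORY (exit code 0)
Cores per node: 2
Memory Utilized: 119.30 GB
Memory Efficiency: 99.42% of 120.00 GB"""

summary = memory_summary(report)
print(summary)
```

Running the same function on the 64 GB report makes the mismatch explicit: the state is OUT_OF_MEMORY while the recorded usage is zero.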
Here’s the list of the modules loaded during testing in case there is an obvious compatibility issue:
$ module list
Currently Loaded Modules:
1) CCconfig
2) gentoo/2020 (S)
3) gcccore/.9.3.0 (H)
4) imkl/2020.1.217 (math)
5) intel/2020.1.217 (t)
6) ucx/1.8.0
7) libfabric/1.10.1
8) openmpi/4.0.3 (m)
9) StdEnv/2020 (S)
10) libxc/4.3.4 (chem)
11) libffi/3.3
12) python/3.8.2 (t)
13) ipykernel/2020b
14) scipy-stack/2020b (math)
15) psi4/1.3.2 (chem)
16) dftd3-lib/0.9
Where:
S: Module is Sticky, requires --force to unload or purge
m: MPI implementations / Implémentations MPI
math: Mathematical libraries / Bibliothèques mathématiques
t: Tools for development / Outils de développement
chem: Chemistry libraries/apps / Logiciels de chimie
H: Hidden Module
Please let me know if you need more information or if there are other tests I can run to figure out what is going on.