Hello,
I am running a SAPT0 calculation on a large complex with Psi4 1.8, using a node-local scratch directory on a compute cluster managed by SLURM. The job reserved 128 GB of memory in the SLURM queue and provided 90 GB to Psi4, to leave some buffer for writing large scratch files. Previously, the job failed when the RAM requested from SLURM was too close to the memory provided to Psi4. The input file is located here
The calculation runs for around 6 hours and then fails with the following error message:
PSIO_ERROR: unit = 194, errval = 12
munmap_chunk(): invalid pointer
/cm/local/apps/slurm/var/spool/job23188571/slurm_script: line 41: 3984586 Aborted
The seff output for this job number is:
State: FAILED (exit code 134)
Nodes: 1
Cores per node: 64
CPU Utilized: 4-03:10:44
CPU Efficiency: 26.05% of 15-20:43:44 core-walltime
Job Wall-clock time: 05:56:56
Memory Utilized: 106.56 GB
Memory Efficiency: 83.25% of 128.00 GB
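To make the numbers above easier to check, here is a minimal Python sketch of the arithmetic behind the seff output. The helper name `slurm_time_to_seconds` and the variable names are only my own illustration, not part of seff or Psi4; the 90 GB value is the memory I gave to Psi4, and everything else is copied from the seff report.

```python
def slurm_time_to_seconds(stamp):
    """Convert a SLURM-style [D-]HH:MM:SS string to seconds (hypothetical helper)."""
    days, _, clock = stamp.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


# Values copied from the seff report above.
cores = 64
cpu_utilized_s = slurm_time_to_seconds("4-03:10:44")
wall_clock_s = slurm_time_to_seconds("05:56:56")
memory_utilized_gb = 106.56
slurm_reserved_gb = 128.0
psi4_memory_gb = 90.0  # memory given to Psi4 in the input

# CPU efficiency = CPU time used / (cores * wall-clock time).
cpu_efficiency = 100.0 * cpu_utilized_s / (cores * wall_clock_s)

# Memory efficiency = memory utilized / memory reserved in SLURM.
memory_efficiency = 100.0 * memory_utilized_gb / slurm_reserved_gb

# How far the reported usage sits above the Psi4 setting and below the SLURM limit.
above_psi4_gb = memory_utilized_gb - psi4_memory_gb
below_slurm_gb = slurm_reserved_gb - memory_utilized_gb

print(f"CPU efficiency:       {cpu_efficiency:.2f}%")
print(f"Memory efficiency:    {memory_efficiency:.2f}%")
print(f"Above Psi4 setting:   {above_psi4_gb:.2f} GB")
print(f"Below SLURM reserve:  {below_slurm_gb:.2f} GB")
```

So the reported memory use is about 16.56 GB above the 90 GB given to Psi4, but still about 21.44 GB below the 128 GB reserved in SLURM.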
I have many such large systems for which I intend to carry out SAPT0-level calculations, so I would appreciate any advice on solving this problem.
Thank you