SAPT0 calculation crashes with invalid pointer error

bhopshang · June 21, 2023, 12:13pm

Hello,
I am running a SAPT0 calculation on a large complex with Psi4 1.8 (release), using a node-local scratch directory on a compute cluster managed by SLURM. The job reserved 128 GB of memory in the SLURM queue and gave 90 GB to Psi4, leaving a buffer for the extra memory used while writing large scratch files. Previously, the job failed when the RAM requested from SLURM was too close to the memory given to Psi4. The input file is located here
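One way to keep that buffer consistent across many jobs is to derive the Psi4 memory setting from the SLURM allocation instead of hard-coding it. The sketch below assumes the job was submitted with `--mem`, so that SLURM exports `SLURM_MEM_PER_NODE` (in MB); the helper name `psi4_memory_gib` and the 30% headroom are illustrative choices, not Psi4 defaults. The printed figure is what would go into the input's `memory` line (or `psi4.set_memory`), assuming your Psi4 version accepts the `GiB` unit spelling.

```python
import os


def psi4_memory_gib(slurm_mem_mb: int, headroom: float = 0.3) -> int:
    """Return whole GiB for Psi4, leaving `headroom` of the SLURM allocation
    unused for memory that Psi4 consumes outside its own budget."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    total_gib = slurm_mem_mb / 1024  # SLURM reports --mem in MB (MiB)
    return int(total_gib * (1 - headroom))


# Assumption: SLURM_MEM_PER_NODE is set when --mem is used; fall back to 128 GB.
slurm_mem_mb = int(os.environ.get("SLURM_MEM_PER_NODE", 128 * 1024))
print(f"memory {psi4_memory_gib(slurm_mem_mb)} GiB")
```

With the 128 GB reservation from this job, a 30% headroom gives 89 GiB, close to the 90 GB used here.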

The calculation runs for around 6 hours and then fails with the following error message:

PSIO_ERROR: unit = 194, errval = 12
munmap_chunk(): invalid pointer
/cm/local/apps/slurm/var/spool/job23188571/slurm_script: line 41: 3984586 Aborted

The seff report for the job shows:
State: FAILED (exit code 134)

Nodes: 1
Cores per node: 64
CPU Utilized: 4-03:10:44
CPU Efficiency: 26.05% of 15-20:43:44 core-walltime
Job Wall-clock time: 05:56:56
Memory Utilized: 106.56 GB
Memory Efficiency: 83.25% of 128.00 GB
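The seff figures can be cross-checked with a little arithmetic: core-walltime is cores × wall-clock time, and CPU efficiency is CPU time utilized divided by core-walltime. The sketch below parses SLURM's `[D-]HH:MM:SS` durations (the helper `slurm_seconds` is hypothetical, not part of any SLURM tool) and also shows how far the reported peak memory exceeds the 90 GB given to Psi4, in the units seff prints.

```python
def slurm_seconds(t: str) -> int:
    """Convert a SLURM duration '[D-]HH:MM:SS' (integer seconds) to seconds."""
    days = 0
    if "-" in t:
        d, t = t.split("-", 1)
        days = int(d)
    parts = [int(x) for x in t.split(":")]
    while len(parts) < 3:  # short forms such as "MM:SS" -> [0, MM, SS]
        parts.insert(0, 0)
    h, m, s = parts
    return ((days * 24 + h) * 60 + m) * 60 + s


cores = 64
wall = slurm_seconds("05:56:56")
cpu_used = slurm_seconds("4-03:10:44")
core_walltime = cores * wall

print(core_walltime == slurm_seconds("15-20:43:44"))  # True
print(f"CPU efficiency: {cpu_used / core_walltime:.2%}")  # 26.05%
print(f"Memory efficiency: {106.56 / 128.00:.2%}")  # 83.25%
print(f"Peak above Psi4 setting: {106.56 - 90:.2f} GB")  # 16.56 GB
```

The last line is consistent with the observation above that Psi4 needs headroom beyond its own `memory` setting.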

I have many similarly large systems on which I intend to run SAPT0 calculations, so I would appreciate any advice on solving this problem.
Thank you

loriab · June 21, 2023, 5:06pm

Hi, your input looks good. You could check whether the disk is getting full or use a queue that has local disk resources, if that’s available to you. Also, PSIO errors are sometimes flukes, so you might try running this job again or running another planned job to see if the error persists.
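To see whether the scratch disk is filling up, free space on the filesystem holding the scratch directory can be checked from inside the job (for example with `df -h`, or periodically from a small script). The sketch below uses Python's `shutil.disk_usage`; it assumes the scratch location is given by the `PSI_SCRATCH` environment variable and otherwise falls back to the system temporary directory. The helper name `scratch_report` is hypothetical.

```python
import os
import shutil
import tempfile


def scratch_report(path: str) -> tuple[float, float]:
    """Return (free space in GiB, fraction of total space used) for the
    filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30, usage.used / usage.total


# Assumption: PSI_SCRATCH points at the node-local scratch directory.
scratch = os.environ.get("PSI_SCRATCH", tempfile.gettempdir())
free_gib, used_frac = scratch_report(scratch)
print(f"{scratch}: {free_gib:.1f} GiB free ({used_frac:.0%} used)")
```

Logging this every few minutes alongside the job would show whether free space approaches zero shortly before a PSIO error.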

bhopshang · June 21, 2023, 5:21pm

Thanks, I restarted the job with significantly more memory. It has been running for about 5 hours now, so fingers crossed! Thank you so much for checking the input; it is highly appreciated.