Calculations fail for "larger" systems

Dear PSI4 community,

we are interested in performing CCSD(T) calculations with large (5Z/6Z) basis sets and are experiencing some problems with PSI4. While small systems (e.g., H2) work fine with 6Z basis sets, slightly larger systems (e.g., NH3) already fail with QZ basis sets. Based on this experience, I suspect issues with file sizes and/or the scratch directory, and I would be very grateful for any advice on how to overcome this problem.

I have defined the path to the scratch directory using export PSI_SCRATCH=/scratch/myusername.
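Since the scratch directory is one of the suspects, a quick sanity check of that setting can rule out the simplest causes. The following is only a minimal sketch in plain Python, not part of PSI4; the helper name check_scratch is hypothetical, and it only tests that the directory named by PSI_SCRATCH exists and accepts a small write.

```python
import os
import tempfile


def check_scratch(path):
    """Return (ok, message) describing whether `path` is usable as scratch.

    Hypothetical helper for illustration; it is not a PSI4 function.
    """
    if not path:
        return False, "PSI_SCRATCH is not set"
    if not os.path.isdir(path):
        return False, f"{path} is not a directory"
    try:
        # Create and automatically remove a small temporary file
        # to prove that this user has write access.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"scratch test")
            handle.flush()
    except OSError as exc:
        return False, f"cannot write to {path}: {exc}"
    return True, f"{path} is writable"


if __name__ == "__main__":
    ok, message = check_scratch(os.environ.get("PSI_SCRATCH"))
    print(message)
```

Running this on the compute node itself (for example inside the batch job) is more informative than running it on the login node, because the two may mount different file systems.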

The input file looks like this and runs on 8 CPUs (similar problems occur with only 4 CPUs and less memory):

#! NH3 SP

memory 19 GB

molecule NH3 {
0 1
N 0.00000000 0.00000000 0.11400700
H 0.00000000 0.93809600 -0.26601700
H -0.81241500 -0.46904800 -0.26601700
H 0.81241500 -0.46904800 -0.26601700
}

set basis aug-cc-pvqz
energy('CCSD(T)')

The last lines of the output file are:

Size of irrep 0 of tijab amplitudes:       0.394 (MW) /      3.150 (MB)
Size of irrep 1 of tijab amplitudes:       0.178 (MW) /      1.421 (MB)
Total:                                     0.571 (MW) /      4.571 (MB)
Sorting File: A <ij|kl> nbuckets = 1

The following error messages are generated in the course of this calculation (I replaced my username with USER):

PSIO_ERROR: unit = 103, errval = 12
PSIO_ERROR: Failed to write toclen to unit 104.
Traceback (most recent call last):
File "", line 36, in
File "/home/USER/Scripts/Miniconda/share/psi/python/driver.py", line 627, in energy
procedures['energy'][lowername](lowername, **kwargs)
File "/home/USER/Scripts/Miniconda/share/psi/python/proc.py", line 2081, in run_ccenergy
psi4.ccsort()
RuntimeError:
Fatal Error: PSIO Error
Error occurred in file: /scratch/cdsgroup/conda-builds/work/src/lib/libpsio/toclen.cc on line: 111
The most recent 5 function calls were:

psi::PsiException::PsiException(std::string, char const*, int)
psi::PSIO::wt_toclen(unsigned int, unsigned long)
psi::PSIO::tocwrite(unsigned int)
psi::psio_tocwrite(unsigned int)
psi::psio_error(unsigned int, unsigned int)

Any suggestion how to fix this is highly appreciated.

Thank you very much and best regards

Martin

This test case runs for me on my iMac in about six minutes with no problems. I did change the memory for this relatively small calculation to 7GB, as 19GB is far more than required – plus my Mac doesn’t have that much core.

An inability to write the table of contents to a file likely arises either from a memory leak or a more serious I/O problem. The error you’ve encountered seems almost like a hardware problem.

FYI, when you get past this error and try larger CC calculations, you may want to set “cachelevel 0” which should prevent potential memory fragmentation problems.

-TDC

Thank you very much for your reply and for testing my calculation! I just repeated the calculation with 7GB and set cachelevel 0, but it resulted in the same error. I will get in touch with our cluster administration and we will hopefully figure out if there is a hardware problem.

Once again thank you very much!

Martin

Dear all,

sorry to reopen this thread and to bother you again. We thought that we had identified the error that prevented us from running the CCSD(T) calculations described above. Currently, the example runs smoothly and we can also increase the size of the basis set to 5Z and 6Z.
However, when we change the molecule to somewhat larger structures, we again run into problems. The error log reports the following entry:

PSIO_ERROR: unit = 33, errval = 12
PSIO_ERROR: 12 (error writing to file)
PSIO_ERROR: unit = 33, errval = 7
Error in PSIO_WT_TOCLEN()!

Do you know any way to overcome this error?

Thank you very much for your help and best wishes

Martin

Hi,

File 33 stores the two-electron integrals… I would guess your job is failing during the SCF procedure this time, and not during CCSD(T)?
Could it be that the scratch disk is filling up? How big is it?
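Both questions can be answered with a short check along the following lines. This is only a sketch in plain Python for a Unix-like system; the function name scratch_report is hypothetical, and the numbers it reports may not reflect per-user quotas, which are worth asking the cluster administrators about separately.

```python
import resource
import shutil


def scratch_report(path):
    """Collect free space and the per-process file size limit for `path`.

    Hypothetical helper for illustration; it is not a PSI4 function.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_FSIZE)
    # RLIM_INFINITY means the process has no file size limit of its own.
    unlimited = soft_limit == resource.RLIM_INFINITY
    return {
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "max_file_gb": None if unlimited else soft_limit / 1e9,
    }


if __name__ == "__main__":
    # Replace "." with the scratch path used by the job.
    for key, value in scratch_report(".").items():
        print(key, value)
```

The file size limit is included because a single very large file can fail to grow even when the disk as a whole still has free space.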

Hi,

thanks for the clarification. File 33 is pretty large (2.8 TB), but there is plenty of disk space left (just to be sure, I will check with the cluster administration).
The last few lines of the output file are copied below:

==> Iterations <==

                       Total Energy        Delta E     RMS |[F,P]|

@DF-RHF iter 1: -1478.72831691130091 -1.47873e+03 4.55182e-02
@DF-RHF iter 2: -909.38607989012371 5.69342e+02 2.34144e-02 DIIS
@DF-RHF iter 3: -1528.62825352677601 -6.19242e+02 2.26323e-02 DIIS
@DF-RHF iter 4: -1694.60822391228885 -1.65980e+02 1.20549e-02 DIIS
@DF-RHF iter 5: -1716.68324768771436 -2.20750e+01 1.02344e-02 DIIS
@DF-RHF iter 6: -1766.02661815488409 -4.93434e+01 5.62487e-03 DIIS
@DF-RHF iter 7: -1783.48252537042936 -1.74559e+01 4.17941e-03 DIIS
@DF-RHF iter 8: -1803.65337887726218 -2.01709e+01 2.47500e-03 DIIS
@DF-RHF iter 9: -1809.35555903634213 -5.70218e+00 1.74981e-03 DIIS
@DF-RHF iter 10: -1812.13630720784386 -2.78075e+00 4.08589e-04 DIIS
@DF-RHF iter 11: -1812.27601679592499 -1.39710e-01 1.63340e-04 DIIS
@DF-RHF iter 12: -1812.30889179384667 -3.28750e-02 6.70963e-05 DIIS
@DF-RHF iter 13: -1812.31417114651595 -5.27935e-03 2.61840e-05 DIIS
@DF-RHF iter 14: -1812.31505145945448 -8.80313e-04 1.23471e-05 DIIS
@DF-RHF iter 15: -1812.31534480340542 -2.93344e-04 6.34980e-06 DIIS
@DF-RHF iter 16: -1812.31545996186105 -1.15158e-04 3.00288e-06 DIIS
@DF-RHF iter 17: -1812.31548831460759 -2.83527e-05 1.45482e-06 DIIS
@DF-RHF iter 18: -1812.31549885984805 -1.05452e-05 6.27688e-07 DIIS
@DF-RHF iter 19: -1812.31550088807580 -2.02823e-06 2.16953e-07 DIIS
@DF-RHF iter 20: -1812.31550110678199 -2.18706e-07 8.56865e-08 DIIS
@DF-RHF iter 21: -1812.31550113472622 -2.79442e-08 2.95095e-08 DIIS
@DF-RHF iter 22: -1812.31550113778940 -3.06318e-09 8.11036e-09 DIIS

DF guess converged.

==> Integral Setup <==

Thanks a lot for your help,

Martin

Yes, it seems to be failing while writing the integral file. My best hypothesis right now is that it may have filled up the scratch space, but I think the error message for that is usually different.
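For a rough sense of how large such an integral file can get, a back-of-the-envelope estimate helps. The sketch below counts the unique two-electron integrals (pq|rs) under eightfold permutational symmetry and multiplies by a storage cost per integral. The default of 16 bytes per integral (four 2-byte labels plus one 8-byte value) is an assumption about the on-disk format, and the function name is hypothetical. Point-group symmetry and integral screening would make the real file smaller, so this is an upper bound rather than a prediction.

```python
def estimate_integral_file_bytes(nbf, bytes_per_integral=16):
    """Upper-bound size in bytes of a file holding all unique (pq|rs).

    nbf is the number of basis functions. bytes_per_integral is an
    assumed storage cost, not a value taken from the PSI4 source.
    """
    npair = nbf * (nbf + 1) // 2        # unique pq pairs
    nunique = npair * (npair + 1) // 2  # unique pairs of pairs
    return nunique * bytes_per_integral


if __name__ == "__main__":
    for nbf in (250, 500, 1000):
        size_tb = estimate_integral_file_bytes(nbf) / 1e12
        print(f"{nbf} basis functions: about {size_tb:.3f} TB")
```

Because the count grows with the fourth power of the number of basis functions, doubling the basis size increases the estimate roughly sixteenfold, which is why a job can move from comfortable to multi-terabyte scratch usage with a modest change in molecule or basis set.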

Does anyone who knows more about IWL or PSIO files have a better idea?