Large CCSD optimization dies in CCDENSITY

Dear Psi4 users,

I am having trouble running some larger CCSD optimizations; they die in CCDENSITY without any helpful hints.

I am running these optimizations on a dual Intel E5-2650 v2 server with 96 GB RAM. I am requesting up to 80 GB of RAM in the input and I have plenty of disk space. I am the only user running QM jobs. I have tried different physical scratch disks with the same effect, so it is not a bad disk, and I don't think it is bad RAM either. Pretty much every smaller CCSD optimization completes OK. I have tried "Psi4 0.4.16 Driver" and "Psi4 0.4.131 Driver" from miniconda installs.

One particular example is the cyclopropane cation in Cs symmetry. An all-electron CCSD/aug-cc-pVTZ optimization with 299 basis functions generates 27 GB of <ab|cd> integrals and runs happily. An all-electron CCSD/aug-cc-pwCVTZ calculation of the same structure with 338 basis functions generates 45 GB of <ab|cd> integrals and stops in CCDENSITY during the first round of geometry optimization. The last lines of output are:

Energies re-computed from CC density:
-------------------------------------
One-electron energy        =    0.733757964839662
IJKL energy                =    0.108645022149429
IJKA energy                =   -0.001566176800092
IJAB energy                =   -1.348802966570895
IBJA energy                =   -0.336343664069401
CIAB energy                =   -0.005935039502245

In the dmesg output I find: psi4[18277]: segfault at 7f9814728020 ip 0000000006d62547 sp 00007ffdda9bdd18 error 6 in psi4[400000+74f3000]. The most recently accessed file in my scratch directory is either .77 or .107.

The same thing happens with oxetane in Cs geometry (input given below), where 35 GB of <ab|cd> integrals are generated using 322 basis functions. When I run even larger calculations (like protonated methyloxirane in C1), where 397 basis functions give 169 GB of <ab|cd> integrals, Psi4 nicely tells me at the beginning of CCDENSITY that there is no memory left.

I suspect that I am trying to do a calculation that does not fit into the 80 GB memory limit, but I am perplexed because (i) there are no complaints from Psi4 about a lack of memory, and (ii) I am able to carry out the same CCSD optimizations with much less memory using another commonly used QM program.

Any words of wisdom about how to carry out Psi4 CCSD optimizations that generate up to 45 GB of <ab|cd> integrals on a machine with 96 GB of RAM?

Thank you!

Kalju

Sample input:

# Oxetane Cs optimization

memory 80 gb

molecule oxetane {
0 1
C 0.000000 -1.067499 -0.067639
C -1.026596 0.059978 0.071429
C 1.026596 0.059978 0.071429
O 0.000000 1.065292 -0.105166
H 0.000000 -1.836963 0.696861
H 0.000000 -1.522356 -1.052531
H -1.799186 0.136630 -0.689767
H -1.476526 0.131350 1.062616
H 1.476526 0.131350 1.062616
H 1.799186 0.136630 -0.689767
}

set globals {
basis aug-cc-pVTZ
freeze_core false
molden_write true
}

set scf {
E_Convergence 1.0e-09
D_Convergence 1.0e-08
}

set ccenergy {
cc_num_threads 6
R_Convergence 5.0e-08
}

set cclambda {
R_Convergence 5.0e-08
}

set optking {
G_Convergence GAU_TIGHT
}

optimize('ccsd')

You might try setting

cachelevel 0

in your globals {} section for large CC calculations. The density code needs some overhauling for such calculations.
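
For example (a minimal sketch based on the globals block from your posted input; only the cachelevel line is new):

set globals {
basis aug-cc-pVTZ
freeze_core false
molden_write true
cachelevel 0
}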

Daniel,

Thanks for the tip… but unfortunately this made no difference, except that it now crashed 30 seconds earlier because CCLAMBDA was a bit faster :frowning:

Kalju

Hi Kalju,

I suggest you use our new DF-CCSD code for analytic gradients. The DF-CCSD analytic gradients code is significantly faster than the conventional code, and it requires much less memory; for your job, a few GB of RAM will be enough. Further, DF-CCSD has frozen-core gradients, so you can use that to speed things up further. For geometry optimizations, use the "symmetry c1" option just below the Cartesian coordinates. You can access DF-CCSD using the "qc_module occ" option, and you also need the "scf_type df" option.
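
For the oxetane input you posted, a minimal sketch along these lines might look as follows (the geometry is copied from your original input; freeze_core true is optional, following the frozen-core remark above):

molecule oxetane {
0 1
C 0.000000 -1.067499 -0.067639
C -1.026596 0.059978 0.071429
C 1.026596 0.059978 0.071429
O 0.000000 1.065292 -0.105166
H 0.000000 -1.836963 0.696861
H 0.000000 -1.522356 -1.052531
H -1.799186 0.136630 -0.689767
H -1.476526 0.131350 1.062616
H 1.476526 0.131350 1.062616
H 1.799186 0.136630 -0.689767
symmetry c1
}

set globals {
basis aug-cc-pVTZ
scf_type df
qc_module occ
cc_type df
freeze_core true
}

optimize('ccsd')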

Best regards,
Ugur Bozkaya


Hi,

I really enjoy the occ module features, but it is a bit confusing how to properly access it unless you spend lots of time with the whole manual.
optimize('df-ccsd') does not do it; I assume that name relates to the fnocc module in proc.py or so. Setting "qc_module occ" changes nothing, by the way.
I know it is due to overlapping features:
http://www.psicode.org/psi4manual/master/proc_py.html#table-managedmethods

The way I make it work is to use optimize('ccsd') and set "cc_type df" instead of "qc_module".
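
In input form that is just (a sketch; molecule and basis blocks omitted):

set cc_type df
optimize('ccsd')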

Hi Ugur and Hokru,

Splendid advice! It is optimizing well in c1 symmetry with

set globals {
qc_module occ   # I guess this is not needed, as per Hokru's note
scf_type df
cc_type df
}

optimize('ccsd')

And I like it when a program informs me about memory use scenarios upfront:
Memory for high mem Wabef algorithm : 19163.05 MB
I will use the HIGH_MEM Wabef algorithm!

I am a bit confused about where the keywords that control DF-CCSD are specified. I first thought that they are set in occ or dfocc, but at least R_Convergence is not (I can of course set it in globals). It was not clear from the Psi4 output which module took over after SCF. Maybe Ugur can update the code to print some module info to the output just before

Options:

ACTIVE => [ ]

Also, is there a way to visualize the DF-CCSD density and/or the electrostatic potential corresponding to the DF-CCSD density with Psi4? How does DF-CCSD work with cubeprop_tasks? As a side note, I think the cubeprop manual needs updating, as the current Psi4 just fails on the example input with:
<type 'exceptions.TypeError'>: cubeprop() takes at least 1 argument (0 given): File "", line 30, in

As far as the failure in the conventional CCDENSITY module goes, I tried a couple of things without success. I installed Psi4 from the Git source, trying a couple of different GCC compilers (4.8.5 and 4.9.3), a couple of different ATLAS/LAPACK combinations, and a few different Boost libraries. With a custom GCC, I needed to add --extra-math-flags=-L/home/kalju/Software/gcc/lib64/lib -lgfortran for a successful build. When I got a working Psi4 executable, it failed at the same place during CCDENSITY, but now blaming glibc in dmesg:

psi4: segfault at 7efb1b9d4020 ip 00007f0762fa67d8 sp 00007ffe8fe67538 error 6 in libc-2.17.so[7f0762e5d000+1b6000]
psi4: segfault at 7fa115d68020 ip 00007fad5d33c7d8 sp 00007ffc531a13a8 error 6 in libc-2.17.so[7fad5d1f3000+1b6000]

This leads me to think that maybe the problem is related to some memory/thread management issue in this rather old glibc when very large allocations are requested. But I am in no mood to link Psi4 against a custom glibc right now.

As a side note on these source compilations under CentOS 7.2, I could not get Psi4 working when using an ATLAS that was linked with LAPACK 3.6.0, due to undefined references in libqt.a. Also, the compilation with Boost 1.60 did not end well (No to_python (by-value) converter found for C++ type: boost::shared_ptr<psi::SuperFunctional>).

Hi Kalju and Hokru,

Thank you for your interest in our codes. We used to use the "df-ccsd" keyword to call DF-CCSD. However, the call names have just been revised. In fact, the name of the module is DFOCC; OCC is the old module, which uses conventional integrals and is not very efficient.

In order to access the DFOCC module you need to set "QC_MODULE OCC", unless it is the default module. In most cases, the default module is the conventional code. Since DFOCC is the default module for CCSD gradients, you may leave it out there. However, if you leave out "QC_MODULE OCC" in the case of single point energies, another module will be called.

For CC methods of the DFOCC module you need to set "CC_TYPE DF". For MP3 and MP2.5 you need to set "MP_TYPE DF".

DF-CCSD Example:
set scf_type df
set qc_module occ
set cc_type df
energy('ccsd')

DF-MP3 example:
set scf_type df
set qc_module occ
set mp_type df
energy('mp3')

Further, in my DF-CCSD code I have two algorithms for the most expensive term (the particle-particle ladder term, PPL): HIGH_MEM and LOW_MEM. Both algorithms are very efficient, HIGH_MEM more so. If you reduce the memory to, for example, 16 GB, the code will switch to the other algorithm, so do not worry, it will take care of itself! You can in fact run the same job with much less memory.

For all DFOCC methods, energy convergence is controlled by the "E_CONVERGENCE" option and the amplitudes are controlled by the "R_CONVERGENCE" option. The default values change according to the job type. For example, in single point energy computations the default value of E_CONVERGENCE is 6 (10^-6), but in opt and freq computations it is 8. For custom options, pay attention to the DFOCC module keywords.

You can write the one-electron properties to the molden file in gradient computations. For the DFOCC module you need the "OEPROP TRUE" and "MOLDEN_WRITE TRUE" options.

For CCDENSITY, I think it is not related to your system. I remember that I suffered from the same problem when I was trying to run some CCSD gradient timings last year.

Best regards,
Ugur.

By the way, you should write "E_Convergence 9" instead of "E_Convergence 1.0e-09". Similarly, you need to adjust the R_Convergence option.
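
For example, the scf block in the original input would then read (same thresholds, just written as exponents):

set scf {
E_Convergence 9
D_Convergence 8
}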

Hi Ugur,

Thanks for the helpful tips. I am happy to report back that DF-CCSD optimizations are going well; the HIGH_MEM option makes a very nice difference for the DF-CCSD and DF-CCSD-Lambda parts.

I am still stuck on not getting molden output from DFOCC when doing DF-CCSD calculations with Psi4 0.4.156. I can punch orbital-optimized or non-optimized MOs (i.e. orb_opt false) from DF-OO-MP2 and DF-OO-MP3 calculations to a molden file. However, when I run DF-CCSD (input below), the molden file is only created in the SCF part, not in the DF-CCSD part.

# Methane molden test

molecule methane {
0 1
C 0.000000000000 0.000000000000 0.000000000000
H 0.904813335088 0.000000000000 0.639799644949
H -0.904813335088 0.000000000000 0.639799644949
H -0.000000000000 0.904813335088 -0.639799644949
H -0.000000000000 -0.904813335088 -0.639799644949
symmetry c1
}

set globals {
scf_type df
qc_module occ
cc_type df
basis sto-3g
}

set scf {
molden_write true
}

set dfocc {
wfn_type DF-CCSD
orb_opt false
oeprop true
occ_orbs_print true
molden_write true
}

grad = gradient('ccsd')

Looking at the source code, I see that MOLDEN_WRITE is checked in occ_iterations.cc, which is not invoked during DF-CCSD as there is no orbital optimization going on. Am I missing something?

P.S. I believe I can just increase the print level, get the MO occupancies from the output file, replace the "2.0" and "0.0" in front of the SCF orbitals, and use these to calculate the CCSD density later. The MO coefficients should be exactly the same for the SCF and CCSD calculations, just with different occupancies, right?
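
For what it's worth, a rough sketch of that occupancy-replacement step in plain Python (the file names and the new_occs list here are hypothetical; new_occs would need one value per MO, in the same order as the [MO] section of the molden file):

# Sketch: copy a molden file, replacing the Occup= value of each MO
# with a user-supplied occupation number (e.g. taken from the CCSD output).
new_occs = [1.98, 1.95, 1.90, 0.10, 0.05, 0.02]  # hypothetical values, one per MO

def rewrite_occupations(molden_in, molden_out, occs):
    it = iter(occs)
    with open(molden_in) as fin, open(molden_out, "w") as fout:
        for line in fin:
            if line.lstrip().startswith("Occup="):
                line = " Occup= {:.6f}\n".format(next(it))
            fout.write(line)

rewrite_occupations("methane.molden", "methane_ccsd.molden", new_occs)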

P.S.2. The following simple DF-CCSD input file consistently dumps core during the DFPDM step on my system. The crash is related to a meaningless "freeze_core true" statement; I think Psi4 should be able to handle such input more gracefully.

# H2

molecule h2 {
0 1
H 0.0000 0.0000 0.0000
H 0.0000 0.0000 0.7414
symmetry c1
}

set globals {
scf_type df
qc_module occ
cc_type df
basis sto-3g
freeze_core true
}

set dfocc {
wfn_type DF-CCSD
orb_opt false
}

grad = gradient('ccsd')