Reuse integrals between jobs

So here's a silly question. I'm toying with a set of calculations where I'm trying to find all bound solutions by varying the orbital occupations within the symmetry groups. This gives a ridiculously large number of calculations, so I'd like to reuse the integrals between jobs. Is there a nice way to do this? I'd prefer either conventional or Cholesky integrals.
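To be concrete, the scan looks something like this; the molecule and occupation patterns below are just an illustration, not my actual system:

import psi4

psi4.geometry("""
0 1
O
H 1 0.96
H 1 0.96 2 104.5
""")

psi4.set_options({'basis': 'cc-pvdz', 'reference': 'rhf'})

# Scan trial DOCC patterns over the four C2v irreps (a1, a2, b1, b2);
# every energy() call below recomputes the same integrals from scratch.
for docc in ([3, 0, 1, 1], [2, 0, 1, 2]):
    psi4.set_options({'docc': docc})
    print(docc, psi4.energy('scf'))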

Looks like we don't have this feature as of now. If you look at the void MintsHelper::integrals() function in mintshelper.cc inside libmints, which gets called from hf.cc, there are no checks for whether the integrals already exist on disk. I guess we would have to add some sort of check that reads them from disk if nothing relevant has changed in subsequent calculations with a given process id.

Hmm, in the Python layer we may be able to check for this more easily. If you look in procrouting/proc.py and examine the run_cc* functions, you can find a line that checks whether the integrals have been computed by examining the scf_type. We could probably check whether the file exists before doing the computation. Susi, you can probably do this now by running Psi4 under the messy flag and altering that function in a local copy.
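For example, purely as a sketch: the scratch-file naming pattern and the unit number below are assumptions on my part (the real unit numbers live in psifiles.h):

import os
import psi4

def so_tei_on_disk(unit=35):
    """Guess whether the SO two-electron integral file is already in scratch.

    unit=35 is my guess at PSIF_SO_TEI; check psifiles.h for the real value.
    """
    scratch = psi4.core.IOManager.shared_object().get_default_path()
    # Scratch files are named roughly 'psi.<pid>.<unit>'; the exact stem
    # varies between versions, so treat this pattern as an assumption too.
    fname = 'psi.{}.{}'.format(os.getpid(), unit)
    return os.path.exists(os.path.join(scratch, fname))

The run_cc* functions could then skip mints.integrals() when this returns True, provided the geometry and basis haven't changed since the file was written.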

Are there any plans to make this type of checking the default? I am currently running a job where the integrals take forever (about 5-6 hours over 4 threads). The integrals seem to be recalculated for the SCF and again in the subsequent integral transformation before the CCSD(T) step. Contributing to this is probably a mistake on my part: putting an extra SCF in my input via:
energy('scf')
energy('ccsd(t)')
energy('bccd(t)')

But in order to get my SCF to converge correctly, I first did a calculation on the charged, closed-shell molecule and then followed this with the neutral radical (using guess read at the same geometry). This also "cost" me an extra set of integral evaluations. After the CCSD(T) step, the program is now repeating the SCF (and integrals) for the BCCD(T) job, and presumably the integrals yet again for the integral transformation.
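For reference, that workflow looks roughly like this; the geometry, basis name, and charge/multiplicity values below are placeholders rather than my actual input:

import psi4

uo2 = psi4.geometry("""
2 1
U 0.0 0.0  0.00
O 0.0 0.0  1.76
O 0.0 0.0 -1.76
""")

psi4.set_options({'basis': 'my_custom_basis'})  # stand-in for the attached .gbs sets

# First SCF: the charged, closed-shell species (one integral evaluation).
psi4.set_options({'reference': 'rhf'})
psi4.energy('scf')

# Second SCF: the open-shell species at the same geometry, reading the
# orbitals from the previous run as the guess (integrals evaluated again).
uo2.set_molecular_charge(0)
uo2.set_multiplicity(3)
psi4.set_options({'reference': 'rohf', 'guess': 'read'})
psi4.energy('scf')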

Side note: are these integrals so slow because I'm using a large, generally contracted basis set? The same calculation in Molpro takes just over 2 minutes to compute the integrals (3 cores using MPI). I'm not slamming the performance of the code, just trying to understand the bottleneck.

Well, that sounds bad. I think anything over a factor of two means we are doing something very wrong, much less a factor of 150. Would you be able to post the input? We have Libint, Simint, and LibERD as possible integral libraries; I don't think any of them takes advantage of general contractions. @jturney @andysim might be able to say more.

You can skip the SCF by doing the following:

energy('scf')                                    # first SCF (e.g., your preliminary closed-shell run)
scf_e, scf_wfn = energy('scf', return_wfn=True)  # second SCF; hang on to its wavefunction
energy('ccsd(t)', ref_wfn=scf_wfn)               # reuse the converged SCF instead of redoing it
energy('bccd(t)', ref_wfn=scf_wfn)

Note: we do SCF twice, which is partly my fault, as we cannot quite yet start a new SCF computation from a previous one due to the crazy guess mechanics. This is certainly on the TODO list.

That's why I was wondering whether perhaps I've done something stupid in my build, although I'm using gcc 6.2 and it definitely looks like it picked up my MKL libraries. I've posted the input and basis sets used (though I was forced to change the .gbs basis-set extension to .dat). Since you mentioned the SCF guess stuff: I've had two SCFs with identical inputs converge to different solutions when using guess read. The "bad" one shows:

@ROHF iter   0:  -28148.52605763907923   -2.81485e+04   0.00000e+00
@ROHF iter   1:  -28145.45201529635597    3.07404e+00   4.47619e-02
@ROHF iter   2:  -28148.79220221935248   -3.34019e+00   1.91696e-02 DIIS
@ROHF iter   3:  -28148.89733059309219   -1.05128e-01   1.77192e-02 DIIS
@ROHF iter   4:  -28149.15025321591747   -2.52923e-01   5.74277e-03 DIIS
@ROHF iter   5:  -28149.17492741751266   -2.46742e-02   1.96263e-03 DIIS

Note what happens in iteration 1. I thought this had been fixed, since my first calculation with a fresh pull of master didn't show such a dramatic swing in energy (which in the case above results in convergence to the wrong state).

uo2_awcvtzdk3_uncon.dat (5.0 KB)
wcvtzdk3_bccd.dat (1.2 KB)
uo2_avdzdk3_uncon.dat (3.3 KB)
uo2_avdzdk3.dat (18.4 KB)
uo2_awcvtzdk3.dat (30.1 KB)

One issue might be that the SCF is using density-fitted integrals while the coupled cluster code expects conventional ones. If everything is conventional, then I'd expect what @dgasmith posted with ref_wfn to behave as one would expect.
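If you want to be explicit about it, something like this should pin both stages to conventional integrals (cc_type is my assumption here; depending on the version it may not exist, in which case scf_type is the relevant knob):

import psi4

psi4.set_options({
    'scf_type': 'pk',   # conventional (PK) integrals for the SCF
    'cc_type': 'conv',  # ask the CC code for conventional integrals as well
})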

Since I don’t have a density fitting basis set for U, I’m doing PK.

The difference in speed is definitely due to the use of general contractions.

As a workaround, you might consider using Cholesky, since there you calculate only a small fraction of the integrals, even if the decomposition threshold is small.
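Something like the following, dropped into an existing input (the 1e-5 value is just an example threshold):

import psi4

psi4.set_options({
    'scf_type': 'cd',              # Cholesky-decomposed two-electron integrals
    'cholesky_tolerance': 1.0e-5,  # decomposition threshold; looser means fewer vectors
})
psi4.energy('scf')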

Thanks Susi, I guess that would get rid of one of my integral evaluations.

Actually, Cholesky with even a 1e-5 threshold took nearly a factor of two more CPU time, and still no iterations printed in the SCF…

Hmm, yeah, Cholesky could be a bit slower, since we compute it in a direct fashion. Do you know of an open-source integral library that handles general contractions? We could look into linking to it.

I think libcint supports general contractions; that's the integral library used by PySCF.

Another option would be to see whether Roland Lindh would let you link in Seward, though of course that's in Fortran :slight_smile:

Actually, the SCF is an even bigger problem, since it can be a deal-breaker. In the course of testing various things, I've run exactly the same SCF input about 3-4 times, and only once did it magically converge to the right solution, all starting from the same set of closed-shell orbitals.

That's true; Molcas is open source nowadays, so Seward is available.

Sorry for reviving this topic. Does the current version support reusing integrals between jobs? If not, are there any plans to implement this?