Reuse integrals between jobs

So here's a silly question. I'm toying with a set of calculations where I'm trying to find all bound solutions by varying the orbital occupations within the symmetry groups. This gives a ridiculously large number of calculations, so I'd like to reuse the integrals between jobs. Is there a nice way to do this? I'd prefer either conventional or Cholesky integrals.
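To be concrete, the scan looks something like this; the molecule and occupation patterns below are just an illustration, not my actual system:

import psi4

psi4.geometry("""
0 1
O
H 1 0.96
H 1 0.96 2 104.5
""")

psi4.set_options({'basis': 'cc-pvdz', 'reference': 'rhf'})

# Scan trial DOCC patterns over the four C2v irreps (a1, a2, b1, b2);
# every energy() call below recomputes the same integrals from scratch.
for docc in ([3, 0, 1, 1], [2, 0, 1, 2]):
    psi4.set_options({'docc': docc})
    print(docc, psi4.energy('scf'))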

Looks like we don't have this feature as of now. If you look at the void MintsHelper::integrals() function in mintshelper.cc inside libmints, which gets called from hf.cc, there are no checks for whether the integrals already exist on disk. I guess we would have to add some sort of check that reads them from disk if nothing relevant has changed in subsequent calculations with a given process id.

Hmm, in the Python layer we may be able to check for this more easily. If you look in procrouting/proc.py and examine the run_cc* functions, you can find a line that checks whether the integrals have been computed by examining the scf_type. We could probably check whether the file exists before doing the computation. Susi, you can probably do this now by running Psi4 under the messy flag and altering that function in a local copy.
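For example, purely as a sketch: the scratch-file naming pattern and the unit number below are assumptions on my part (the real unit numbers live in psifiles.h):

import os
import psi4

def so_tei_on_disk(unit=35):
    """Guess whether the SO two-electron integral file is already in scratch.

    unit=35 is my guess at PSIF_SO_TEI; check psifiles.h for the real value.
    """
    scratch = psi4.core.IOManager.shared_object().get_default_path()
    # Scratch files are named roughly 'psi.<pid>.<unit>'; the exact stem
    # varies between versions, so treat this pattern as an assumption too.
    fname = 'psi.{}.{}'.format(os.getpid(), unit)
    return os.path.exists(os.path.join(scratch, fname))

The run_cc* functions could then skip mints.integrals() when this returns True, provided the geometry and basis haven't changed since the file was written.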

Are there any plans to make this type of checking the default? I am currently running a job where the integrals take forever (about 5-6 hours over 4 threads). The integrals seem to be recalculated for the SCF and again in the subsequent integral transformation before the CCSD(T) step. Contributing to this is probably a mistake on my part: putting an extra SCF in my input via:
energy('scf')
energy('ccsd(t)')
energy('bccd(t)')

But in order to get my SCF to converge correctly, I first did a calculation on the charged, closed-shell molecule and then followed this with the neutral radical (using guess read at the same geometry). This also "cost" me an extra set of integral evaluations. After the CCSD(T) step, the program is now repeating the SCF (and integrals) for the BCCD(T) job, and presumably the integrals yet again for the integral transformation.
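For reference, that workflow looks roughly like this; the geometry, basis name, and charge/multiplicity values below are placeholders rather than my actual input:

import psi4

uo2 = psi4.geometry("""
2 1
U 0.0 0.0  0.00
O 0.0 0.0  1.76
O 0.0 0.0 -1.76
""")

psi4.set_options({'basis': 'my_custom_basis'})  # stand-in for the attached .gbs sets

# First SCF: the charged, closed-shell species (one integral evaluation).
psi4.set_options({'reference': 'rhf'})
psi4.energy('scf')

# Second SCF: the open-shell species at the same geometry, reading the
# orbitals from the previous run as the guess (integrals evaluated again).
uo2.set_molecular_charge(0)
uo2.set_multiplicity(3)
psi4.set_options({'reference': 'rohf', 'guess': 'read'})
psi4.energy('scf')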

Side note: are these integrals so slow because I'm using a large, generally contracted basis set? The same calculation in Molpro takes just over 2 minutes to compute the integrals (3 cores using MPI). I'm not slamming the performance of the code, just trying to understand the bottleneck.

Well, that sounds bad. I think anything over a factor of two means we are doing something very wrong, much less a factor of 150. Would you be able to post the input? We have Libint, Simint, and LibERD as possible integral libraries; I don't think any of them takes advantage of general contractions. @jturney @andysim might be able to say more.

You can skip the SCF by doing the following:

energy('scf')                                    # first SCF (e.g., your preliminary closed-shell run)
scf_e, scf_wfn = energy('scf', return_wfn=True)  # second SCF; hang on to its wavefunction
energy('ccsd(t)', ref_wfn=scf_wfn)               # reuse the converged SCF instead of redoing it
energy('bccd(t)', ref_wfn=scf_wfn)

Note: we do SCF twice, which is partly my fault, as we cannot quite yet start a new SCF computation from a previous one due to the crazy guess mechanics. This is certainly on the TODO list.

That's why I was wondering whether perhaps I've done something stupid in my build, although I'm using gcc 6.2 and it definitely looks like it picked up my MKL libraries. I've posted the input and basis sets used (though I was forced to change the .gbs basis-set extension to .dat). Since you mentioned the SCF guess stuff: I've had two SCFs with identical inputs converge to different solutions when using guess read. The "bad" one shows:

@ROHF iter   0:  -28148.52605763907923   -2.81485e+04   0.00000e+00
@ROHF iter   1:  -28145.45201529635597    3.07404e+00   4.47619e-02
@ROHF iter   2:  -28148.79220221935248   -3.34019e+00   1.91696e-02 DIIS
@ROHF iter   3:  -28148.89733059309219   -1.05128e-01   1.77192e-02 DIIS
@ROHF iter   4:  -28149.15025321591747   -2.52923e-01   5.74277e-03 DIIS
@ROHF iter   5:  -28149.17492741751266   -2.46742e-02   1.96263e-03 DIIS

Note what happens in iteration 1. I thought this had been fixed, since my first calculation with a fresh pull of master didn't show such a dramatic swing in energy (which in the case above results in convergence to the wrong state).

uo2_awcvtzdk3_uncon.dat (5.0 KB)
wcvtzdk3_bccd.dat (1.2 KB)
uo2_avdzdk3_uncon.dat (3.3 KB)
uo2_avdzdk3.dat (18.4 KB)
uo2_awcvtzdk3.dat (30.1 KB)

One issue might be that the SCF is using density-fitted integrals while the coupled cluster code expects conventional ones. If everything is conventional, then I'd expect what @dgasmith posted with ref_wfn to behave as one would expect.
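If you want to be explicit about it, something like this should pin both stages to conventional integrals (cc_type is my assumption here; depending on the version it may not exist, in which case scf_type is the relevant knob):

import psi4

psi4.set_options({
    'scf_type': 'pk',   # conventional (PK) integrals for the SCF
    'cc_type': 'conv',  # ask the CC code for conventional integrals as well
})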

Since I don’t have a density fitting basis set for U, I’m doing PK.

The difference in speed is definitely due to the use of general contractions.

As a workaround, you might consider using Cholesky, since there you calculate only a small fraction of the integrals, even if the decomposition threshold is small.
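Something like the following, dropped into an existing input (the 1e-5 value is just an example threshold):

import psi4

psi4.set_options({
    'scf_type': 'cd',              # Cholesky-decomposed two-electron integrals
    'cholesky_tolerance': 1.0e-5,  # decomposition threshold; looser means fewer vectors
})
psi4.energy('scf')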

Thanks Susi, I guess that would get rid of one of my integral evaluations.

Actually, Cholesky with even a 1e-5 threshold took nearly a factor of two more CPU time, and still no iterations printed in the SCF…

Hmm, yeah, Cholesky could be a bit slower, since we compute it in a direct fashion. Do you know of an open-source integral library that handles general contractions? We could look into linking to it.

I think libcint supports general contractions; that's the integral library used by PySCF.

Another option would be to see whether Roland Lindh would let you link in Seward, though of course that's in Fortran :slight_smile:

Actually, the SCF is an even bigger problem, since it can be a deal-breaker. In the course of testing various things, I've run exactly the same SCF input about 3-4 times, and only once did it magically converge to the right solution, all starting from the same set of closed-shell orbitals.

That's true; Molcas is open source nowadays, so Seward is available.

Sorry for reviving this topic. Does the current version support reusing integrals between jobs? If not, are there any plans to implement this?