Is my T2 truncation scheme for CCSD a good idea?

Hi everyone,

I want to examine how CCSD performs if I drop some of the double excitation terms, and I was wondering how to do such a calculation using Psi4. Here are some of my thoughts.

My understanding is that conventional CCSD includes all single and double excitation terms. But what if my computing resources are limited? Can I reduce the cost of CCSD by dropping some of the insignificant double excitation terms while still guaranteeing chemical accuracy (1 kcal/mol)?

My original plan was to first do an MP2 calculation and use some energy threshold to screen out insignificant terms, so that I go from the full set of T2 terms to a trimmed set T2’ (T2’ is a smaller subset of T2). Then there are two possible approaches:

  1. I somehow feed this T2’ into the CCSD algorithm, get an energy E’, and compare E’ with the energy E from CCSD with the full T2 terms.
  2. I calculate how much energy each T2 amplitude contributes in the original CCSD, then sum the contributions of the trimmed T2’ terms to see how much energy is recovered. Maybe this is easier to realize in Psi4 than the first approach?
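
The MP2 screening step in this plan can be sketched in NumPy. This is a toy illustration, not Psi4 code: the tensors `MO_oovv` and the denominators `Dijab` are fabricated random data standing in for the antisymmetrized integrals and orbital-energy denominators of the psi4numpy tutorials. Each first-order amplitude's energy contribution is `0.25 * <ij||ab> * t_ijab` in the spin-orbital convention, and amplitudes below a threshold get flagged for removal.

```python
import numpy as np

# Fabricated stand-ins for <ij||ab> integrals and orbital-energy denominators
rng = np.random.default_rng(0)
nocc, nvir = 4, 6
MO_oovv = rng.normal(scale=0.05, size=(nocc, nocc, nvir, nvir))
eps_o = np.sort(rng.uniform(-1.5, -0.5, nocc))   # occupied orbital energies
eps_v = np.sort(rng.uniform(0.2, 1.5, nvir))     # virtual orbital energies
Dijab = (eps_o[:, None, None, None] + eps_o[None, :, None, None]
         - eps_v[None, None, :, None] - eps_v[None, None, None, :])

t2_mp2 = MO_oovv / Dijab                # first-order (MP2) doubles amplitudes
e_contrib = 0.25 * MO_oovv * t2_mp2     # per-amplitude MP2 energy contribution

threshold = 1e-6
zero_mask = np.abs(e_contrib) < threshold   # amplitudes flagged for dropping
```

Summing `e_contrib` over all indices recovers the full MP2 correlation energy, so the mask directly tells you how much energy the screening discards.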

I think you mean to say that you’re dropping amplitudes, not terms. In quantum chemistry jargon, amplitudes are the coefficients of the cluster operator (t^ij_ab) and terms are tensor contractions of amplitudes and integrals that appear in the equations for energy and amplitude stationarity.

I can’t quite identify your question. Are you asking for ways to reduce the cost of coupled cluster, or are you asking which of (1) or (2) is easier to implement in a prototype?

Not worth it in my opinion. The accuracy of CCSD itself is barely better than MP2 except in a few cases, and it can even be worse. The (T) in CCSD(T) is the expensive part.

You may wish to look into frozen natural orbital (FNO) methods and the various orbital-specific virtual (OSV) / pair natural orbital (PNO) approaches.

  1. Sorry for the confusion. Yes, I want to drop amplitudes, or, probably equivalently, constrain some of the t^ij_ab amplitudes to always be zero.

  2. I meant: of (1) and (2), which is more realistic to implement? I would prefer implementing (1), but I guess that is not easy, so I offered (2) as a possible alternative.

I just need to keep enough double excitation amplitudes to reach chemical accuracy. I want to use those double excitations in a quantum computing algorithm (VQE with a UCC ansatz), in which each double excitation term counts. So I want to cut as many double excitations as possible for the first guess of the UCC ansatz.

I was wondering if this was related to your UCC/VQE question. Chemical accuracy may be the ultimate goal, but what you really need from CCSD is a good guess to the UCCSD amplitudes.

First, I’d be cautious about assuming that the UCCSD and CCSD amplitudes will be similar enough that a CCSD guess is advantageous. I’d recommend checking that for simple systems before proceeding.

If you decide you still want to go through with this, try the frozen virtual or density-fitting (DF) approximations before trying to make your own.
Frozen-Virtual: Amplitudes indexed by high-energy virtual orbitals are set to zero by fiat. This is risky.
DF: This is an approximate factorization of the two-electron integrals. Needing to contract smaller tensors means contractions are less expensive. I know this is discussed in Psi4Numpy.
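
To illustrate the DF idea (a toy sketch with a random `B` tensor, not actual fitted integrals): the four-index integrals factorize as (pq|rs) ≈ Σ_Q B[Q,p,q] B[Q,r,s], and typical contractions can be done with the three-index `B` tensors without ever forming the four-index object.

```python
import numpy as np

rng = np.random.default_rng(1)
nbf, naux = 8, 20
B = rng.normal(size=(naux, nbf, nbf))   # stand-in for the DF factor tensor

# Reassembled four-index integrals (only to show the factorization pattern)
eri_df = np.einsum('Qpq,Qrs->pqrs', B, B)

# A Coulomb-like contraction done two ways: with the 4-index tensor, and
# entirely through the smaller 3-index tensors.
D = rng.normal(size=(nbf, nbf))
J_direct = np.einsum('pqrs,rs->pq', eri_df, D)
J_df = np.einsum('Qpq,Q->pq', B, np.einsum('Qrs,rs->Q', B, D))
assert np.allclose(J_direct, J_df)
```

The second route scales with the (small) auxiliary dimension instead of the fourth power of the basis size, which is where the savings come from.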

Yes, this is related to the UCC/VQE question, but a different one. You are right that CCSD amplitudes are probably not enough for UCCSD, but it is the best I can do currently for my purpose.

So, I modified the CCSD iteration code (here) a little bit by adding two lines, which set some of the t2 amplitudes to zero whenever they are updated. The list of t2 amplitudes to zero is selected from an MP2 calculation: I chose the double excitations whose MP2 correlation-energy contribution lies between ~0 and some threshold.

I was wondering what the risk is here. Can I simply set those selected t2 amplitudes to zero without invalidating the whole calculation? After running this code I do get a sensible result: the more double excitations I drop, the higher the CCSD energy compared to the unmodified one.

Here is the part I modified (I only added the two lines that set t2 to zero).
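
The `set_t2_zero` helper called in the code below isn't shown in the post. A minimal sketch, assuming `zero_list` holds (i, j, a, b) index tuples from the MP2 screening (the post does not show its actual format), might be:

```python
import numpy as np

def set_t2_zero(t2, zero_list):
    """Zero the selected doubles amplitudes in place.

    zero_list: iterable of (i, j, a, b) index tuples chosen in the MP2
    screening step (an assumption; any indexing scheme that addresses
    individual amplitudes would work the same way).
    """
    for i, j, a, b in zero_list:
        t2[i, j, a, b] = 0.0
    return t2
```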

set_t2_zero(t2, zero_list) ### I added this line
for CCSD_iter in range(1, maxiter + 1):
    ### Build intermediates: [Stanton:1991:4334] Eqns. 3-8

    Fae = build_Fae(t1, t2)
    Fmi = build_Fmi(t1, t2)
    Fme = build_Fme(t1, t2)

    Wmnij = build_Wmnij(t1, t2)
    Wabef = build_Wabef(t1, t2)
    Wmbej = build_Wmbej(t1, t2)

    #### Build RHS side of t1 equations, [Stanton:1991:4334] Eqn. 1
    rhs_T1  = F[o, v].copy()
    rhs_T1 += np.einsum('ie,ae->ia', t1, Fae)
    rhs_T1 -= np.einsum('ma,mi->ia', t1, Fmi)
    rhs_T1 += np.einsum('imae,me->ia', t2, Fme)
    rhs_T1 -= np.einsum('nf,naif->ia', t1, MO[o, v, o, v])
    rhs_T1 -= 0.5 * np.einsum('imef,maef->ia', t2, MO[o, v, v, v])
    rhs_T1 -= 0.5 * np.einsum('mnae,nmei->ia', t2, MO[o, o, v, o])

    ### Build RHS side of t2 equations, [Stanton:1991:4334] Eqn. 2
    rhs_T2 = MO[o, o, v, v].copy()

    # P_(ab) t_ijae (F_be - 0.5 t_mb F_me)
    tmp = Fae - 0.5 * np.einsum('mb,me->be', t1, Fme)
    Pab = np.einsum('ijae,be->ijab', t2, tmp)
    rhs_T2 += Pab
    rhs_T2 -= Pab.swapaxes(2, 3)

    # P_(ij) t_imab (F_mj + 0.5 t_je F_me)
    tmp = Fmi + 0.5 * np.einsum('je,me->mj', t1, Fme)
    Pij = np.einsum('imab,mj->ijab', t2, tmp)
    rhs_T2 -= Pij
    rhs_T2 += Pij.swapaxes(0, 1)

    tmp_tau = build_tau(t1, t2)
    rhs_T2 += 0.5 * np.einsum('mnab,mnij->ijab', tmp_tau, Wmnij)
    rhs_T2 += 0.5 * np.einsum('ijef,abef->ijab', tmp_tau, Wabef)

    # P_(ij) * P_(ab)
    # (ij - ji) * (ab - ba)
    # ijab - ijba -jiab + jiba
    tmp = np.einsum('ie,ma,mbej->ijab', t1, t1, MO[o, v, v, o])
    Pijab = np.einsum('imae,mbej->ijab', t2, Wmbej)
    Pijab -= tmp

    rhs_T2 += Pijab
    rhs_T2 -= Pijab.swapaxes(2, 3)
    rhs_T2 -= Pijab.swapaxes(0, 1)
    rhs_T2 += Pijab.swapaxes(0, 1).swapaxes(2, 3)

    Pij = np.einsum('ie,abej->ijab', t1, MO[v, v, v, o])
    rhs_T2 += Pij
    rhs_T2 -= Pij.swapaxes(0, 1)

    Pab = np.einsum('ma,mbij->ijab', t1, MO[o, v, o, o])
    rhs_T2 -= Pab
    rhs_T2 += Pab.swapaxes(2, 3)

    ### Update t1 and t2 amplitudes
    t1 = rhs_T1 / Dia
    t2 = rhs_T2 / Dijab
    set_t2_zero(t2, zero_list) ### I added this line

    ### Compute CCSD correlation energy
    CCSDcorr_E = np.einsum('ia,ia->', F[o, v], t1)
    CCSDcorr_E += 0.25 * np.einsum('ijab,ijab->', MO[o, o, v, v], t2)
    CCSDcorr_E += 0.5 * np.einsum('ijab,ia,jb->', MO[o, o, v, v], t1, t1)

    ### Print CCSD correlation energy
    # print('CCSD Iteration %3d: CCSD correlation = %3.12f  '\
    #       'dE = %3.5E' % (CCSD_iter, CCSDcorr_E, (CCSDcorr_E - CCSDcorr_E_old)))
    if (abs(CCSDcorr_E - CCSDcorr_E_old) < E_conv):
        break

    CCSDcorr_E_old = CCSDcorr_E

I’m not sure what the energy error would be. This truncation approach offers no benefit in a classical setting, so I’m not aware of it having been tried before. My first intuition is that it will break down for more “difficult” cases: rhs_T2 / Dijab can become very large precisely for the amplitudes you decide to zero.
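
One cheap way to watch for that failure mode is to log, each iteration, the largest amplitude the equations *would* have produced at the frozen positions before you zero them. A sketch (the function name and boolean-mask convention are my own, not from the post):

```python
import numpy as np

def max_frozen_amplitude(rhs_T2, Dijab, zero_mask):
    """Largest |rhs_T2 / Dijab| among the amplitudes about to be zeroed.

    If this value grows across CCSD iterations, the truncation is
    suppressing amplitudes the equations want to make large, and the
    scheme is probably breaking down for this system.
    """
    return np.abs((rhs_T2 / Dijab)[zero_mask]).max()

# Toy demonstration with fabricated tensors
rng = np.random.default_rng(2)
rhs = rng.normal(size=(2, 2, 3, 3))
D = -np.ones_like(rhs)
mask = np.zeros(rhs.shape, dtype=bool)
mask[0, 0, 0, 0] = True
```

Calling this just before each `set_t2_zero` would make the risk quantitative rather than a guess.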

Going beyond the energy, good luck getting any analytic gradient or property computations: your condition for truncating amplitudes is non-differentiable.

You might also need to worry about losing size-consistency.