That course of action sounds good to me.
As best we can tell, it’s apples to apples. I ran this on the CCQC cluster, which Jet set up to allow threading in Psi. My CFOUR output says
Running on 4 MPI processes
Running with 1 threads/proc
Memory limit is: 6.51926GB
One-particle lists are cached
Two-particle lists are cached
T1 and T2 DIIS vectors are cached
ABCI is not cached
ABCD is done in the AO basis
ABC and ABCD transposes are coarse-threaded
An out-of-core algorithm is used for <Ab|Ci>
DIIS is used to accelerate convergence of T1 and T2
Psi4 output says
AO Basis = NONE
ABCD = NEW
Cache Level = 2
Cache Type = LOW
Number of threads for explicit ijk threading: 4
MKL num_threads set to 1 for explicit threading.
The time falls from 43 to 36 minutes upon boosting Psi’s cache level to 3, but the difference between cache levels 2 and 3 is whether the ABCI integrals are cached, and CFOUR says it doesn’t cache those. So that comparison favors Psi but is no longer apples to apples.
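
For reference, the cache-level change is just a one-keyword tweak in the Psi4 input. A minimal sketch (the molecule, basis, and method call are placeholders, not the actual benchmark input; CACHELEVEL and CACHETYPE are the options reported in the output above):

# Hypothetical Psi4 input fragment -- geometry/basis omitted.
set {
  cachelevel 3   # 2 was the default run; 3 additionally caches ABCI
  cachetype  low
}
energy('ccsd(t)')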
Let me know if there’s anything else I can check.