Long range exchange energy changes with number of threads?

It seems the number of threads is changing the long-range exchange component of the energy for range-separated hybrids an unusual amount… giving me errors upwards of 10^-6 Hartree, which is actually important for what I’m doing. All other components seem to be fine…

My input file:

molecule {
0 1
C     0.000000000000    -1.391500000000    -1.316799235698
C     1.205074349366    -0.695750000000    -1.316799235698
C     1.205074349366     0.695750000000    -1.316799235698
C    -0.000000000000     1.391500000000    -1.316799235698
C    -1.205074349366     0.695750000000    -1.316799235698
C    -1.205074349366    -0.695750000000    -1.316799235698
H     0.000000000000    -2.471500000000    -1.316799235698
H     2.140381785453    -1.235750000000    -1.316799235698
H     2.140381785453     1.235750000000    -1.316799235698
H    -0.000000000000     2.471500000000    -1.316799235698
H    -2.140381785453     1.235750000000    -1.316799235698
H    -2.140381785453    -1.235750000000    -1.316799235698
F     0.000000000000    -0.000000000000     5.183200764302
H     0.000000000000    -0.000000000000     4.266242764302

symmetry c1

set {
basis                   aug-cc-pvtz


Could give further information?
Which psi4 version? What are your output numbers?

The version information in particular is critical. If you haven’t read it already, read this topic to see what we expect of a bug report. It helps us find problems faster.

Sorry. The version is 1.3.

1 thread Total energy: -332.7077928354452183
4 thread Total energy: -332.7077887193119636

I also added print statements to libscf_solver/rhf.cc lines 320 and 325 to print the exchange energy before and after the range separated part. The error is ~10^-9 in the first part and ~10^-6 in the second

1 thread:
Exchange_E_K1 = -7.2880184750694870E+00
Exchange_E_K2 = -1.4071427829678669E+01

4 threads:
Exchange_E_K1 = -7.2880184697916350E+00
Exchange_E_K2 = -1.4071423697698870E+01

Can you try again, but with the e_convergence set tighter? 1e-8 at least, but you should be able to converge even tighter. Psi can only check that subsequent iterations are within your energy threshold of each other, not that you’re within 1e-8 of the true answer. In some cases, that can lead to answers that are both “converged” to within the same threshold but differ by a small factor larger than the threshold. That may be all you’re saying.

I set convergence to 1e-10. Energies now:

1 thread:
Total Energy = -332.7077928426228368
Exchange_E_K1 = -7.2880153604952360E+00
Exchange_E_K2 = -1.4071424959624862E+01

4 threads:
Total Energy = -332.7077887264917990
Exchange_E_K1 = -7.2880153552483451E+00
Exchange_E_K2 = -1.4071420827707852E+01

Still an error of ~10e^-6.

I don’t think it can be fixed by convergence. I actually first noticed this problem using read-in orbitals, where I read in the same orbitals for 1 thread and 4 threads.

  1. Do you see this in 1.4a2 conda install psi4 -c psi4/label/dev
  2. Which JK algorithm is this using?

Yes. Happens with 1.4a2 as well. By JK algorithm you mean density fitting vs. pk? I’m using density fitting.

MemJK vs DiskJK, should be in your JK info print.

I was using Disk, but now I tried with both mem_jk and disk_jk and it happens for both.

Although I’m a bit concerned about this bit: my output says:

SCF Algorithm Type is DISK_DF.

but also:
Algorithm: Core
Integral Cache: NONE
Schwarz Cutoff: 1E-12
Fitting Condition: 1E-10

What is concerning there?

Using 1.4a2 please verify you using the mem/disk algorithms by looking at the output and provide the values of both algorithms on 1/4 cores. This is sounding like something that does not originate from the JK part of the code.

I was under the impression that “Algorithm: Disk” should correspond to “SCF Algorithm Type is DISK_DF.”

Anyways, for scf_type = disk_df my total energies are:
1 thread: -332.7077955299047858
4 threads: -332.7077914458822079
for scf_type = mem_df:
1 thread: -332.7077955343176541
4 threads: -332.7077914455318250

Disk and Memory DF algorithms have both in-core and on-disk counterparts. Perhaps not the best names, but they are optimized for different regimes.

Can you double check in the JK print out you are getting MemDF/DiskDF JK objects? The values look too close to be a JK issue IMO.

I’m not quite sure what you mean by JK print out.

I printed out the J[0][0], K[0][0], and wK[0][0] elements with SCF Algorithm Type MEM_DF.

1 thread:
J 10.48507535000398
K 3.614759977196237
wK 0.336487905891259
4 threads:
J 10.48507537324325
K 3.6147599993133683
wK 0.3364877004447264

Not sure if this means anything… but wK has the highest difference between threads despite being the smallest component. I’m not even sure how threading would be affecting individual elements. (EDIT: that’s probably just from the difference in the converged SCF density matrix)

I mean looking at the JK output file headers to ensure Mem/Disk is actually being used. The actual matrix elements will vary by a bit and are not too useful to compare. I’m not sure how to debug this well, does this happen for a non VV10 functional and a GGA?

Only range-separated hybrids, and it’s only the long range exchange component that has real issues with this.

So wB97X is effected as well? This is almost too stable between JK algorithms to be a real issue there.

Yes. From the functionals I’ve tested, it happens for wB97X-v, wB97M-v, wB97X, and M11 (all range-separated hybrids). It does not happen for B97M-V (nonhybrid) or B3LYP (global hybrid).

To rule some things out, after a computation can you take the density matrix and do a Psi4NumPy style computation the wK energy and compare it against the JK engines? Also, please move this to a Psi4 issue.