Long range exchange energy changes with number of threads?

The number of threads seems to change the long-range exchange component of the energy for range-separated hybrids by an unusual amount, giving me differences upwards of 10^-6 Hartree, which actually matters for what I’m doing. All other components seem to be fine…

My input file:


molecule {
0 1
C     0.000000000000    -1.391500000000    -1.316799235698
C     1.205074349366    -0.695750000000    -1.316799235698
C     1.205074349366     0.695750000000    -1.316799235698
C    -0.000000000000     1.391500000000    -1.316799235698
C    -1.205074349366     0.695750000000    -1.316799235698
C    -1.205074349366    -0.695750000000    -1.316799235698
H     0.000000000000    -2.471500000000    -1.316799235698
H     2.140381785453    -1.235750000000    -1.316799235698
H     2.140381785453     1.235750000000    -1.316799235698
H    -0.000000000000     2.471500000000    -1.316799235698
H    -2.140381785453     1.235750000000    -1.316799235698
H    -2.140381785453    -1.235750000000    -1.316799235698
F     0.000000000000    -0.000000000000     5.183200764302
H     0.000000000000    -0.000000000000     4.266242764302

symmetry c1
}

set {
basis                   aug-cc-pvtz
}


energy('wb97x-v')

Could you give further information?
Which Psi4 version? What are your output numbers?

The version information in particular is critical. If you haven’t read it already, read this topic to see what we expect of a bug report. It helps us find problems faster.

Sorry. The version is 1.3.

1 thread:  Total energy = -332.7077928354452183
4 threads: Total energy = -332.7077887193119636

I also added print statements to libscf_solver/rhf.cc lines 320 and 325 to print the exchange energy before and after the range-separated part. The difference is ~10^-9 Hartree in the first part and ~10^-6 in the second.

1 thread:
Exchange_E_K1 = -7.2880184750694870E+00
Exchange_E_K2 = -1.4071427829678669E+01

4 threads:
Exchange_E_K1 = -7.2880184697916350E+00
Exchange_E_K2 = -1.4071423697698870E+01
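(For reference, a minimal sketch of pinning the thread count for this comparison, either on the command line with psi4 -n 4 or at the top of the input:)

set_num_threads(4)    # or 1; equivalent to running psi4 -n 4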

Can you try again, but with e_convergence set tighter? 1e-8 at least, but you should be able to converge even tighter. Psi can only check that subsequent iterations are within your energy threshold of each other, not that you’re within 1e-8 of the true answer. In some cases, that can lead to answers that are both “converged” to within the same threshold but still differ from each other by somewhat more than the threshold. That may be all you’re seeing.
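(In input-file terms, that suggestion looks something like the following; d_convergence is included as well, since the density threshold usually limits how precise the SCF gets:)

set {
basis                   aug-cc-pvtz
e_convergence           1e-10
d_convergence           1e-10
}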

I set e_convergence to 1e-10. The energies are now:

1 thread:
Total Energy = -332.7077928426228368
Exchange_E_K1 = -7.2880153604952360E+00
Exchange_E_K2 = -1.4071424959624862E+01

4 threads:
Total Energy = -332.7077887264917990
Exchange_E_K1 = -7.2880153552483451E+00
Exchange_E_K2 = -1.4071420827707852E+01

Still a difference of ~10^-6.

I don’t think it can be fixed by convergence. I actually first noticed this problem using read-in orbitals, where I read in the same orbitals for 1 thread and 4 threads.

  1. Do you see this in 1.4a2? (conda install psi4 -c psi4/label/dev)
  2. Which JK algorithm is this using?

Yes, it happens with 1.4a2 as well. By JK algorithm do you mean density fitting vs. PK? I’m using density fitting.

MemJK vs DiskJK, should be in your JK info print.

I was using Disk, but I’ve now tried both mem_jk and disk_jk and it happens for both.

Although I’m a bit concerned about this bit: my output says:

SCF Algorithm Type is DISK_DF.

but also:
Algorithm: Core
Integral Cache: NONE
Schwarz Cutoff: 1E-12
Fitting Condition: 1E-10

What is concerning there?

Using 1.4a2, please verify that you are using the mem/disk algorithms by looking at the output, and provide the values for both algorithms on 1 and 4 cores. This is sounding like something that does not originate from the JK part of the code.

I was under the impression that “Algorithm: Disk” should correspond to “SCF Algorithm Type is DISK_DF.”

Anyway, for scf_type = disk_df my total energies are:
1 thread:  -332.7077955299047858
4 threads: -332.7077914458822079
and for scf_type = mem_df:
1 thread:  -332.7077955343176541
4 threads: -332.7077914455318250

Disk and Memory DF algorithms have both in-core and on-disk counterparts. Perhaps not the best names, but they are optimized for different regimes.

Can you double check in the JK print out you are getting MemDF/DiskDF JK objects? The values look too close to be a JK issue IMO.
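(For completeness, a minimal sketch of pinning each backend explicitly in the input; both are documented scf_type values:)

set scf_type disk_df    # on-disk density-fitted J/K
set scf_type mem_df     # in-core density-fitted J/K; use one or the other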

I’m not quite sure what you mean by JK print out.

I printed out the J[0][0], K[0][0], and wK[0][0] elements with SCF Algorithm Type MEM_DF.

1 thread:
J 10.48507535000398
K 3.614759977196237
wK 0.336487905891259
4 threads:
J 10.48507537324325
K 3.6147599993133683
wK 0.3364877004447264

Not sure if this means anything… but wK has the largest difference between threads despite being the smallest component. I’m not even sure how threading would affect individual elements. (EDIT: that’s probably just from the difference in the converged SCF density matrix.)

I mean looking at the JK output file headers to ensure Mem/Disk is actually being used. The actual matrix elements will vary a bit and are not too useful to compare. I’m not sure how to debug this well. Does this happen for a non-VV10 functional and for a GGA?

Only for range-separated hybrids, and it’s only the long-range exchange component that has real issues with this.

So wB97X is affected as well? This is almost too stable between JK algorithms to be a real issue there.

Yes. From the functionals I’ve tested, it happens for wB97X-V, wB97M-V, wB97X, and M11 (all range-separated hybrids). It does not happen for B97M-V (a non-hybrid) or B3LYP (a global hybrid).

To rule some things out, after a computation can you take the density matrix, do a Psi4NumPy-style computation of the wK energy, and compare it against the JK engines? Also, please move this to a Psi4 issue.
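(A minimal sketch of that check, assuming the psi4.core.JK Python API, omega = 0.3 for wB97X-V, and that the chosen JK backend supports wK; the sign and long-range coefficient on the final contraction are assumptions that would need to match what libscf_solver/rhf.cc actually uses:)

import psi4

psi4.set_memory("8 GB")
psi4.set_num_threads(4)    # repeat with 1 to compare
psi4.set_options({"basis": "aug-cc-pvtz",
                  "scf_type": "mem_df",
                  "e_convergence": 1e-10,
                  "d_convergence": 1e-10})

# Same benzene/HF geometry as the input file above
psi4.geometry("""
0 1
C     0.000000000000    -1.391500000000    -1.316799235698
C     1.205074349366    -0.695750000000    -1.316799235698
C     1.205074349366     0.695750000000    -1.316799235698
C    -0.000000000000     1.391500000000    -1.316799235698
C    -1.205074349366     0.695750000000    -1.316799235698
C    -1.205074349366    -0.695750000000    -1.316799235698
H     0.000000000000    -2.471500000000    -1.316799235698
H     2.140381785453    -1.235750000000    -1.316799235698
H     2.140381785453     1.235750000000    -1.316799235698
H    -0.000000000000     2.471500000000    -1.316799235698
H    -2.140381785453     1.235750000000    -1.316799235698
H    -2.140381785453    -1.235750000000    -1.316799235698
F     0.000000000000    -0.000000000000     5.183200764302
H     0.000000000000    -0.000000000000     4.266242764302
symmetry c1
""")

e, wfn = psi4.energy("wb97x-v", return_wfn=True)

# Rebuild wK from the converged occupied orbitals with a fresh JK object
jk = psi4.core.JK.build(wfn.basisset())
jk.set_do_wK(True)
jk.set_omega(0.3)    # range-separation parameter; 0.3 assumed for wB97X-V
jk.initialize()
jk.C_left_add(wfn.Ca_subset("AO", "OCC"))
jk.compute()

# Contract with the converged alpha density. The overall prefactor (LRC
# coefficient beta, restricted-orbital factor of 2) is a placeholder here
# and should be taken from the conventions in libscf_solver/rhf.cc.
E_wK = -wfn.Da().vector_dot(jk.wK()[0])
print("E_wK = %.16f" % E_wK)

Running this once per thread count with the same converged orbitals would isolate the wK build itself from the SCF iteration history.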