It seems the number of threads changes the long-range exchange component of the energy for range-separated hybrids by an unusual amount, giving me errors upwards of 10^-6 Hartree, which actually matters for what I’m doing. All other components seem to be fine.
My input file:
molecule {
0 1
C 0.000000000000 -1.391500000000 -1.316799235698
C 1.205074349366 -0.695750000000 -1.316799235698
C 1.205074349366 0.695750000000 -1.316799235698
C -0.000000000000 1.391500000000 -1.316799235698
C -1.205074349366 0.695750000000 -1.316799235698
C -1.205074349366 -0.695750000000 -1.316799235698
H 0.000000000000 -2.471500000000 -1.316799235698
H 2.140381785453 -1.235750000000 -1.316799235698
H 2.140381785453 1.235750000000 -1.316799235698
H -0.000000000000 2.471500000000 -1.316799235698
H -2.140381785453 1.235750000000 -1.316799235698
H -2.140381785453 -1.235750000000 -1.316799235698
F 0.000000000000 -0.000000000000 5.183200764302
H 0.000000000000 -0.000000000000 4.266242764302
symmetry c1
}
set {
basis aug-cc-pvtz
}
energy('wb97x-v')
The version information in particular is critical. If you haven’t read it already, read this topic to see what we expect of a bug report. It helps us find problems faster.
1 thread Total energy: -332.7077928354452183
4 threads Total energy: -332.7077887193119636
I also added print statements to libscf_solver/rhf.cc lines 320 and 325 to print the exchange energy before and after the range-separated part. The error is ~10^-9 in the first part and ~10^-6 in the second.
Can you try again, but with the e_convergence set tighter? 1e-8 at least, but you should be able to converge even tighter. Psi can only check that subsequent iterations are within your energy threshold of each other, not that you’re within 1e-8 of the true answer. In some cases, that can lead to answers that are both “converged” to within the same threshold but differ by a small factor larger than the threshold. That may be all you’re saying.
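To make the point about thresholds concrete, here is a toy sketch (not Psi4 code). A made-up geometric sequence stands in for the SCF energies, and the loop stops once two successive values differ by less than the threshold. The names `toy_energy`, `converge`, `RATE`, and `tol` are all hypothetical and chosen only for illustration.

```python
# Toy model, not Psi4: E_n approaches E_EXACT geometrically.
E_EXACT = -1.0
RATE = 0.9  # assumed slow convergence rate, for illustration only


def toy_energy(n):
    """Hypothetical 'SCF energy' at iteration n."""
    return E_EXACT + RATE ** n


def converge(tol):
    """Iterate until successive energies differ by less than tol."""
    n = 1
    while abs(toy_energy(n) - toy_energy(n - 1)) >= tol:
        n += 1
    return n, toy_energy(n)


tol = 1e-6
n_iter, e_final = converge(tol)
last_step = abs(toy_energy(n_iter) - toy_energy(n_iter - 1))
true_error = abs(e_final - E_EXACT)

print(f"stopped at iteration {n_iter}")
print(f"last step  = {last_step:.3e} (below tol = {tol:.0e})")
print(f"true error = {true_error:.3e} (can exceed tol)")
```

In this toy the remaining error is several times larger than the last step, so two runs can both report "converged" and still disagree by more than the threshold. Whether that explains the numbers in this thread is a separate question.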
1 thread:
Total Energy = -332.7077928426228368
Exchange_E_K1 = -7.2880153604952360E+00
Exchange_E_K2 = -1.4071424959624862E+01
4 threads:
Total Energy = -332.7077887264917990
Exchange_E_K1 = -7.2880153552483451E+00
Exchange_E_K2 = -1.4071420827707852E+01
Still an error of ~4 x 10^-6 Hartree in the total energy.
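For reference, the differences between the two runs can be worked out directly from the printed values. This is only arithmetic on the numbers quoted above; `Decimal` is used so that no digits are lost to binary rounding.

```python
from decimal import Decimal

# Values copied verbatim from the two runs reported above (Hartree).
one_thread = {
    "Total Energy": Decimal("-332.7077928426228368"),
    "Exchange_E_K1": Decimal("-7.2880153604952360E+00"),
    "Exchange_E_K2": Decimal("-1.4071424959624862E+01"),
}
four_threads = {
    "Total Energy": Decimal("-332.7077887264917990"),
    "Exchange_E_K1": Decimal("-7.2880153552483451E+00"),
    "Exchange_E_K2": Decimal("-1.4071420827707852E+01"),
}

# Difference (4 threads minus 1 thread) for each printed quantity.
diffs = {name: four_threads[name] - one_thread[name] for name in one_thread}
for name, diff in diffs.items():
    print(f"{name:14s} {float(diff): .3e}")
```

The K1 difference comes out at roughly 5 x 10^-9, while the K2 difference and the total-energy difference are both roughly 4.1 x 10^-6, so nearly all of the discrepancy sits in the second exchange value.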
I don’t think it can be fixed by convergence. I actually first noticed this problem using read-in orbitals, where I read in the same orbitals for 1 thread and 4 threads.
Using 1.4a2, please verify that you are using the mem/disk algorithms by looking at the output, and provide the values from both algorithms on 1 and 4 cores. This sounds like something that does not originate from the JK part of the code.
I printed out the J[0][0], K[0][0], and wK[0][0] elements with SCF Algorithm Type MEM_DF.
1 thread:
J 10.48507535000398
K 3.614759977196237
wK 0.336487905891259
4 threads:
J 10.48507537324325
K 3.6147599993133683
wK 0.3364877004447264
Not sure if this means anything… but wK has the highest difference between threads despite being the smallest component. I’m not even sure how threading would be affecting individual elements. (EDIT: that’s probably just from the difference in the converged SCF density matrix)
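On how threading can touch individual numbers at all: one generic mechanism is that a threaded reduction adds the same terms in a different order, and floating-point addition is not associative. The sketch below is plain Python, not Psi4, and `chunked_sum` is a hypothetical stand-in for a per-thread reduction. Nothing in this thread establishes that this is what happens inside the JK code.

```python
import math
import random

random.seed(0)  # arbitrary seed, only for reproducibility
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]


def chunked_sum(data, n_chunks):
    """Sum data in n_chunks contiguous pieces, then add the partial sums.

    This mimics (hypothetically) each thread reducing its own slice.
    """
    size = math.ceil(len(data) / n_chunks)
    partials = [sum(data[i:i + size]) for i in range(0, len(data), size)]
    return sum(partials)


reference = math.fsum(values)  # correctly rounded sum, used as the baseline
for n in (1, 4):
    total = chunked_sum(values, n)
    print(f"{n} chunk(s): sum = {total!r}, error vs fsum = {total - reference:.3e}")
```

Differences of this kind are at the level of rounding error, many orders of magnitude below 10^-6, which is consistent with treating the discrepancy reported here as unusual.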
I meant looking at the JK output file headers to ensure Mem/Disk is actually being used. The actual matrix elements will vary a bit and are not too useful to compare. I’m not sure how best to debug this. Does this happen for a non-VV10 functional and for a GGA?
Yes. From the functionals I’ve tested, it happens for wB97X-V, wB97M-V, wB97X, and M11 (all range-separated hybrids). It does not happen for B97M-V (nonhybrid) or B3LYP (global hybrid).
To rule some things out, after a computation can you take the density matrix, do a Psi4NumPy-style computation of the wK energy, and compare it against the JK engine's value? Also, please move this to a Psi4 issue.
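A minimal sketch of the kind of contraction such a check involves, using NumPy only. The matrices here are synthetic stand-ins; in a real check `D` and `wK` would be taken from the converged calculation, which is not shown. The function name `exchange_energy`, the value of `beta`, and the sign and prefactor convention are assumptions for illustration and should be checked against the code being debugged.

```python
import numpy as np


def exchange_energy(D, K, scale):
    """Return -scale * sum_pq D_pq K_pq.

    The sign and prefactor follow one common closed-shell convention and
    are an assumption here, not taken from the Psi4 source.
    """
    return -scale * np.einsum("pq,pq->", D, K)


# Synthetic symmetric matrices stand in for a real density and wK matrix.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
D = 0.5 * (A + A.T)
wK = 0.5 * (B + B.T)

beta = 1.0  # hypothetical long-range exchange fraction
e_wk = exchange_energy(D, wK, beta)

# The same contraction written as a trace, as a consistency check.
e_wk_trace = -beta * np.trace(D @ wK)
print(e_wk, e_wk_trace)
```

Running the same contraction on the density from the 1-thread and 4-thread runs, with wK built independently of the JK engine, would show whether the discrepancy comes from the wK matrix itself or from elsewhere.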