It seems that changing the number of threads changes the long-range exchange component of the energy for range-separated hybrids by an unusually large amount, giving me errors upwards of 10^-6 Hartree, which actually matters for what I’m doing. All other components seem to be fine…
My input file:
C 0.000000000000 -1.391500000000 -1.316799235698
C 1.205074349366 -0.695750000000 -1.316799235698
C 1.205074349366 0.695750000000 -1.316799235698
C -0.000000000000 1.391500000000 -1.316799235698
C -1.205074349366 0.695750000000 -1.316799235698
C -1.205074349366 -0.695750000000 -1.316799235698
H 0.000000000000 -2.471500000000 -1.316799235698
H 2.140381785453 -1.235750000000 -1.316799235698
H 2.140381785453 1.235750000000 -1.316799235698
H -0.000000000000 2.471500000000 -1.316799235698
H -2.140381785453 1.235750000000 -1.316799235698
H -2.140381785453 -1.235750000000 -1.316799235698
F 0.000000000000 -0.000000000000 5.183200764302
H 0.000000000000 -0.000000000000 4.266242764302
1 thread Total energy: -332.7077928354452183
4 threads Total energy: -332.7077887193119636
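To put a number on the gap between the two totals above, here is a small sketch that subtracts them exactly. The variable names are only for illustration, and `Decimal` is used so the subtraction is not limited by double-precision rounding.

```python
from decimal import Decimal

# Total energies (Hartree) copied from the two runs above.
energy_1_thread = Decimal("-332.7077928354452183")
energy_4_threads = Decimal("-332.7077887193119636")

# Absolute difference between the 1-thread and 4-thread results.
thread_gap = abs(energy_1_thread - energy_4_threads)
print(f"Difference: {thread_gap} Hartree")
```

The difference comes out to roughly 4.1 x 10^-6 Hartree, which matches the "upwards of 10^-6" figure quoted above.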
I also added print statements to libscf_solver/rhf.cc lines 320 and 325 to print the exchange energy before and after the range-separated part. The difference between thread counts is ~10^-9 in the first part and ~10^-6 in the second.
Can you try again, but with e_convergence set tighter? 1e-8 at least, but you should be able to converge even tighter. Psi can only check that subsequent iterations are within your energy threshold of each other, not that you’re within 1e-8 of the true answer. In some cases, that can lead to answers that are both “converged” to within the same threshold but differ from each other by somewhat more than the threshold. That may be all you’re seeing.
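To illustrate the point about thresholds, here is a toy sketch in plain Python. It is not Psi4's SCF procedure: it is a made-up, linearly convergent iteration (the function name, the `rate` of 0.9, and the "exact" value of -1.0 are all assumptions for illustration). It stops when successive values differ by less than `tol`, which is the same kind of test described above.

```python
def iterate_until_converged(start, exact=-1.0, rate=0.9, tol=1e-6):
    """Toy iteration whose error shrinks by `rate` each step.

    Stops when two successive values differ by less than `tol`,
    which says nothing directly about the distance to `exact`.
    """
    current = start
    while True:
        nxt = exact + rate * (current - exact)
        if abs(nxt - current) < tol:
            return nxt
        current = nxt


tol = 1e-6
# Two runs that approach the same answer from opposite sides,
# standing in for two runs that take slightly different paths.
run_a = iterate_until_converged(start=0.0, tol=tol)
run_b = iterate_until_converged(start=-2.0, tol=tol)

print("run A error:", abs(run_a - (-1.0)))
print("run B error:", abs(run_b - (-1.0)))
print("gap between runs:", abs(run_a - run_b))
```

Both runs pass the same convergence test, yet each is still further than `tol` from the exact value, and the two results differ from each other by more than `tol`. Tightening `tol` shrinks that gap, which is why rerunning with a tighter e_convergence is a useful first check.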
Using 1.4a2, please verify that you are using the Mem/Disk algorithms by looking at the output, and provide the values for both algorithms on 1 and 4 cores. This sounds like something that does not originate from the JK part of the code.
I printed out the J, K, and wK elements with SCF Algorithm Type MEM_DF.
Not sure if this means anything… but wK has the highest difference between threads despite being the smallest component. I’m not even sure how threading would be affecting individual elements. (EDIT: that’s probably just from the difference in the converged SCF density matrix)
I mean looking at the JK output file headers to ensure Mem/Disk is actually being used. The actual matrix elements will vary by a bit and are not too useful to compare. I’m not sure how best to debug this. Does this happen for a non-VV10 functional and for a GGA?
To rule some things out, after a computation can you take the density matrix, do a Psi4NumPy-style computation of the wK energy, and compare it against the JK engine’s result? Also, please move this to a Psi4 issue.
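As a rough sketch of the contraction step in such a check, the NumPy helper below computes an exchange-type energy term from a density matrix and an exchange-like matrix. Everything here is an assumption for illustration: the helper name, the closed-shell form -prefactor * sum_pq D_pq K_pq, and the toy 2x2 matrices. In a real comparison the density and wK would come from the converged calculation, and the prefactor would be the functional's long-range exact-exchange fraction; obtaining those from Psi4 is not shown.

```python
import numpy as np


def exchange_energy_term(density, exchange, prefactor):
    """Return -prefactor * sum_pq D_pq K_pq for matching square matrices.

    The sign and prefactor convention is an assumed closed-shell form,
    not taken from the Psi4 source.
    """
    density = np.asarray(density, dtype=float)
    exchange = np.asarray(exchange, dtype=float)
    if density.shape != exchange.shape:
        raise ValueError("density and exchange matrices must have the same shape")
    return -prefactor * float(np.einsum("pq,pq->", density, exchange))


# Toy symmetric matrices standing in for a converged density and a wK matrix.
toy_density = np.array([[1.0, 0.2],
                        [0.2, 0.5]])
toy_wk = np.array([[0.3, 0.1],
                   [0.1, 0.4]])

toy_energy = exchange_energy_term(toy_density, toy_wk, prefactor=1.0)
print("toy long-range exchange term:", toy_energy)
```

The idea would be to feed the same fixed density matrix to the JK engine at 1 and 4 threads and contract each resulting wK this way. If the contracted values agree, the thread dependence is more likely coming from the converged density than from the wK build itself.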