Sorry for the length, but I've tried to include as much information as I can!
I've been testing various builds of psi4 on several different computers. All tests pass on every computer (given a MAX_AM_ERI high enough for the pywrap tests to run!) except one, where a single test fails: dftd3-version, because the energies do not match the expected values. I'm currently trying to track down why.
Initially, I thought the failing machine had an older version of dftd3, but that's not the issue: the dispersion parts all match perfectly! The part that fails is where dftd3 is called from the C side, i.e. where functionals such as "b3lyp-d3" are tested. The dispersion energies match the reference values, but the test fails when the total energy is compared, because the DFT part of the energy is different.
Digging into this, I took the ethene dimer structure and ran a set of simple test calculations on three different machines, A, B and C (dftd3-version fails on C):

    molecule eeee {
    # ethene dimer geometry
    }
    set basis 6-31G(d)
    energy('b3lyp', molecule=eeee)

The total energies on machines A and B match to 9 dp, which seems pretty good to me, but machine C matches them only to 4 dp!
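To make "matches to N dp" concrete, here is a minimal sketch. It assumes "matches to N dp" means |E1 - E2| < 10^-N; the function name `agreeing_decimals` is hypothetical and not part of psi4.

```python
def agreeing_decimals(e1, e2, max_digits=12):
    """Return the largest d (capped at max_digits) such that |e1 - e2| < 10**-d.

    Assumption: "energies match to d decimal places" is read as an
    absolute difference smaller than 10**-d. Returns 0 if even the
    first decimal place disagrees.
    """
    diff = abs(e1 - e2)
    d = 0
    # Keep tightening the tolerance while the difference still fits under it.
    while d < max_digits and diff < 10.0 ** -(d + 1):
        d += 1
    return d
```

With this definition, two totals differing by about 3e-5 Eh agree to 4 dp, which is the level of disagreement reported for machine C.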
dftd3-version uses a fairly cut-down integration grid. If I use the same reduced numbers of spherical and radial points in my test calculations, the total energies on A and B still match to 9 dp, but C now matches only to 3 dp!
One thing I've spotted is that A and B use the same number of grid points and get the same energy: Total Points = 265160 (64564 for the smaller grid); Max Points = 4197 (3539 for the smaller grid).
On machine C (the "odd one"): Total Points = 265208 (64592 for the smaller grid); Max Points = 4784 (3709 for the smaller grid).
For some reason, machine C is using a different grid, and I suspect this is why my energies don't match!
The same is true in the dftd3-version output:
Total Points = 64396; Max Points = 3627 (total energy matches, test passes);
Total Points = 64400; Max Points = 3758 (total energy doesn't match, test fails).
The supplied output.ref has:
Total Points = 64396; Max Points = 3525
Hence, for some reason, the failing machine uses more integration grid points than the reference (64400 vs 64396) and gets a different DFT energy.
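A quick way to spot this across many tests is to pull the grid summary out of each output and compare it with output.ref. The sketch below assumes the output contains lines of the form "Total Points = <int>" and "Max Points = <int>" (whitespace may vary), and that the test's output file is named output.dat; the script and function names are hypothetical. Because Max Points also differs between the passing machine (3627) and output.ref (3525), only Total Points is flagged.

```python
import re
import sys

# Assumption: each DFT grid summary prints "Total Points = N" and "Max Points = N".
TOTAL_RE = re.compile(r"Total Points\s*=\s*(\d+)")
MAX_RE = re.compile(r"Max Points\s*=\s*(\d+)")


def grid_summaries(text):
    """Return a list of (total_points, max_points) pairs, one per grid found."""
    totals = [int(t) for t in TOTAL_RE.findall(text)]
    maxes = [int(m) for m in MAX_RE.findall(text)]
    return list(zip(totals, maxes))


def compare_grids(ref_text, out_text):
    """Print grids side by side; return True if every Total Points value agrees.

    Max Points is printed for information only, since it differs even on a
    passing machine.
    """
    ref, out = grid_summaries(ref_text), grid_summaries(out_text)
    same = len(ref) == len(out)
    for i, (r, o) in enumerate(zip(ref, out)):
        flag = "" if r[0] == o[0] else "  <-- Total Points differ"
        # str.format keeps this usable on Python 3.5 (no f-strings).
        print("grid {}: ref total={} max={} | out total={} max={}{}".format(
            i, r[0], r[1], o[0], o[1], flag))
        same = same and r[0] == o[0]
    return same


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_grids.py output.ref output.dat
    with open(sys.argv[1]) as f_ref, open(sys.argv[2]) as f_out:
        ok = compare_grids(f_ref.read(), f_out.read())
    sys.exit(0 if ok else 1)
```

Run against machine C's output, this would flag 64400 vs 64396, while the passing machine (64396, differing only in Max Points) would not be flagged.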
I had a brief look at where the grid pruning happens, and there didn't seem to be anything too complicated going on, so I'm at a bit of a loss as to why this happens on only one machine! Unless I've managed to discover some odd difference in rounding behaviour between the machines.
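As a generic illustration of how a rounding difference could change a point count (this is not psi4's pruning code, which may work quite differently): floating-point addition is not associative, so the same sum evaluated in a different order, as can happen with different compilers, optimisation flags or vectorised library routines, can land on either side of a cut-off. `keep_point` and its cut-off are hypothetical.

```python
def keep_point(weights, cutoff=0.6, reverse=False):
    """Hypothetical threshold test: keep a point if its summed weight <= cutoff.

    The only difference between the two calls below is summation order.
    """
    total = 0.0
    for w in (reversed(weights) if reverse else weights):
        total += w
    return total <= cutoff


# 0.1 + 0.2 + 0.3 evaluates to 0.6000000000000001, so the point is dropped...
print(keep_point([0.1, 0.2, 0.3]))
# ...but 0.3 + 0.2 + 0.1 evaluates to exactly 0.6, so the point is kept.
print(keep_point([0.1, 0.2, 0.3], reverse=True))
```

If a pruning or screening test in the grid construction sits this close to a threshold, a few points (e.g. 64400 vs 64396) could be kept on one build and dropped on another.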
All builds use python 3.5.3 and are linked against MKL (although I doubt that's involved here). Machines A and C were built with gcc 6.3.0; machine B with gcc 4.9.4.
HF energies match to 9 dp on all three machines.
Interestingly, the dft-alone tests pass, but I will go back and check their outputs more closely.
Does anyone have any clues on where to look next?