SCF Convergence issue

bgobbi · August 10, 2017, 8:06pm

We are seeing problems with SCF convergence. Below is our input:

The job fails in the second cycle of the geometry optimization. With the RMS not decreasing after the 5th cycle.
I would have expected the guess to be reasonably good given that it is the second cycle of a a geometry optimization.
Could you also what the RMS refers to?

Thanks,
Alberto

SCF Guess: Orbitals guess was supplied from a previous computation.

==> Iterations <==

                       Total Energy        Delta E     RMS |[F,P]|

@DF-RKS iter 0: -13045.90536692229580 -1.30459e+04 3.85584e+01
@DF-RKS iter 1: 129882.97040010261117 1.42929e+05 6.27055e+01
@DF-RKS iter 2: 248919.32267033698736 1.19036e+05 1.28728e+02 DIIS
@DF-RKS iter 3: 157982.09611620506621 -9.09372e+04 6.74676e+01 DIIS
@DF-RKS iter 4: -1827.36715855029388 -1.59809e+05 7.71873e+01 DIIS
@DF-RKS iter 5: -77099.60978518263437 -7.52722e+04 7.49161e+00 DIIS
@DF-RKS iter 6: 94296.27326243813150 1.71396e+05 8.64587e+01 DIIS
@DF-RKS iter 7: 93922.84112911540433 -3.73432e+02 8.64498e+01 DIIS
@DF-RKS iter 8: 94657.99319289218693 7.35152e+02 8.64445e+01 DIIS
@DF-RKS iter 9: 94754.95497342936869 9.69618e+01 8.64630e+01 DIIS
@DF-RKS iter 10: 94734.97976389329415 -1.99752e+01 8.64920e+01 DIIS
@DF-RKS iter 11: 94318.07436000506277 -4.16905e+02 8.64808e+01 DIIS
@DF-RKS iter 12: 92988.96451654926932 -1.32911e+03 8.64337e+01 DIIS
@DF-RKS iter 13: 93538.08612074318808 5.49122e+02 8.64579e+01 DIIS
@DF-RKS iter 14: 93216.11399623805482 -3.21972e+02 8.64310e+01 DIIS
@DF-RKS iter 15: 93251.44370108740986 3.53297e+01 8.64036e+01 DIIS
@DF-RKS iter 16: 93235.18532787107688 -1.62584e+01 8.63755e+01 DIIS
@DF-RKS iter 17: 93052.87001873142435 -1.82315e+02 8.63830e+01 DIIS
…
@DF-RKS iter 88: 92497.03996290548821 -5.77013e-07 8.64194e+01 DIIS
@DF-RKS iter 89: 92497.03996170758910 -1.19790e-06 8.64194e+01 DIIS
@DF-RKS iter 90: 92497.03996050723072 -1.20036e-06 8.64194e+01 DIIS
@DF-RKS iter 91: 92497.03996213289793 1.62567e-06 8.64194e+01 DIIS
@DF-RKS iter 92: 92497.03996453451691 2.40162e-06 8.64194e+01 DIIS
@DF-RKS iter 93: 92497.03996091890440 -3.61561e-06 8.64194e+01 DIIS
@DF-RKS iter 94: 92497.03996165953868 7.40634e-07 8.64194e+01 DIIS
@DF-RKS iter 95: 92497.03996257559629 9.16058e-07 8.64194e+01 DIIS
@DF-RKS iter 96: 92497.03996206201555 -5.13581e-07 8.64194e+01 DIIS
@DF-RKS iter 97: 92497.03996133855253 -7.23463e-07 8.64194e+01 DIIS
@DF-RKS iter 98: 92497.03996161745454 2.78902e-07 8.64194e+01 DIIS
@DF-RKS iter 99: 92497.03996341893799 1.80148e-06 8.64194e+01 DIIS
@DF-RKS iter 100: 92497.03996035839373 -3.06054e-06 8.64194e+01 DIIS

optMethod = ‘M06-2X’
optBasis = ‘6-31g*’
finalMethod = ‘mp2’
finalBasis = ‘aug-cc-pv[dt]z’

import time
t1 = time.time()
def saveFile(name, string):
f=open(name,‘w’)
f.write(string)
f.close()

molecule rec1_pid15624 {
0 1
C -4.165000 -7.248000 -23.474001
C -1.554000 -7.924000 -24.099001
C -2.084000 -11.902000 -24.750999
C 0.298000 -12.262000 -24.895000
C -3.523000 -12.049000 -25.172001
C -1.610000 -6.023000 -20.294001
C -3.601000 -4.617000 -20.027000
N 0.229000 -10.282000 -20.840000
O 0.012000 -9.303000 -20.004000
C -0.788000 -8.390000 -20.474001
C -1.165000 -8.793000 -21.792000
C -0.457000 -10.064000 -21.955999
C -0.588000 -10.906000 -23.152000
C -2.043000 -8.143000 -22.813000
N -1.276000 -7.284000 -19.733000
C -3.356000 -7.825000 -22.506001
C -3.672000 -7.009000 -24.750000
C -2.358000 -7.343000 -25.073999
C -1.871000 -11.125000 -23.632000
C -0.992000 -12.469000 -25.393999
C 0.500000 -11.483000 -23.768000
O 1.755000 -11.321000 -23.219999
O -1.132000 -13.175000 -26.555000
O -4.493000 -6.441000 -25.698000
C -4.084000 -6.443000 -27.018999
C -2.317000 -5.067000 -19.391001
O -1.132000 -5.684000 -21.379000
C -2.421000 -3.705000 -19.997999
C -6.225000 -12.085000 -25.705000
N -5.483000 -11.018000 -26.054001
C -4.162000 -10.956000 -25.757000
C -5.657000 -13.191000 -25.009001
C -4.316000 -13.155000 -24.698999
C -3.740000 -14.247000 -23.993000
C -4.532000 -15.347000 -23.635000
C -5.895000 -15.384000 -23.957001
C -6.469000 -14.316000 -24.636000
H -5.194300 -6.989300 -23.234100
H -0.523800 -8.171100 -24.352400
H 1.149100 -12.714500 -25.398300
H -3.903700 -5.024900 -20.985701
H -4.436000 -4.390200 -19.372400
H -1.236900 -7.357400 -18.722300
H -3.763800 -8.018400 -21.515900
H -1.918300 -7.168600 -26.050600
H -2.733500 -10.726800 -23.098301
H 1.676700 -10.945700 -22.316999
H -2.083800 -13.262400 -26.765301
H -4.913200 -6.079800 -27.635401
H -3.245900 -5.757900 -27.187500
H -3.842500 -7.449300 -27.380301
H -2.270400 -5.192100 -18.320200
H -1.935900 -3.468800 -20.938900
H -2.452900 -2.856100 -19.321699
H -7.276000 -12.058000 -25.976801
H -3.640000 -10.047900 -26.037701
H -2.684000 -14.269600 -23.732700
H -4.081900 -16.183599 -23.101700
H -6.499200 -16.243401 -23.672701
H -7.528500 -14.337000 -24.884899
no_com
no_reorient
}

memory 8 Gb
set basis $optBasis
set scf guess read
set opt_coordinates cartesian
set optking dynamic_level 1
set GEOM_MAXITER 200

optE, optWfn = optimize(optMethod, return_wfn=True)
t2 = time.time()

set basis $finalBasis
finalE = energy(cbs, scf_wfn=‘scf’, corl_wfn=‘mp2’, corl_basis=finalBasis, crol_schemme=corl_xtpl_helgaker_2)

t3 = time.time()
saveFile(‘psi4.wall.time.log’, ‘OptTime = %d\nSPTime = %d\nTotalTime = %d\n’ %(t2-t1,t3-t2,t3-t1))
saveFile(‘psi4.optimized.geometry.log’, optWfn.molecule().save_string_xyz_file())
saveFile(‘psi4.optimized.energy.log’, str(optE))
saveFile(‘psi4.opt.energy.type.log’, optMethod+’/’+optBasis)
saveFile(‘psi4.final.energy.log’, str(finalE))
saveFile(‘psi4.final.energy.type.log’, finalMethod+’/’+finalBasis)

loriab · August 11, 2017, 2:08pm

Unless you really need M06-2X, you might try another functional for opt, as that one has some problems with numerics and geometry.

On RMS,

github.com

psi4/psi4/blob/master/psi4/src/psi4/libscf_solver/rhf.cc#L228-L230


diis_manager_->set_error_vector_size(1, DIISEntry::Matrix, gradient.get());
diis_manager_->set_vector_size(1, DIISEntry::Matrix, Fa_.get());
initialized_diis_manager_ = true;

bgobbi · August 12, 2017, 7:49pm

I am seeing similar behavior now with:
energy(cbs, scf_wfn=‘scf’, corl_wfn=‘mp2’, corl_basis=‘aug-cc-pv[dt]z’, crol_schemme=corl_xtpl_helgaker_2)

Thanks for taking a look!
Alberto

@DF-RHF iter 0: -1311.14487971766584 -1.31114e+03 4.55515e+01
@DF-RHF iter 1: 61276518768.07105255126953 6.12765e+10 4.41633e+06
@DF-RHF iter 2: 46741594121.24560546875000 -1.45349e+10 2.84554e+06 DIIS
@DF-RHF iter 3: 57236146804.43621826171875 1.04946e+10 1.02072e+07 DIIS
@DF-RHF iter 4: 61382262769.52745056152344 4.14612e+09 4.63318e+06 DIIS
@DF-RHF iter 5: 40299209853.81970977783203 -2.10831e+10 1.75566e+07 DIIS
@DF-RHF iter 6: 31823460963.11372375488281 -8.47575e+09 1.79183e+07 DIIS
@DF-RHF iter 7: 32015346072.99341964721680 1.91885e+08 1.79330e+07 DIIS
@DF-RHF iter 8: 41540034910.95777130126953 9.52469e+09 1.12356e+07 DIIS

@DF-RHF iter 96: 61416996606.13357543945312 5.26652e+08 4.54655e+06 DIIS
@DF-RHF iter 97: 61023824629.93611145019531 -3.93172e+08 5.44124e+06 DIIS
@DF-RHF iter 98: 61405551853.07665252685547 3.81727e+08 4.57535e+06 DIIS
@DF-RHF iter 99: 61072733313.20195007324219 -3.32819e+08 5.33909e+06 DIIS
@DF-RHF iter 100: 61395400136.31912994384766 3.22667e+08 4.60074e+06 DIIS

loriab · August 14, 2017, 7:26pm

I’m running an opt with ‘B3LYP-D2’, and it’s decently well behaved so far (see below). Want to give that a try, so we can narrow down whether this is a method, software, or hardware issue?

optMethod = 'B3LYP-D2'
optBasis = '6-31g*'

  --------------------------------------------------------------------------------------------- ~
   Step     Total Energy     Delta E     MAX Force     RMS Force      MAX Disp      RMS Disp    ~
  --------------------------------------------------------------------------------------------- ~
    Convergence Criteria    1.00e-06 *    3.00e-04 *             o    1.20e-03 *             o  ~
  --------------------------------------------------------------------------------------------- ~
      1   -1659.27929445   -1.66e+03      5.45e-02      8.26e-03 o    1.11e-01      2.71e-02 o  ~
      2   -1659.31721489   -3.79e-02      3.24e-02      3.94e-03 o    1.23e-01      2.72e-02 o  ~
      3   -1659.33158878   -1.44e-02      1.22e-02      1.30e-03 o    3.71e-01      5.50e-02 o  ~
      4   -1659.30798863    2.36e-02      6.35e-02      1.25e-02 o    3.00e-01      5.26e-02 o  ~
	Raising dynamic level to 2. ~

bgobbi · August 14, 2017, 11:11pm

It fails in the first cycle for me on both the b3lyp-d2 and the m06-2x:

gist.github.com

https://gist.github.com/bgobbi/5e0025e98b6ab48617357f1fda9367d4

b3lypd2.dat

optMethod = 'B3LYP-D2'
optBasis = '6-31g*'
finalMethod = 'mp2'
finalBasis = 'aug-cc-pv[dt]z'

import time
t1 = time.time()
def saveFile(name, string):
   f=open(name,'w')
   f.write(string)

This file has been truncated. show original

b3lypd2.out


    -----------------------------------------------------------------------
          Psi4: An Open-Source Ab Initio Electronic Structure Package
                               Psi4 1.2a1.dev375 

                         Git: Rev {keep-records} 6de7c17 


    R. M. Parrish, L. A. Burns, D. G. A. Smith, A. C. Simmonett,
    A. E. DePrince III, E. G. Hohenstein, U. Bozkaya, A. Yu. Sokolov,

This file has been truncated. show original

m06.out


    -----------------------------------------------------------------------
          Psi4: An Open-Source Ab Initio Electronic Structure Package
                               Psi4 1.2a1.dev375 

                         Git: Rev {keep-records} 6de7c17 


    R. M. Parrish, L. A. Burns, D. G. A. Smith, A. C. Simmonett,
    A. E. DePrince III, E. G. Hohenstein, U. Bozkaya, A. Yu. Sokolov,

This file has been truncated. show original

There are more than three files. show original

You mentioned this could be a hardware issue. Can you elaborate!

Thanks for looking at this!!!
Alberto

bgobbi · August 15, 2017, 12:19pm

This definitely seems to be a problem with our compilation of psi4. I downloaded the binary psi4 1,1 build and tried that. It runs fine.

Would you have any advice where to start looking for the issue?

We need a custom compilation because of:

Thanks again!
Alberto

bgobbi · August 15, 2017, 10:53pm

Could this be a difference between the official 1.1 psi4 release and the one we have build downloaded from GitHub?
Psi4 1.2a1.dev375 Git: Rev {keep-records} 6de7c17

Can you confirm which version you used for your successful b3lyp-d2 run?

Thanks,
Alberto

bgobbi · August 28, 2017, 11:00pm

We went back and cloned the 1.1 ps4 release from github and recompiled with identical params than our 1.2A version (6de7c17). That version of Psi4 runs the test case with no issues.

This suggests that the conversion problems we are seeing are due to code changes introduced after the 1.1 release. I also noticed that the output file contains some differences in the SCF portion (e.g. an additional outer was added Daniel Smith).

The new SCF procedure doesn’t seem quite as stable as the old one.
I looked to recent commit messages and did not find much in regards of comments that might affect the SCF procedure.

dgasmith · August 29, 2017, 6:32pm

The SCF procedures’ convergence patterns have not been changed. The primary SCF change has been in SOSCF (if requested) and the XC functionals have been reworked to use LibXC instead of our own custom code.

Which LAPACK are you using in your compilations? We have had significant issues in non-MKL diagonalizers in the past.

BTW once you flip into a positive energy state you are unlikely to ever return to something reasonable.

bgobbi · September 2, 2017, 6:59pm

The interesting think is that both our 1.1 and 1.2 versions are build with the same parameters and libraries: “Intel MKL libraries” but the 1.1 version works on the example while the 1.2 version does not.
They were build with the 2015 version and we checked that this is the version that is loaded during the trials. We can try a newer version if you feel that that will make a difference. This would still not explain why the 1.1 version converges wile the 1.2 does not.

Thanks for following up!
Alberto

bgobbi · October 19, 2017, 10:23pm

Hi,
I did some more research and found that the issue is hardware dependent.
I tried the same input file on two different hardwares:
(A) model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
(B) model name : Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz

I now have an example input file that works on (A) but not on (B).
I also have a different examples file that works on (B) but not on (A). Both convergence failures are reproducible on the corresponding hardware.

Does this help to pinpoint what the issue could be?

I also read this post:

github.com/psi4/psi4

Orientationmgr diag

psi4:master ← lazaroid:orientationmgr_diag

opened 02:21PM - 07 Apr 17 UTC

lazaroid

+10 -6

## Description This cleans up a matrix before diagonalisation by removing extre…mely small values arising from numerical accuracy during matrix generation. Diagonalisation of matrices that have off-diagonal elements of ca <10-14 with DSYEV appears to lead to unexpected eigenvalues, which ultimately lead to different DFT grid generation on different processor types that show subtly different behaviour with numerical accuracy. Ultimately, this enables the dftd3-version test to pass on core i7-5930X processors (all Haswell?) where before it failed due to generation of an odd DFT grid compared with other processors. ## Todos Notable points that this PR has either accomplished or will accomplish. * **Developer Interest** - I don't think this should affect anything other than within OrientationMgr but the main developers will know better! * **User-Facing for Release Notes** - None ## Status - [ ] Ready to go

Could this be related?
The version of psi four we are using contains the patch mentioned. However I’m wondering, if the same problem with near zero values in not exactly an relation could happen in other places inside the code.

Thanks again for looking at this!
Alberto