DF-OMPx segfaults depending on system size

Dear all,
I am trying to optimise a few small molecules, the largest being glycine and some of them polyradicals, using Psi4, preferably with OO-MP2.5 and an ROHF reference. The OMP2.5 example from the test suite runs perfectly fine. However, that example crashes when I increase the basis set to cc-pV5Z, and my glycine calculation crashes with cc-pVTZ. Input file below:

memory 100 Gb

molecule Glycine {
0 1
  N          1.93070        0.08960       -0.03440
  C          0.76110       -0.79880       -0.00800
  C         -0.49840        0.02880       -0.00510
  O         -0.42880        1.23490       -0.02310
  O         -1.69750       -0.57400        0.01760
  H          1.90950        0.73840        0.73810
  H          2.78840       -0.44180       -0.03680
  H          0.77220       -1.44040       -0.88910
  H          0.79300       -1.41500        0.89060
  H         -2.47670       -0.00150        0.01850
}

set {
  basis cc-pvtz
  df_basis_scf cc-pvtz-jkfit
  df_basis_cc cc-pvtz-ri
  scf_type df
  freeze_core true
  mp_type df
}



    Psi4 started on: Thursday, 05 November 2020 12:19PM

    Process ID: 24359
    Host:       rlyeh
    PSIDATADIR: /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4
    Memory:     500.0 MiB
    Threads:    4
  Memory set to  93.132 GiB by Python driver.
gradient() will perform analytic gradient computation.
    Method 'OMP2.5' requires SCF_TYPE = DISK_DF, setting.

*** tstart() called on rlyeh
*** at Thu Nov  5 12:19:41 2020

   => Loading Basis Set <=

    Name: CC-PVTZ
    Role: ORBITAL
    Keyword: BASIS
    atoms 1    entry N          line   224 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz.gbs 
    atoms 2-3  entry C          line   186 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz.gbs 
    atoms 4-5  entry O          line   262 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz.gbs 
    atoms 6-10 entry H          line    23 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz.gbs 

               by Justin Turney, Rob Parrish, Andy Simmonett
                          and Daniel G. A. Smith
                              RHF Reference
                        4 Threads,  95367 MiB Core

  ==> Geometry <==

    Molecular point group: c1
    Full point group: C1

    Geometry (in Angstrom), charge = 0, multiplicity = 1:

       Center              X                  Y                   Z               Mass       
    ------------   -----------------  -----------------  -----------------  -----------------
         N            1.930777610500     0.089528296316    -0.034400885216    14.003074004430
         C            0.761177610500    -0.798871703684    -0.008000885216    12.000000000000
         C           -0.498322389500     0.028728296316    -0.005100885216    12.000000000000
         O           -0.428722389500     1.234828296316    -0.023100885216    15.994914619570
         O           -1.697422389500    -0.574071703684     0.017599114784    15.994914619570
         H            1.909577610500     0.738328296316     0.738099114784     1.007825032230
         H            2.788477610500    -0.441871703684    -0.036800885216     1.007825032230
         H            0.772277610500    -1.440471703684    -0.889100885216     1.007825032230
         H            0.793077610500    -1.415071703684     0.890599114784     1.007825032230
         H           -2.476622389500    -0.001571703684     0.018499114784     1.007825032230

  Running in c1 symmetry.

  Rotational constants: A =      0.37904  B =      0.12745  C =      0.09779 [cm^-1]
  Rotational constants: A =  11363.40855  B =   3820.93208  C =   2931.70123 [MHz]
  Nuclear repulsion =  179.722680045898613

  Charge       = 0
  Multiplicity = 1
  Electrons    = 40
  Nalpha       = 20
  Nbeta        = 20

  ==> Algorithm <==

  SCF Algorithm Type is DISK_DF.
  DIIS enabled.
  MOM disabled.
  Fractional occupation disabled.
  Guess Type is SAD.
  Energy threshold   = 1.00e-10
  Density threshold  = 1.00e-10
  Integral threshold = 0.00e+00

  ==> Primary Basis <==

  Basis Set: CC-PVTZ
    Blend: CC-PVTZ
    Number of shells: 80
    Number of basis function: 220
    Number of Cartesian functions: 250
    Spherical Harmonics?: true
    Max angular momentum: 3

   => Loading Basis Set <=

    Role: JKFIT
    Keyword: DF_BASIS_SCF
    atoms 1    entry N          line   177 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 
    atoms 2-3  entry C          line   125 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 
    atoms 4-5  entry O          line   229 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 
    atoms 6-10 entry H          line    51 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 

  ==> Pre-Iterations <==

    Irrep   Nso     Nmo     Nalpha   Nbeta   Ndocc  Nsocc
     A        220     220       0       0       0       0
    Total     220     220      20      20      20       0

  ==> Integral Setup <==

  ==> DiskDFJK: Density-Fitted J/K Matrices <==

    J tasked:                  Yes
    K tasked:                  Yes
    wK tasked:                  No
    OpenMP threads:              4
    Integrals threads:           4
    Memory [MiB]:            71525
    Algorithm:                Core
    Integral Cache:           SAVE
    Schwarz Cutoff:          1E-12
    Fitting Condition:       1E-10

   => Auxiliary Basis Set <=

  Basis Set: CC-PVTZ-JKFIT
    Blend: CC-PVTZ-JKFIT
    Number of shells: 175
    Number of basis function: 545
    Number of Cartesian functions: 655
    Spherical Harmonics?: true
    Max angular momentum: 4

  Minimum eigenvalue in the overlap matrix is 4.9822479478E-04.
  Using Symmetric Orthogonalization.

  SCF Guess: Superposition of Atomic Densities via on-the-fly atomic UHF.

  ==> Iterations <==

                           Total Energy        Delta E     RMS |[F,P]|

   @DF-RHF iter SAD:  -282.02746472735748   -2.82027e+02   0.00000e+00 
   @DF-RHF iter   1:  -282.74106179030343   -7.13597e-01   3.00435e-03 DIIS
   @DF-RHF iter   2:  -282.87273451650037   -1.31673e-01   1.90948e-03 DIIS
   @DF-RHF iter   3:  -282.93101879831977   -5.82843e-02   2.94424e-04 DIIS
   @DF-RHF iter   4:  -282.93359586564509   -2.57707e-03   1.44639e-04 DIIS
   @DF-RHF iter   5:  -282.93414926428159   -5.53399e-04   3.05379e-05 DIIS
   @DF-RHF iter   6:  -282.93420078323413   -5.15190e-05   1.35728e-05 DIIS
   @DF-RHF iter   7:  -282.93421069016250   -9.90693e-06   5.39384e-06 DIIS
   @DF-RHF iter   8:  -282.93421259878312   -1.90862e-06   1.95904e-06 DIIS
   @DF-RHF iter   9:  -282.93421285632030   -2.57537e-07   6.06211e-07 DIIS
   @DF-RHF iter  10:  -282.93421287401321   -1.76929e-08   1.67904e-07 DIIS
   @DF-RHF iter  11:  -282.93421287544544   -1.43223e-09   3.97519e-08 DIIS
   @DF-RHF iter  12:  -282.93421287552883   -8.33893e-11   9.54675e-09 DIIS
   @DF-RHF iter  13:  -282.93421287553360   -4.77485e-12   3.39278e-09 DIIS
   @DF-RHF iter  14:  -282.93421287553417   -5.68434e-13   1.35274e-09 DIIS
   @DF-RHF iter  15:  -282.93421287553440   -2.27374e-13   4.02586e-10 DIIS
   @DF-RHF iter  16:  -282.93421287553429    1.13687e-13   1.50383e-10 DIIS
   @DF-RHF iter  17:  -282.93421287553468   -3.97904e-13   6.07236e-11 DIIS
  Energy and wave function converged.

  ==> Post-Iterations <==

    Orbital Energies [Eh]

    Doubly Occupied:                                                      

       1A    -20.608341     2A    -20.535926     3A    -15.542253  
       4A    -11.382855     5A    -11.280014     6A     -1.461527  
       7A     -1.353137     8A     -1.184056     9A     -0.975629  
      10A     -0.827682    11A     -0.718057    12A     -0.691156  
      13A     -0.666137    14A     -0.623460    15A     -0.598939  
      16A     -0.575812    17A     -0.550825    18A     -0.473835  
      19A     -0.447806    20A     -0.397546  


      21A      0.134969    22A      0.147895    23A      0.169425  
      24A      0.186525    25A      0.205403    26A      0.229613  
      27A      0.313983    28A      0.328585    29A      0.368083  
      30A      0.408930    31A      0.455796    32A      0.479343  
      33A      0.494121    34A      0.520269    35A      0.537970  
      36A      0.572504    37A      0.583964    38A      0.601349  
      39A      0.633137    40A      0.645938    41A      0.674688  
      42A      0.690892    43A      0.702856    44A      0.713081  
      45A      0.743081    46A      0.755791    47A      0.760875  
      48A      0.793356    49A      0.809175    50A      0.815401  
      51A      0.881896    52A      0.909113    53A      0.920254  
      54A      0.938482    55A      0.966409    56A      1.007081  
      57A      1.054578    58A      1.105159    59A      1.109671  
      60A      1.145475    61A      1.180600    62A      1.227506  
      63A      1.240429    64A      1.296887    65A      1.324462  
      66A      1.344300    67A      1.361081    68A      1.374079  
      69A      1.416591    70A      1.435559    71A      1.472576  
      72A      1.498750    73A      1.528192    74A      1.551167  
      75A      1.581189    76A      1.629604    77A      1.697271  
      78A      1.715808    79A      1.781558    80A      1.805008  
      81A      1.843313    82A      1.889949    83A      1.938083  
      84A      1.973746    85A      2.003979    86A      2.024608  
      87A      2.043698    88A      2.082345    89A      2.187261  
      90A      2.284346    91A      2.306811    92A      2.328120  
      93A      2.405679    94A      2.461227    95A      2.484137  
      96A      2.561457    97A      2.629775    98A      2.649715  
      99A      2.775254   100A      2.788634   101A      2.820644  
     102A      2.857894   103A      2.937442   104A      2.957692  
     105A      3.021392   106A      3.082757   107A      3.111113  
     108A      3.147281   109A      3.170310   110A      3.216044  
     111A      3.240716   112A      3.251188   113A      3.286100  
     114A      3.314067   115A      3.333173   116A      3.378029  
     117A      3.419750   118A      3.457125   119A      3.466550  
     120A      3.495821   121A      3.542949   122A      3.571573  
     123A      3.586913   124A      3.619622   125A      3.632050  
     126A      3.664881   127A      3.678680   128A      3.698141  
     129A      3.726315   130A      3.758994   131A      3.780569  
     132A      3.804711   133A      3.839745   134A      3.878493  
     135A      3.891821   136A      3.916949   137A      3.958263  
     138A      3.975063   139A      4.035750   140A      4.054506  
     141A      4.075967   142A      4.132731   143A      4.141964  
     144A      4.172174   145A      4.190548   146A      4.200872  
     147A      4.221355   148A      4.231293   149A      4.284816  
     150A      4.316802   151A      4.357843   152A      4.409250  
     153A      4.437882   154A      4.470179   155A      4.524341  
     156A      4.557737   157A      4.575852   158A      4.595889  
     159A      4.621370   160A      4.645591   161A      4.690992  
     162A      4.727411   163A      4.748128   164A      4.787188  
     165A      4.840925   166A      4.938659   167A      5.065267  
     168A      5.079834   169A      5.119854   170A      5.214028  
     171A      5.258748   172A      5.309188   173A      5.370886  
     174A      5.404427   175A      5.465507   176A      5.506148  
     177A      5.524372   178A      5.611127   179A      5.632274  
     180A      5.635968   181A      5.659728   182A      5.686195  
     183A      5.768214   184A      5.826517   185A      5.891272  
     186A      5.902880   187A      5.973956   188A      6.006412  
     189A      6.143255   190A      6.154039   191A      6.224281  
     192A      6.256356   193A      6.324868   194A      6.375658  
     195A      6.453218   196A      6.540576   197A      6.567897  
     198A      6.612641   199A      6.676607   200A      6.706320  
     201A      6.861398   202A      6.873752   203A      6.979995  
     204A      7.002893   205A      7.038804   206A      7.144053  
     207A      7.265591   208A      7.281912   209A      7.311467  
     210A      7.400190   211A      7.489877   212A      7.575423  
     213A      7.684358   214A      7.770131   215A      7.997432  
     216A      9.635556   217A     12.826221   218A     13.048155  
     219A     13.735556   220A     13.809414  

    Final Occupation by Irrep:
    DOCC [    20 ]

  @DF-RHF Final Energy:  -282.93421287553468

   => Energetics <=

    Nuclear Repulsion Energy =            179.7226800458986133
    One-Electron Energy =                -743.5238023865803143
    Two-Electron Energy =                 280.8669094651469891
    Total Energy =                       -282.9342128755346835

Computation Completed

Properties will be evaluated at   0.000000,   0.000000,   0.000000 [a0]

Properties computed using the SCF density matrix

  Nuclear Dipole Moment: [e a0]
     X:     3.5342      Y:    -2.3976      Z:     0.6763

  Electronic Dipole Moment: [e a0]
     X:    -3.5589      Y:     1.4249      Z:    -0.2499

  Dipole Moment: [e a0]
     X:    -0.0247      Y:    -0.9727      Z:     0.4264     Total:     1.0623

  Dipole Moment: [D]
     X:    -0.0628      Y:    -2.4724      Z:     1.0837     Total:     2.7002

*** tstop() called on rlyeh at Thu Nov  5 12:19:44 2020
Module time:
	user time   =      11.16 seconds =       0.19 minutes
	system time =       0.27 seconds =       0.00 minutes
	total time  =          3 seconds =       0.05 minutes
Total time:
	user time   =      11.16 seconds =       0.19 minutes
	system time =       0.27 seconds =       0.00 minutes
	total time  =          3 seconds =       0.05 minutes
  Constructing Basis Sets for DFOCC...

   => Loading Basis Set <=

    Role: JKFIT
    Keyword: DF_BASIS_SCF
    atoms 1    entry N          line   177 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 
    atoms 2-3  entry C          line   125 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 
    atoms 4-5  entry O          line   229 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 
    atoms 6-10 entry H          line    51 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-jkfit.gbs 

   => Loading Basis Set <=

    Name: CC-PVTZ-RI
    Role: RIFIT
    Keyword: DF_BASIS_CC
    atoms 1    entry N          line   257 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-ri.gbs 
    atoms 2-3  entry C          line   209 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-ri.gbs 
    atoms 4-5  entry O          line   305 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-ri.gbs 
    atoms 6-10 entry H          line    19 file /nix/store/bwqqv49py6m6vxdbhjn8bk9l3wa60kcy-psi4-1.3.2/share/psi4/basis/cc-pvtz-ri.gbs 

*** tstart() called on rlyeh
*** at Thu Nov  5 12:19:44 2020

                    DF-OMP2.5 (DF-OO-MP2.5)   
              Program Written by Ugur Bozkaya
              Latest Revision September 9, 2017


	RMS orbital gradient is changed to :     2.51e-06
	MAX orbital gradient is changed to :     1.05e-04
	MO spaces... 

	 FC   OCC   VIR   FV 
	  5   15   200    0

And the segfault:

/tmp/nix-shell-24339-0/rc: line 1: 24359 Segmentation fault      (core dumped) psi4 -n 4 -i Glycine_OOMP2-5_OPT.psi -o Glycine_OOMP2-5_OPT.out

I did not expect this setup to be too large. There is something similar in the issue tracker, but the example there is much larger and proceeds to print memory estimates, while mine crashes before that point.
Is this related to the int definition problem mentioned there, or am I missing something else? :slight_smile:

Best wishes and thank you in advance

Thanks for the report! This is a perfectly reasonable computation to run, and I would not expect Issue 1764 to occur with that basis set size.
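
For a rough sense of scale, the dimensions printed in the log can be turned into element counts. The sketch below assumes that the "int definition problem" means a 32-bit limit on array indices, which is only one reading of it. It uses the JKFIT auxiliary size (545) as a stand-in, because the size of the cc-pVTZ-RI basis is not printed in the excerpt. It says nothing about which intermediates DFOCC actually allocates.

```python
# Rough size check using only numbers printed in the log above.
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def element_count(*dims):
    """Number of elements in a dense tensor with the given dimensions."""
    count = 1
    for dim in dims:
        count *= dim
    return count


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


nso = 220               # primary basis functions (cc-pVTZ) from the log
naux = 545              # cc-pVTZ-JKFIT size from the log, used as a stand-in
n_occ, n_vir = 15, 200  # active OCC and VIR from the DFOCC header

b_so = element_count(naux, nso, nso)      # three-index tensor in the SO basis
b_ov = element_count(naux, n_occ, n_vir)  # occupied-virtual block

for label, n in [("B(Q|mu nu)", b_so), ("B(Q|ia)", b_ov)]:
    # 8 bytes per double-precision element
    print(f"{label}: {n} elements, {gib(8 * n):.3f} GiB, "
          f"fits in int32: {n <= INT32_MAX}")

# "memory 100 Gb" in the input corresponds to the GiB value Psi4 reports.
print(f"requested memory: {gib(100e9):.3f} GiB")
```

Under these assumptions the three-index quantities are far below both the 32-bit limit and the requested memory, which is consistent with the remark that this job is a reasonable size.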

I have a few other things on my to-do list, but I’ll look into this.

If you don’t mind my curiosity, since orbital-optimization is a research interest of mine, what are you using OMP2.5 for?

Hey, thank you for looking into it! :slight_smile: If I should provide more data, let me know. My Psi4 is compiled from source, not installed from Conda. But the tests pass, so I guess this is fine.

I want to study the barrier-free formation of glycine in the interstellar medium:
C + H2 -> CH2
CH2 + CO2 -> H2C-CO2
H2C-CO2 + NH3 -> H3N-CH2-CO2
This obviously involves a lot of open-shell species, and it seems that orbital-optimised methods give much better results for radicals. I would just like to get the geometries and some energy estimates for the stationary points from OO-MP2.5 before proceeding to scan the reaction path with some multireference method, probably MRCI.
As I am doing this with two research students, having some easier single-reference methods at hand at the beginning would be nice, although the reaction itself is nasty and likely proceeds through some conical intersections.

I just realised that this only happens in a parallel run. A single thread is perfectly fine. :open_mouth: I am using OpenBLAS as the BLAS implementation; could this be a threading problem?

I assume it’s gcc+openblas?
That combination is not well supported, and various threading issues might show up (nested threading that oversubscribes, poor threading, incompatibility with numpy threading, …).

Can you check if the threading test is healthy? https://github.com/psi4/psi4/blob/master/psi4/share/psi4/scripts/test_threading.py
Nested threading (calling BLAS from an OpenMP loop) can be especially problematic. Not only does OpenBLAS need to have OpenMP compatibility enabled, but interoperability between threading libraries can also be troublesome. E.g. check with top whether you have NxN threads instead of N.
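
The NxN check can also be scripted instead of read off top. This is a minimal, Linux-only sketch that reads the `Threads:` field from `/proc/<pid>/status`. The helper names and the "at least N squared" threshold are illustrative assumptions, not part of Psi4 or OpenBLAS, and a healthy process may still show a few more than N threads.

```python
def parse_thread_count(status_text):
    """Return the integer after 'Threads:' in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid):
    """Read the current thread count of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_thread_count(handle.read())


def looks_oversubscribed(n_threads_seen, n_requested):
    """Heuristic: flag roughly N*N threads when only N were requested."""
    return n_requested > 1 and n_threads_seen >= n_requested ** 2
```

Calling `thread_count(pid)` with the PID of the running psi4 process during the correlated step, and comparing the result with the `-n` value, gives the same information as watching top.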

It is best to use intel+mkl or the conda binaries. The conda binaries are fully optimized and really fast.
For AMD you might need to trick the MKL used in psi4 v1.3.2 from conda into using AVX2.

Hi hokru,
indeed, OpenBLAS and GCC. I am using the Nix package manager for everything here, including Python package management (NumPy is also built with OpenBLAS). We have some programs that wrap many quantum chemistry packages, and once conflicting MPI and BLAS implementations end up in the PATH it becomes a nightmare without Nix, so Conda is not really an option. I would also like to try a few things on an ARM server, so MKL and Intel ICC are not really an option either. :grimacing:

The threading test is actually fine with my OpenBLAS+GCC configuration, and htop shows reasonable CPU load, at least during the SCF iterations. Nevertheless, I just figured out that I can also get the jobs running with

OMP_NUM_THREADS=1 psi4 -i $INPUT -o $OUTPUT -n 4

and the CPU load is still around 400%, so this looks at least like a viable workaround.
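
For scripted runs, the same workaround can be applied from Python. This is a sketch only: it uses the `-i`, `-o` and `-n` flags exactly as in the command above, and the helper names and the `extra` parameter are hypothetical conveniences for trying other environment variables.

```python
import os
import subprocess


def psi4_command(input_file, output_file, n_threads):
    """Build the psi4 command line used in the workaround above."""
    return ["psi4", "-i", input_file, "-o", output_file, "-n", str(n_threads)]


def workaround_env(base_env=None, extra=None):
    """Copy the environment and force OMP_NUM_THREADS=1.

    `extra` is a hypothetical hook for further variables one may want to test.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    if extra:
        env.update(extra)
    return env


def run_psi4(input_file, output_file, n_threads, extra=None):
    """Run psi4 with the workaround environment and return the exit code."""
    result = subprocess.run(
        psi4_command(input_file, output_file, n_threads),
        env=workaround_env(extra=extra),
        check=False,  # a segfault shows up as a negative return code
    )
    return result.returncode
```

A negative return code from `run_psi4` means the process was killed by a signal, which makes it easy to test several environment settings in a loop and see which ones still crash.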

OK, sounds good.
psi4 itself ignores OMP_NUM_THREADS, so that’s interesting.

There are also environment variables like OMP_NESTED=FALSE that could be tried.
OpenBLAS flags could also be tried.

I too would like better gcc and gcc+openblas (or BLIS) support, but it will require cmake changes,
and I have seen C++ parts that should run threaded but did not.
Thus, currently, at least performance-wise, intel+mkl is advised.

Btw, if you don’t want to use psi4 as a Python module, it is fine to call the psi4 binary directly from wherever it is located (e.g. from an inactive conda directory). At least it hasn’t caused issues for me yet.

There is a bug in Psi4 1.3.2 involving DF-OMP2.5, where the correlation energy can be off by SCF energy - REF energy, depending on details of your convergence. I strongly recommend you use a Psi4 version including pull request 1772 (https://github.com/psi4/psi4/pull/1772), such as the current developer version.

I cannot reproduce this bug on Psi4 1.3.2 on my cluster. This supports the idea that this is related to details of your environment, like OpenBLAS and OpenMP.

Thank you both for your hints. I am using a current version from master, built with MKL and GCC. This solves the threading and segfault problem and indeed gives a different behaviour for OO-MP2.5 compared to Psi4 1.3.2. :slight_smile:

Good to hear! I’m going to mark this issue as solved.

