Single-point energy calculation (vs. Gaussian)

I am comparing psi4 to Gaussian16 and finding that psi4 run MUCH slower than Gaussian.

The goal is to do a single-point energy calculation of a 118-atom organic molecule generated by the RDKit software:

def mol2xyz(mol):
  mol = Chem.AddHs(mol)
  AllChem.EmbedMolecule(mol, useExpTorsionAnglePrefs=True,useBasicKnowledge=True)
  AllChem.UFFOptimizeMolecule(mol)
  atoms = mol.GetAtoms()
  string = "\n"
  for i, atom in enumerate(atoms):
    pos = mol.GetConformer().GetAtomPosition(atom.GetIdx())
    string += "{} {} {} {}\n".format(atom.GetSymbol(), pos.x, pos.y, pos.z)
  string += "units angstrom\n"
  return string, mol

mol = Chem.MolFromSmiles('C[C@@H]1[C@H]([C@H](C[C@@H](O1)O[C@@H]2[C@H](O[C@H](C[C@@H]2O)O[C@@H]3[C@H](O[C@H](C[C@@H]3O)O[C@H]4CC[C@]5([C@@H](C4)CC[C@@H]6[C@@H]5CC[C@]7([C@@]6(CC[C@@H]7C8=CC(=O)OC8)O)C)C)C)C)O)O')

xyz, mol=mol2xyz(mol)
psi4.set_memory('8 GB')
psi4.set_num_threads(12)
scf_e, scf_wfn = psi4.energy("M06-2X/6-311+G(d,p)", return_wfn=True)

The same xyz coordinates were used in Gaussian with the following command.
%Mem=4GB
%Nproc=12
#sp m062x/6-311+G(d,p) opt=modredundant integral=ultrafine

Both were run on a Z440 HP with 12 processors available through Intel Xeon E5-1650 v3 (3.5 GHz).

The Gaussian run finished in 12 min 20.4 sec of CPU-time.

The PSI4 has been running for over 3.5 hrs of CPU-time and is on @DK-RKS iter 4.
The delta is around -4.7e-2.

What is the problem, and is there a way to speed up the single-point energy calculation so it comes closer to the speed achieved by Gaussian16?

Could you let us know the psi4 version?
Has a fast scratch directory been specified? (PSI4 does more IO for this calc than gaussian)

It is psi4 from github: ecbda834b v1.3.2 (only a few weeks ago).

I did not specify a fast scratch library. How should one set that?

scratch dirs can be specified with
psi4.core.IOManager.shared_object().set_default_path('/path/')
or: with the envar PSI_SCRATCH=<dir_name>
or, if used in psithon mode: psi4 -s <dir_name>

PSI4 also really likes a lot of memory for the SCF iterations with density fitting.
And it needs a fast disk. The DF integrals are written to disk and will need to be read for each SCF.

v.1.3.2 is the last official release. Upcoming versions v1.4 has various improvements to make DFT calculations faster. Most are included already in the developer version obtainable also from conda.

However, I am a bit surprised by the 12min of Gaussian. I tried to run your calculations and Gaussian09 crashed while using opt=modredundant. No SCF was started. Did you check if the calculation finished normally?

Last note, PSI4’s default grid is smaller than Gaussian’s ultrafine.

Curious. I was using Gaussian16, so I am not sure if Gaussian09 lacks certain defaults that were updated in g16.
Whatever the case, the g16 sp calculation finished ‘normally’, though it did state that “This type of calculation cannot be archived.”

Is it possible to adjust the DFT grid to make it less fine than the default? Would that speed up the calculation to a degree that it would match Gaussian?

The end goal with all of this work is to have a means to acquire Mulliken charges on each of the atoms in the molecule being analyzed, preferably at the level of theory: 6-31+G(2d,p).

Oh, I meant ‘smaller’ in the sense of having less grid points, i.e. more coarse. ‘Smaller’ is an ambiguous word here sorry.

12min is really fast for a hybrid DFT calculation on100+ atoms with a TZ basis set! So this made me suspicious. Unless GPUs or something are used.

I also get this Gaussian message in the crashed job, btw. The Harris guess is done and before the SCF starts something happens and ‘some’ density is copied and used for the final population analysis. I’d assume the guess density. Certainly no SCF was done! So no hard crash, but something went wrong.
Please carefully check. Using #P will make it clearer, I think. Not a Gaussian expert though :slight_smile: