UKS Optimisation Speed vs Gaussian

Hi all,

I’m a new Psi4 user, using it as a Python module, and was wondering whether there is anything I can do to speed up the calculations other than using more cores. I’ve attached my code below, which calculates reorganisation energies; I’ve been testing it on a UFF-optimised pentacene molecule with 20 cores.

Gaussian09 completes the job in 11 minutes, while Psi4 takes over 2 hours, with essentially all the extra time spent optimising pentacene at a charge of −1 and multiplicity of 2.
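For context, the script computes the standard four-point reorganisation energy; a minimal sketch of the arithmetic only (function and variable names here are illustrative, not the code in the attached psi4_reorg.py):

```python
# Four-point reorganisation energy for electron transfer (sketch only;
# the attached script does the actual Psi4 optimizations/single points).
# Inputs are total energies in hartree:
#   e_n_gn : neutral at the neutral-optimised geometry
#   e_a_ga : anion   at the anion-optimised geometry
#   e_n_ga : neutral at the anion geometry (single point)
#   e_a_gn : anion   at the neutral geometry (single point)

HARTREE_TO_EV = 27.211386245988

def reorganisation_energy(e_n_gn, e_a_ga, e_n_ga, e_a_gn):
    """Return lambda = lambda_1 + lambda_2 in eV."""
    lam1 = e_a_gn - e_a_ga   # relaxation on the anion surface
    lam2 = e_n_ga - e_n_gn   # relaxation on the neutral surface
    return (lam1 + lam2) * HARTREE_TO_EV
```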

Any help on this would be greatly appreciated.

Thank you
Jay

psi4_reorg.py (2.4 KB)

We’d need output files from Psi4 at least, and preferably from Gaussian as well. You should have a timer.dat file; that would also help. If you aren’t converging your molecules to the same tightness, that would explain it immediately.

My naive guess is that Gaussian requires fewer geometry steps to optimize the anion, but again, I can’t tell without two output files to compare.

I’ll add that we learned (literally yesterday) about a part of the SCF gradient code with negative threading efficiency due to unintended extra work. Once that’s fixed, the Psi timings will need to be updated.

Thank you for your reply. I have attached all the outputs from the psi4_reorg.py script; the main one is opt_-1_out.txt, for the anion optimisation. I have also attached the matching Gaussian09 output for the anion optimisation.

One other question that has come to mind: I believe the B3LYP implementation in Gaussian differs from that in other programs. Would that have any impact here?

The uploaded files’ original extensions are given as underscores in the filenames, and all are saved as .txt to upload here. The Slurm output is for the Psi4 calculation.

Cheers
Jay
energy_0_out.txt (33.0 KB)
energy_-1_out.txt (33.0 KB)
opt_0_out.txt (149.3 KB)
opt_-1_out.txt (907.5 KB)
PENCEN_opt_STAGE_2_com.txt (1.6 KB)

PENCEN_opt_STAGE_2_log.txt (332.1 KB)
slurm-1366728_out.txt (18.9 KB)
timer.dat (9.1 KB)

Thanks for the report!

Below are some notes and observations. That’s a dramatic difference you found, and it still needs more investigation.

  • The number of optimization cycles is a factor: Gaussian takes 4 while Psi4 takes 13. Psi4 matches Gaussian’s final forces at about cycle 7 but uses the full 13 to satisfy its own convergence criteria.
  • Grid and energy/density convergence are also likely tighter in Psi4 by default. The Psi4 grid is (75, 302), while I’m not finding the Gaussian spec right off.
  • The B3LYP in Psi4 will align with Gaussian’s (see psi4/input.dat at master · psi4/psi4 · GitHub). One would call energy("b3lyp5") in Psi4 to match the other definition.
  ---------------------------------------------------------------------------------------------------------------
   Step         Total Energy             Delta E       MAX Force       RMS Force        MAX Disp        RMS Disp
  ---------------------------------------------------------------------------------------------------------------
      1    -847.039051673552   -847.039051673552      0.01357514      0.00527445      0.06213815      0.02221764
      2    -847.041082096608     -0.002030423057      0.00699603      0.00273844      0.01774690      0.00592831
      3    -847.041511673654     -0.000429577046      0.00201995      0.00079861      0.00706052      0.00279724
      4    -847.041559956526     -0.000048282872      0.00062019      0.00024140      0.14230438      0.04811252
      5    -847.041557459142      0.000002497384      0.00047183      0.00015551      0.15679806      0.04811260
      6    -847.041439727013      0.000117732129      0.00187347      0.00078959      0.01841044      0.00768110
      7    -847.041567548274     -0.000127821260      0.00008311      0.00003505      0.15739598      0.04811252
      8    -847.041515548660      0.000051999614      0.00051431      0.00016356      0.15175610      0.03876663
      9    -847.041565525892     -0.000049977232      0.00015835      0.00004188      0.19768720      0.04811339
     10    -847.041559803502      0.000005722390      0.00044667      0.00011993      0.12334127      0.04811256
     11    -847.041566187349     -0.000006383847      0.00021038      0.00006993      0.03015416      0.01202814
     12    -847.041567773651     -0.000001586302      0.00007434      0.00002794      0.00736246      0.00300704
     13    -847.041568051024     -0.000000277373      0.00003814      0.00001160      0.00259013      0.00075181
  ---------------------------------------------------------------------------------------------------------------

Hi, Thank you for your reply. I just realised I reported the wrong timings for my scripts.

For pentacene, my psi4_reorg.py script took 1:27:07 on 20 cores, not over 2 hours as I previously stated, still compared to 11 minutes for the version of my script that calls Gaussian09. Sorry for being misleading there; the molecule that took over 2 hours (2 hours 50 minutes) with Psi4 and 38 minutes with Gaussian09 was a different one, produced by a generative method as part of my workflow.

However, the same optimisation of the −1-charged, multiplicity-2 pentacene did take 68 minutes with Psi4 compared to 6 minutes with Gaussian09, so it was still very slow by comparison.

Hi!

Two things to add to Lori’s notes.

  1. The memory used in opt_-1_out.txt is the default 500 MB. I think Psi4 is just a little starved for memory here.
  2. We recommend running optimizations in internal coordinates when possible. There are cases where a torsion composed of two linear bends will fail and Cartesians are required, but it’s usually best to optimize in internals until that is encountered and then switch.
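In Psithon input format those two changes would look roughly like this (the 8 GB figure is only an example; use whatever your nodes allow, and note that internals are already Psi4’s default, so the point is mainly to drop any Cartesian override):

```
memory 8 GB                    # default is only 500 MB

set opt_coordinates internal   # the default; remove any 'cartesian' setting
```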

I ran your system quickly on my old laptop with those two changes. The time is down to 11 minutes, and the geometry converged in 4 steps. Here’s the input file I used; hopefully the comparison will be even better on your hardware.

output.dat (332.4 KB)

Thanks for the suggestions. I increased the memory to the maximum on my HPC, and in testing with internal coordinates the time has dropped to 15 minutes on 20 cores. I was using Cartesians because I had encountered the stated issue with other molecules I was testing, but I will do as you suggested.

Is there an efficiency difference between running psi4 as an executable with input in the Psithon format versus importing psi4 as a Python package, which is what I am currently doing?

I will look at the grid and convergence criteria to try to improve the timings further.

Jay
PENCEN_uff.xyz (1.8 KB)
psi4_reorg_just-1.py (2.6 KB)
slurm-1374462_out.txt (332.6 KB)
test_sh.txt (190 Bytes)
timer_dat.txt (8.4 KB)

No. The same C++ backend is used.

The C++ optking code has several issues, and many users encounter problems with internal coordinates.
The best solution is to adopt the Python rewrite: GitHub - psi-rking/optking: optking: A Python version of the PSI4 geometry optimization program by R.A. King

The G16 default grid is a pruned (99,590) grid (UltraFine); it’s tighter than Psi4’s default. In Psi4, this setting comes close (the pruning differs):

set {
  dft_radial_points 99
  dft_spherical_points 590
  dft_pruning_scheme robust
}

More memory helps psi4 a lot.