Geometry Optimisation with CCSD

mmithun · June 14, 2023, 10:14am

Hi,
I am attaching some input and output files from CCSD calculations. They are taking a long time compared to other, similar molecules. It would be nice if you could check the input and output files and advise me why cytosine is taking longer than guanine and, likewise, why HCOOHvsHCN is taking longer than HCOOHvsH2O. Note: the calculations are still running, so the output files (cytosine.out and HCOOHvsHCN.log) are not complete. Please also look at the optimisation of a dimer (dimer-in.txt and dimer-log.dat) and tell me whether I can improve the input for a faster calculation and whether I am doing anything wrong. I understand that CC methods are costly and time-consuming. All the files are in Dropbox.
Thanks in advance.

files

jmisiewicz · June 14, 2023, 1:51pm

This is expected behavior. If you give CCSD a system with N orbitals, the computation time is expected to scale as N^6. The HCOOH-HCN dimer has more orbitals than the HCOOH-H2O dimer, and guanine has significantly more orbitals than cytosine.
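To put rough numbers on that scaling, here is a back-of-the-envelope sketch. The function name is made up for illustration, and the estimate ignores prefactors, symmetry, and I/O. It covers the cost of a single CCSD evaluation only and says nothing about how many optimization steps a given molecule needs.

```python
def ccsd_cost_ratio(n_small: int, n_large: int, exponent: int = 6) -> float:
    """Estimate how much more one CCSD evaluation costs for the larger system.

    Assumes the cost grows as N**exponent, where N is the number of orbitals.
    """
    if n_small <= 0 or n_large <= 0:
        raise ValueError("orbital counts must be positive")
    return (n_large / n_small) ** exponent


# Doubling the number of orbitals multiplies the estimated cost by 2**6 = 64.
doubling = ccsd_cost_ratio(100, 200)

# MO counts quoted later in this thread: cytosine 137, guanine 179.
guanine_vs_cytosine = ccsd_cost_ratio(137, 179)

print(f"doubling N: {doubling:.0f}x, guanine vs cytosine: {guanine_vs_cytosine:.1f}x")
```

With those MO counts the estimate comes out at roughly a factor of five per evaluation in favour of cytosine, so a large difference in wall time in the other direction would have to come from the number of steps rather than the cost per step.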

For optimized ways to use Psi’s standard CCSD code, I defer to @crawdad.

In your case, consider setting cc_type df to use the density-fitting approximation. Although this will give you overall different energies and different geometries compared to standard CC, the differences are expected to be very small.

crawdad · June 14, 2023, 2:17pm

A few observations/comments:

I agree that the cytosine calculation should be faster than guanine because the former is significantly smaller than the latter. (MOs = 137 vs. 179, occupied MOs = 29 vs. 39). However, the problem isn’t with the CCSD gradient code, but the geometry optimization. Your guanine optimization converged in 31 steps (which is too many, in my opinion), and your cytosine optimization is still running after 127 steps and the total gradient is still quite large (RMS force is 5.98e-03). Where did your starting structures come from?
You should definitely not be using density-fitting for the SCF because the CC codes you’re using were not designed for that. Indeed, I wouldn’t trust any of the CC results you have: the orbital response part of the gradient doesn’t account for the DF parameters, so the errors in the DF MOs are likely causing problems with the CC gradients you’re computing. It is possible that this is ultimately the source of the problems you’re encountering with the optimizations. Change to scf_type pk and give it another shot.
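A minimal sketch of what that change could look like through Psi4's Python interface. The basis set is an assumption about the original inputs, the function name is only illustrative, and the geometry has to be supplied by the caller. The Psi4 import sits inside the function so that the option set can be inspected on a machine without Psi4.

```python
# Options for a CCSD optimization that uses conventional integrals throughout.
CONVENTIONAL_CCSD_OPTIONS = {
    "basis": "cc-pvdz",  # assumption: the basis used in the original inputs
    "scf_type": "pk",    # conventional SCF integrals instead of density fitting
    "cc_type": "conv",   # conventional CC code (Psi4's default)
}


def optimize_with_conventional_ccsd(geometry: str) -> float:
    """Run a CCSD geometry optimization and return the final energy."""
    import psi4  # imported lazily; only needed when the job is actually run

    molecule = psi4.geometry(geometry)
    psi4.set_options(CONVENTIONAL_CCSD_OPTIONS)
    return psi4.optimize("ccsd", molecule=molecule)
```

In a plain Psi4 input file the same thing amounts to replacing `scf_type df` with `scf_type pk` in the `set` block.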

kalju · June 15, 2023, 6:42am

In addition to what was being said above:

I don’t think it is an issue with the starting structure. Sure, a nonplanar guanine looks strange to a good biochemist, but there are plenty of ‘model chemistries’ that say that something like this is a minimum for guanine in the gas phase (“The small planarization barriers for the amino group in the nucleic acid bases”).
I am not sure that the choice “opt_coordinates cartesian” was a wise one here. I find that the default optimizer in modern versions of Psi4 works well in most cases. Cartesian coordinates sometimes come in handy if the molecule has a linear segment (an alkyne, for example) somewhere.
If this is not a “we need to see how bad the CCSD/cc-pVDZ result is” project, I would give DF OLCCD a try. This allows the use of much faster DF integrals, and the gradient calculation is essentially free. Your starting structure for guanine converged to a non-planar minimum in 8 steps while I was writing this post. Optimizing guanine at a much more meaningful DF OLCCD/aug-cc-pVTZ level should be perfectly doable on modern hardware in less than a week.
If one should not use DF SCF with CCLAMBDA/CCDENSITY, would it not be a good idea to add such a warning to the program code? For example, if one tries to do the reverse (PK SCF and DF OCC gradient), the user gets a stern note:

! DFOCC gradients need DF-SCF reference. !
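For the DF OLCCD suggestion, a hedged sketch of the option set via Psi4's Python interface. I am assuming that `cc_type df` is the switch that selects the density-fitted OLCCD code; the function name is illustrative only, and the geometry is whatever starting structure you already have. Note that `opt_coordinates` is deliberately left at its default.

```python
# Options for a density-fitted OLCCD optimization.
DF_OLCCD_OPTIONS = {
    "basis": "aug-cc-pvtz",  # the level suggested in this post
    "scf_type": "df",        # DFOCC gradients need a DF-SCF reference
    "cc_type": "df",         # assumption: selects the DF variant of OLCCD
}


def optimize_with_df_olccd(geometry: str) -> float:
    """Run a DF-OLCCD geometry optimization and return the final energy."""
    import psi4  # imported lazily; only needed when the job is actually run

    molecule = psi4.geometry(geometry)
    psi4.set_options(DF_OLCCD_OPTIONS)
    return psi4.optimize("olccd", molecule=molecule)
```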

Hope this helps!

jmisiewicz · June 15, 2023, 2:45pm

It’s a good goal. The problem is practical implementation. If a user sends in a reference wavefunction, I have no way of knowing what approximations were used on the two-electron integrals to solve SCF. That’s exactly the information you need to do this kind of validation correctly. (The DFOCC check is implemented badly, and as a result, a user can bypass it.) Off the top of my head, there’s conventional, density-fitting, and Cholesky decomposition. There may be more, but I don’t know the other JK algorithms well enough to comment.

I’ve added this as a developer wishlist item. I could conceivably do a crude check that will intercept most cases, but the cc module has some DF capabilities, and I would need to understand the logic for when those fire before implementing this.

mmithun · June 15, 2023, 3:31pm

I started from the geometry optimised at the B3LYP/cc-pVDZ level of theory. Can you also advise a better way to optimise dimers and single molecules with CCSD?

mmithun · June 15, 2023, 3:32pm

Thanks for the suggestions. Will try it for sure.

kalju · June 15, 2023, 4:33pm

A reply to jmisiewicz:

I see your point; thank you for adding it to the wish-list. Would tagging the work of each major module that has produced an object for the next module with an appropriate label, and passing that label along with the object, be a viable solution? At the end of the calculations, the set of these labels could be punched to the output so that the user has a compact record of what was done in this calculation.
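To make the idea concrete, here is a toy sketch in plain Python. Nothing in it is Psi4 code: the class, the function names, and the label strings such as "SCF:DF" are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaggedResult:
    """A module's output together with labels describing how it was produced."""

    payload: object
    labels: List[str] = field(default_factory=list)

    def passed_through(self, module_label: str) -> "TaggedResult":
        """Return a copy for the next module with one more label attached."""
        return TaggedResult(self.payload, self.labels + [module_label])


def require_label(result: TaggedResult, needed: str, consumer: str) -> None:
    """Refuse to continue if no upstream module left the needed label."""
    if needed not in result.labels:
        seen = ", ".join(result.labels) or "no labels"
        raise ValueError(f"{consumer} needs '{needed}' upstream, but saw: {seen}")


def summary(result: TaggedResult) -> str:
    """Compact record of the chain, suitable for printing at the end of a run."""
    return " -> ".join(result.labels)


# A DF-SCF result handed to a consumer that expects conventional integrals.
scf = TaggedResult(payload="wavefunction placeholder", labels=["SCF:DF"])
try:
    require_label(scf, "SCF:PK", consumer="conventional CC gradient")
    message = ""
except ValueError as err:
    message = str(err)

print(message)
print(summary(scf.passed_through("CC:CONV")))
```

The last line shows the kind of compact record that could go to the output file.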

Speaking of better DFOCC checks, here is another example. With ‘scf_type df’ set and the OCC module in use, when a user accidentally combines ‘gradient(olccd)’ with ‘mp_type df’ (e.g. doing OLCCD after a successful OMP3), Psi4 emits (immediately after input parsing):

! OCC gradients need conventional SCF reference. !

rather than ! OLCCD gradients require setting ‘cc_type …’ !

kalju · June 15, 2023, 6:42pm

For dimers (and “folded” systems for that matter) with a significant dispersion component, I would not use B3LYP optimization as a precursor for CCSD. If you want to get an initial structure for CCSD that is better than what you can do manually, look into Psi4’s extremely fast DF MP2 module, because MP2 accounts for dispersion interactions (and maybe overbinds them a bit). It has analytic gradients for the RHF reference, so this should be a good option for the initial optimization of closed-shell dimers. Oh, and cc-pVDZ is not meant for dimers …
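A rough sketch of such a two-stage workflow through Psi4's Python interface. The basis set is my placeholder for "something with diffuse functions", not a recommendation from the original inputs, and the function name is illustrative. As I understand it, `psi4.optimize` updates the molecule's geometry in place, so the second stage starts from the MP2 structure.

```python
# Stage 1: cheap pre-optimization with density-fitted MP2.
PREOPT_OPTIONS = {
    "basis": "aug-cc-pvdz",  # assumption: placeholder basis with diffuse functions
    "scf_type": "df",
    "mp2_type": "df",
}

# Stage 2: CCSD refinement with conventional integrals throughout.
FINAL_OPTIONS = {
    "basis": "aug-cc-pvdz",  # assumption: same placeholder basis
    "scf_type": "pk",
    "cc_type": "conv",
}


def optimize_in_two_stages(geometry: str) -> float:
    """Pre-optimize with DF-MP2, then refine with CCSD; return the CCSD energy."""
    import psi4  # imported lazily; only needed when the job is actually run

    dimer = psi4.geometry(geometry)

    psi4.set_options(PREOPT_OPTIONS)
    psi4.optimize("mp2", molecule=dimer)  # leaves the MP2 geometry in `dimer`

    psi4.set_options(FINAL_OPTIONS)
    return psi4.optimize("ccsd", molecule=dimer)
```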

mmithun · June 16, 2023, 8:14am

Thank you very much for the suggestions.