I’m completely new to Psi4 and have only just installed it today, because I learned that it was used to generate the MD17 dataset, which I am interested in.
I’m starting work on a neural network approach to molecular dynamics, and for that I need a dataset. The ideal dataset for my research is essentially the MD17 dataset found here: However, there is a problem with this dataset for my use case. As described in the originating article, the MD17 dataset was created as follows:
Reference data generation. The data used for training the DFT models were created running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. We computed forces and energies using all-electrons at the generalized gradient approximation level of theory with the Perdew-Burke-Ernzerhof (PBE) [65] exchange-correlation functional, treating van der Waals interactions with the Tkatchenko-Scheffler (TS) method [66]. All calculations were performed with FHI-aims [67]. The final training data was generated by subsampling the full trajectory under preservation of the Maxwell-Boltzmann distribution for the energies.
To create the coupled cluster datasets, we reused the same geometries as for the DFT models and recomputed energies and forces using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning’s correlation-consistent basis set cc-pVTZ was used for ethanol, cc-pVDZ for toluene and malonaldehyde and CCSD/cc-pVDZ for aspirin. All calculations were performed with the Psi4 [68] software suite.
So the data has been subsampled, meaning that consecutive data points in the MD17 dataset are not separated by a constant time step, which is what I need for my work.
So my questions are:
Is there any way of generating this dataset again given the above information? I have tried contacting the author, but haven’t heard anything back yet.
Or alternatively, are there any other simple systems like this available online, or does anyone have any scripts or tutorials for how to generate a molecular system dataset?
What I need are the atomic positions at each step, and ideally the atomic velocities and force vectors as well. I would like to generate at least 100k-500k time steps, since I need quite a lot of data for the neural network training.
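To make the requirement concrete, here is a minimal sketch of the kind of trajectory I mean: positions, velocities and forces stored at every step with a constant time step. It assumes a plain velocity Verlet (NVE) integrator, not the Nosé-Hoover NVT setup from the quoted article, and it uses a toy harmonic force function as a stand-in for a real electronic-structure calculation. The names `harmonic_forces` and `run_velocity_verlet` are hypothetical, and the units are arbitrary.

```python
import copy


def harmonic_forces(positions, k=1.0):
    """Toy stand-in for an ab initio force call (assumption for illustration).

    Each atom sits in its own harmonic well centred at the origin: F = -k * x.
    """
    return [[-k * x for x in atom] for atom in positions]


def run_velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Integrate NVE dynamics and record every step at a fixed time step.

    Returns a list of frames; each frame holds the time, positions,
    velocities and forces of all atoms at that step.
    """
    # Work on copies so the caller's initial conditions are left untouched.
    positions = copy.deepcopy(positions)
    velocities = copy.deepcopy(velocities)
    forces = force_fn(positions)
    trajectory = []

    for step in range(n_steps):
        # Store the current state before advancing it.
        trajectory.append({
            "time": step * dt,
            "positions": copy.deepcopy(positions),
            "velocities": copy.deepcopy(velocities),
            "forces": copy.deepcopy(forces),
        })
        # Half-step velocity update followed by a full-step position update.
        for i, mass in enumerate(masses):
            for d in range(3):
                velocities[i][d] += 0.5 * dt * forces[i][d] / mass
                positions[i][d] += dt * velocities[i][d]
        # Recompute forces at the new positions, then finish the velocity update.
        forces = force_fn(positions)
        for i, mass in enumerate(masses):
            for d in range(3):
                velocities[i][d] += 0.5 * dt * forces[i][d] / mass

    return trajectory


# Two "atoms" with arbitrary initial conditions, 1000 uniformly spaced steps.
initial_positions = [[1.0, 0.0, 0.0], [0.0, 0.5, -0.5]]
initial_velocities = [[0.0, 0.2, 0.0], [0.1, 0.0, 0.0]]
masses = [1.0, 2.0]
trajectory = run_velocity_verlet(
    initial_positions, initial_velocities, masses, harmonic_forces,
    dt=0.01, n_steps=1000,
)
print(len(trajectory), trajectory[-1]["time"])
```

In a real run, `force_fn` would wrap a gradient calculation from a quantum chemistry package (the forces are the negative of the energy gradient), and a thermostat would be needed to reproduce the NVT conditions described in the article.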
Any insight from experienced Psi4 users or people in the field of molecular dynamics would be greatly appreciated.