How to optimize the execution of multiple energy calculations in a job submission?

Dear friends,

I don’t know if my question is appropriate for the forum, but here goes.

I would like to run multiple interaction energy calculations with Psi4, using inputs like the one below for a butane dimer.

memory 16 GB
molecule butane_dimer {
   C -0.000027000 0.119189000 -1.940263000
   H 0.879140000 0.747513000 -2.076177000
   H -0.879237000 0.747467000 -2.076120000
   H -0.000035000 -0.630774000 -2.728571000
   C 0.000032000 -0.516268000 -0.558441000
   H -0.874049000 -1.161258000 -0.447947000
   H 0.874151000 -1.161219000 -0.447988000
   C 0.000032000 0.516266000 0.558443000
   H -0.874045000 1.161262000 0.447943000
   H 0.874155000 1.161210000 0.447995000
   C -0.000028000 -0.119188000 1.940262000
   H 0.879131000 -0.747527000 2.076171000
   H -0.000015000 0.630781000 2.728560000
   H -0.879249000 -0.747450000 2.076129000
   --
   C -0.000027000 10.119189000 -1.940263000
   H 0.879140000 10.747513000 -2.076177000
   H -0.879237000 10.747467000 -2.076120000
   H -0.000035000 9.369226000 -2.728571000
   C 0.000032000 9.483732000 -0.558441000
   H -0.874049000 8.838742000 -0.447947000
   H 0.874151000 8.838781000 -0.447988000
   C 0.000032000 10.516266000 0.558443000
   H -0.874045000 11.161262000 0.447943000
   H 0.874155000 11.161210000 0.447995000
   C -0.000028000 9.880812000 1.940262000
   H 0.879131000 9.252473000 2.076171000
   H -0.000015000 10.630781000 2.728560000
   H -0.879249000 9.252550000 2.076129000
}
set basis aug-cc-pVTZ
E = energy('MP2',molecule=butane_dimer, bsse_type='cp')
Efinal = E* psi_hartree2kcalmol
psi4.print_out("MP2/aug-cc-pVTZ = ")
psi4.print_out("%10.6f" % (Efinal))

On the cluster I use, however, I can only submit jobs that request a fixed number of processors (minimum 12).

Is there a way to make better use of these 12 processors by running multiple energy calculations at the same time? With a Python script, perhaps?

The cluster uses the PBS queuing system. So far, for a 12-processor job, I have been launching Psi4 12 times in the background, each with a single thread, as shown below. This approach proved to be quite inefficient, and the intermediate files are also quite large.

(psi4 -i butane_1_1.dat -o butane_1_1.out -n 1) &
(psi4 -i butane_1_2.dat -o butane_1_2.out -n 1) &
(psi4 -i butane_1_3.dat -o butane_1_3.out -n 1) &
(psi4 -i butane_1_4.dat -o butane_1_4.out -n 1) &
(psi4 -i butane_1_5.dat -o butane_1_5.out -n 1) &
(psi4 -i butane_1_6.dat -o butane_1_6.out -n 1) &
(psi4 -i butane_1_7.dat -o butane_1_7.out -n 1) &
(psi4 -i butane_1_8.dat -o butane_1_8.out -n 1) &
(psi4 -i butane_1_9.dat -o butane_1_9.out -n 1) &
(psi4 -i butane_1_10.dat -o butane_1_10.out -n 1) &
(psi4 -i butane_1_11.dat -o butane_1_11.out -n 1) &
(psi4 -i butane_1_12.dat -o butane_1_12.out -n 1) &
wait

I know my question is not Psi4-specific, but if anyone has experience with this type of problem and can give me some insight, I would appreciate it.

Thanks in advance.

In case someone has the same problem: I got better performance by setting the number of threads inside each Psi4 input, using the set_num_threads command documented at the link below.

https://psicode.org/psi4manual/master/api/psi4.core.set_num_threads.html

Best regards,

Why don’t you simply use all 12 cores in your Psi4 calculation?

I wanted to optimize the calculation time, given that I have to calculate the interaction energy of thousands of conformers. For my specific case, running 1 job with 3 inputs x 4 cores proved to be more efficient than running 3 jobs with 1 input x 12 cores.
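
For anyone who wants to script the "several inputs x a few cores each" layout, here is a minimal sketch in Python. It is only an illustration under some assumptions: the function names build_command and run_batch are made up for this example, the output name is simply the input name with a .out suffix, and the psi4 executable is assumed to be on the PATH. The only Psi4 options used are the -i, -o and -n flags already shown in the commands above. The best split between jobs and threads depends on the system, so the 12/4 defaults below just mirror the case described in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(input_file, threads):
    """Build the psi4 command line for one input (hypothetical helper).

    The output file is assumed to be the input name with a .out suffix.
    """
    output_file = str(Path(input_file).with_suffix(".out"))
    return ["psi4", "-i", str(input_file), "-o", output_file, "-n", str(threads)]


def run_batch(input_files, total_cores=12, threads_per_job=4, runner=subprocess.run):
    """Run all inputs with at most total_cores // threads_per_job jobs active at once.

    `runner` is called with one command list per input; it defaults to
    subprocess.run and can be swapped for a stub when testing.
    """
    slots = max(1, total_cores // threads_per_job)
    commands = [build_command(f, threads_per_job) for f in input_files]
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # Each worker thread blocks on one psi4 process, so a new input
        # starts as soon as one of the running calculations finishes.
        results = list(pool.map(runner, commands))
    return commands, results
```

Inside the PBS job script this could be called as, for example, run_batch(sorted(Path(".").glob("butane_*.dat"))). Compared with a fixed list of background commands followed by wait, the pool keeps all slots busy when the calculations take different amounts of time, which matters when there are thousands of conformers to process.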