I’m trying to parallelize Psi4 across many nodes and cores on our cluster. So far I have used psi4.set_num_threads(), but I’m quite confused about whether I’m actually running on all available threads.
I am running Psi4 in a PBS batch job (i.e., #PBS -l nodes=4:ppn=5) and then pass in the number of threads using essentially psi4.set_num_threads(multiprocessing.cpu_count()). Is this the correct way to ensure I use all of the cores and threads that I have access to?
Additionally, it looks like Psi4 runs more quickly when I don’t include the set_num_threads line. What am I missing?
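
To make the question concrete, here is a minimal sketch of how the thread count passed to set_num_threads could be compared against what the job was actually given. It rests on assumptions: that the scheduler is a TORQUE-style PBS that exports PBS_NUM_PPN (not every PBS flavor does), and allocated_threads is a hypothetical helper name, not part of Psi4. Note that multiprocessing.cpu_count() reports every logical CPU on the node, regardless of the ppn value requested, so the two numbers may well differ.

```python
import multiprocessing
import os


def allocated_threads(environ=os.environ):
    """Guess how many cores this job may use on the current node.

    Hypothetical helper for illustration; not a Psi4 function.
    """
    # Assumption: TORQUE-style PBS exports PBS_NUM_PPN (cores per node).
    ppn = environ.get("PBS_NUM_PPN", "")
    if ppn.isdigit() and int(ppn) > 0:
        return int(ppn)
    # Fallback (Linux only): CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Last resort: every logical CPU on the node.
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print("cpu_count() on this node:", multiprocessing.cpu_count())
    print("cores allocated to job:  ", allocated_threads())
    # Inside the real job one would then call, for example:
    #   import psi4
    #   psi4.set_num_threads(allocated_threads())
```

If the first number printed is larger than the second, the call in my script would be asking for more threads than the job has cores, which seems worth ruling out as a cause of the slowdown.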