I’m trying to parallelize Psi4 across many nodes and cores on our cluster. So far I have used psi4.set_num_threads(), but I’m not sure whether I’m actually running on all available threads.
I am running Psi4 in a PBS batch job (i.e. #PBS -l nodes=4:ppn=5) and then try to pass in the number of threads with essentially: psi4.set_num_threads(multiprocessing.cpu_count()). Is this the correct way to ensure I use all of the cores and threads that I have access to?
Additionally, it looks like Psi4 actually runs more quickly when I leave out the set_num_threads line. What am I missing?
Psi4 currently has no inter-node communication capabilities; it only uses OpenMP for parallel execution, so it parallelizes within a single node. Make sure not to oversubscribe a node with more threads than cores available.
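The slowdown with set_num_threads is likely that oversubscription: multiprocessing.cpu_count() reports every core on the node, not just the ones PBS allocated to your job (ppn=5). A minimal sketch of the difference, assuming a Linux compute node (os.sched_getaffinity is Linux-only); the commented psi4 call shows where the result would go:

```python
import multiprocessing
import os

# cpu_count() sees the whole machine, e.g. 20 on a 20-core node,
# even if PBS granted this job only ppn=5 of those cores.
total_cores = multiprocessing.cpu_count()

# sched_getaffinity(0) returns only the cores this process is allowed
# to run on; under most PBS/cgroup setups this matches your ppn request.
allocated_cores = len(os.sched_getaffinity(0))

print(f"node has {total_cores} cores, job may use {allocated_cores}")
# psi4.set_num_threads(allocated_cores)  # never oversubscribes
```

If your PBS setup does not pin jobs to cores, the two numbers will agree, and you would instead read the requested core count from the batch environment.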
Thanks for the reply! It’s good to know that Psi4 does not parallelize across nodes.
But if I want to parallelize across multiple processors on a single node, would I need to run Python with mpirun (i.e. mpirun -np 5 python psi4_run.py)? As you say, I may already be oversubscribing, but if I wanted to ensure I was using all the threads I have available, is the approach above (using multiprocessing) the best way to do it?
@hokru is right that threaded BLAS (MKL strongly recommended; this is baked into the binary conda package) and OpenMP are the only parallelism psi4 responds to. If you’re running psithon (molecule {...}), then I’d avoid the psi4.set_num_threads(n) in favor of psi4 -n. If you’re running psiapi (psi4.geometry("""...""")), then continue with set_num_threads.
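Concretely, the psiapi route might look like the sketch below; the water geometry and scf/cc-pvdz method are made-up examples, the import is guarded so the sketch is harmless where psi4 is not installed, and the thread count of 5 matches the ppn=5 request above. No mpirun is needed: the single Python process spawns the OpenMP threads itself.

```python
# psiapi: psi4 is imported as an ordinary Python module, so your
# script owns thread setup -- call set_num_threads() before computing.
try:
    import psi4

    psi4.set_num_threads(5)  # match #PBS ppn=5; one node only
    h2o = psi4.geometry("""
    0 1
    O
    H 1 0.96
    H 1 0.96 2 104.5
    """)
    energy = psi4.energy("scf/cc-pvdz")
except Exception:
    energy = None  # psi4 not installed/runnable in this environment

# psithon equivalent: put molecule {...} in an input file and run
#   psi4 -n 5 input.dat
# so the psi4 executable sets the thread count itself.
```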