Dear Psi4 experts,
I have used Psi4 for quite some time, but always on a single system with multi-threading. I am quite comfortable installing Psi4 from source under Linux and compiling Libint with higher angular momentum than the conda packages currently offer.
I am now planning to install Psi4 on a few HPC systems, and I am wondering if there are any special considerations I should be aware of. The target systems are AMD EPYC (Rome) or Intel Xeon-based. I anticipate running jobs via the DFOCC, FNOCC, and DETCI modules.
In particular:
a) For the AMD EPYC systems, should I use OpenBLAS, AOCL-BLIS, or MKL for linear algebra when running jobs with up to 64 threads on a single CPU?
b) What is the current state of MPI support in Psi4? Are any particular MPI implementations recommended?
c) Is there anything special about Libint compilation to optimally support MPI parallelization?
d) Apart from DePrince's now decade-old DF-CCSD and BrianQC's SCF/DFT implementations, is there anything else in Psi4 that can benefit from a GPU?
e) Is there anything else I might not have considered?
Thanks