Suggestions for HPC installation?

Dear Psi4 experts,

I have used Psi4 for quite some time, but always on a single system with multi-threading. I am quite comfortable installing Psi4 from source under Linux and compiling Libint with higher maximum angular momentum than the conda packages currently offer.

I am now planning to install Psi4 on a few HPC systems, and I am wondering if there are any special considerations I should be aware of. The target systems are AMD EPYC (Rome) or Intel Xeon based. I anticipate running jobs via the DFOCC, FNOCC, and DETCI modules.

In particular:

a) For the AMD EPYC systems, should I use OpenBLAS, AOCL-BLIS, or MKL for linear algebra when running jobs with up to 64 threads on a single CPU?

b) What is the current state of MPI support in Psi4? Is any particular MPI implementation recommended?

c) Is there anything special about Libint compilation to optimally support MPI parallelization?

d) Apart from DePrince's now decade-old GPU DF-CCSD and the BrianQC SCF/DFT implementations, is there anything else in Psi4 that can benefit from GPUs?

e) Is there anything else I might not have considered?

Thanks

Thanks for your questions. Here are a few answers. I don't know first-hand of any large-scale AMD HPC installations, so let us know if you find anything interesting, or if you have further questions.

(a) I know Psi4 runs correctly with MKL, OpenBLAS (make sure you use the OpenMP variant, not the pthreads one), and Accelerate (on Mac). It's probably not going to scale well beyond ~10 threads. The last time we tried MKL on an AMD chip (~5 years ago), the timings weren't great compared to an Intel chip; fnocc was part of that test. But that was a while ago.
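If you want to check which BLAS your build actually picked up and which threading layer it uses, here is a minimal sketch. It assumes the threadpoolctl package is installed in the same environment as Psi4; threadpoolctl just inspects the shared libraries loaded into the current process.

```python
# Minimal sketch (assumes threadpoolctl is installed alongside Psi4):
# report which BLAS the running Psi4 process is linked against and whether
# OpenBLAS was built with the OpenMP or the pthreads threading layer.
import psi4                      # loads Psi4's core shared library and its BLAS
from threadpoolctl import threadpool_info

for lib in threadpool_info():
    print(lib.get("internal_api"),     # e.g. "openblas" or "mkl"
          lib.get("threading_layer"),  # e.g. "openmp" vs "pthreads"
          lib.get("num_threads"))
```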

(b) There is no MPI in Psi4 (there have been modules with MPI support in the past, but those were removed before v1.0, iirc).

(c) Libint itself has neither MPI nor OpenMP directives, so no special compilation is needed. Parallelism in the integrals happens at the Psi4 layer; see the sketch below.
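In practice that means you only set the thread count on the Psi4 side (via psi4.set_num_threads or OMP_NUM_THREADS), and the threaded integral loops inside Psi4 call into single-threaded Libint. A minimal sketch of a threaded run; the molecule and basis are just placeholders:

```python
# Minimal sketch: threading is controlled at the Psi4/OpenMP layer, not in Libint.
import psi4

psi4.set_num_threads(64)       # OpenMP threads used by Psi4's own parallel loops
psi4.set_memory("120 GB")      # placeholder; size to your node

mol = psi4.geometry("""
O  0.000  0.000  0.000
H  0.000  0.757  0.587
H  0.000 -0.757  0.587
""")

psi4.energy("scf/cc-pvdz")
```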

(d) Yes! David Poole and David Williams-Young have worked to get the latter's GauXC working in Psi4 for sn-LinK SCF. There's a recent JCP paper. The PR is at https://github.com/psi4/psi4/pull/3150; merging only awaits some testing fixes.