Thank you so much for the guidance. Here are a few “interesting” things that have transpired:
a) The recent Psi4 Git version (as well as the older 1.7) builds cleanly with CMake 3.29.8, GCC (12.4 or 13.3), Python 3.11.10, and OpenBLAS 0.3.28, as long as OpenBLAS is built with INTERFACE64=OFF. All tests pass with the recent Git version on znver3 (AMD Milan). However, this warning:
```
CMake Warning at src/psi4/libqt/CMakeLists.txt
  Your BLAS/LAPACK library does not seem to be providing the DGGSVD3 and
  DGGSVP3 subroutines. No re-routing is available.
```
is somewhat concerning. Does OpenBLAS not include these LAPACK subroutines? That would be inconsistent with what I gather from #2832.
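If it helps narrow this down: DGGSVD3/DGGSVP3 entered reference LAPACK with 3.6.0, and as far as I know OpenBLAS builds the full LAPACK unless told otherwise, so I would expect 0.3.28 to have them. Here is a quick standalone check of whether the library actually exports the symbols (the library path and the trailing-underscore name mangling are assumptions on my part):

```cmake
# Scratch CMakeLists.txt: does this OpenBLAS actually export dggsvd3_/dggsvp3_?
# /opt/openblas is a placeholder path; the _ suffix assumes gfortran name mangling.
cmake_minimum_required(VERSION 3.16)
project(lapack_symbol_check C)

include(CheckFunctionExists)
set(CMAKE_REQUIRED_LIBRARIES /opt/openblas/lib/libopenblas.so)
check_function_exists(dggsvd3_ HAVE_DGGSVD3)
check_function_exists(dggsvp3_ HAVE_DGGSVP3)

if(HAVE_DGGSVD3 AND HAVE_DGGSVP3)
  message(STATUS "Symbols present - the Psi4 feature test is the suspect")
else()
  message(STATUS "Symbols absent - this OpenBLAS lacks the LAPACK 3.6+ routines")
endif()
```

If both symbols turn up, the warning would be a false negative in the feature test rather than a gap in OpenBLAS.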
b) The recent Git version of Psi4 fails at the configuration stage on AMD EPYC systems with the icpx compiler from the 2024 oneAPI release. The issue seems to be that icpx reports “4.2.1” for `__GNUC__`, `__GNUC_MINOR__`, and `__GNUC_PATCHLEVEL__`, which I take to be a built-in compatibility claim that has little to do with whatever GCC compiler is actually installed. The current logic in custom_cxxstandard.cmake concludes that 4.2.1 is less than 4.9 and raises a FATAL_ERROR. Once I commented out this check, I could build with the Intel 2024 compilers. Compilation with Intel Classic 2019 icc works out of the box, and it reports the correct GCC version:
```
Found base compiler version 12.4.0
```
versus icpx:
```
-- Found base compiler version
Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, AVX, F16C, FMA, BMI, LZCNT, AVX2 and ADX instructions.
```
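For triage, here is a minimal sketch of the probe plus the kind of guard I used locally. This is my own paraphrase, not the verbatim custom_cxxstandard.cmake logic; `IntelLLVM` is the compiler ID CMake assigns to icpx:

```cmake
# Probe the GNU-compatibility macros the C++ compiler advertises.
file(WRITE ${CMAKE_BINARY_DIR}/gnuc_probe.cpp
     "__GNUC__ __GNUC_MINOR__ __GNUC_PATCHLEVEL__\n")
execute_process(COMMAND ${CMAKE_CXX_COMPILER} -E ${CMAKE_BINARY_DIR}/gnuc_probe.cpp
                OUTPUT_VARIABLE _gnuc_raw
                OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET)
# Under icpx this expands to "4 2 1" (plus preprocessor line markers),
# regardless of the GCC toolchain actually installed.
message(STATUS "GNU compatibility macros: ${_gnuc_raw}")

# Sketch of a guard instead of commenting the whole check out:
if(CMAKE_CXX_COMPILER_ID STREQUAL "IntelLLVM")
  message(STATUS "Skipping GCC >= 4.9 gate: icpx pins __GNUC__ at 4.2.1")
else()
  # ... existing base-compiler version check ...
endif()
```

With icc the same probe echoes the version of the GCC toolchain it sits on, which matches the output above.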
c) The recent introduction of Eigen3 headers into the Psi4 code gives me trouble when Eigen3 is installed in a custom location. I think CMake passes the Eigen3_DIR variable only to Libint’s external build, not to Psi4 itself. In any case, since I could not figure out the correct include directive for CMake, I had to hard-code the proper include path into
`psi4/src/psi4/libfock/SplitJK.h` and `psi4/src/psi4/libmints/matrix.h`.
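For reference, this is what I expected to be able to do instead of hard-coding the path; the target names below are placeholders, not the actual Psi4 target names:

```cmake
# Configure-time hints I expected to reach the Psi4 build itself (shell):
#   cmake -DCMAKE_PREFIX_PATH=/opt/eigen-3.4.0 ...
# or, pointing directly at the package config:
#   cmake -DEigen3_DIR=/opt/eigen-3.4.0/share/eigen3/cmake ...

# Inside Psi4's own CMake, consuming the imported target would propagate the
# include directory to every target that needs the headers (Eigen is header-only):
find_package(Eigen3 CONFIG REQUIRED)
target_link_libraries(fock PRIVATE Eigen3::Eigen)    # "fock"/"mints" are placeholders
target_link_libraries(mints PRIVATE Eigen3::Eigen)
```

If the top-level build currently forwards Eigen3_DIR only to the Libint superbuild, something along these lines in the Psi4 sources would presumably cover SplitJK.h and matrix.h as well.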
d) Requesting MAX_AM=5 builds Libint 5-4-3-6-5-4, which is more or less what I expected, while MAX_AM=6 builds Libint2 7-7-4-12-7-5, where one might have expected 6-5-4-7-6-5. Not complaining, just surprised.
I will share a bit more about relative performance and scaling later, but in general OLCCD scaling on znver1 (AMD Threadripper), znver2 (AMD Rome), and znver3 (AMD Milan) looks rather similar, and there is indeed not much to be gained by going beyond 8 threads.