Efficiency of einsum vs BLAS

I am excited about the psi4numpy project.
I am just wondering about the tensor contractions. It seems that they are evaluated with einsum in NumPy, e.g., https://github.com/psi4/psi4numpy/blob/master/Coupled-Cluster/Spin_Orbitals/CCSD/CCSD_T.py. One could instead do this with a series of BLAS calls in a lower-level language, e.g., C++/Fortran.

Is there any comparison of the efficiency of the two approaches for quantum chemical calculations? I saw some benchmarks at https://stackoverflow.com/questions/7596612/benchmarking-python-vs-c-using-blas-and-numpy, but the results seem to vary a lot.

Efficiency considerations in quantum chemistry are usually so overwhelming that methods are implemented with calls to BLAS/LAPACK in a compiled language. Moreover, QC packages usually beg to be linked against an optimized BLAS (e.g., MKL). When arrays fit in memory, NumPy dot is roughly equivalent in speed, provided NumPy in turn is linked to a well-optimized BLAS (like the NumPy from the Anaconda Python distribution). (You’ll see a maximum memory set in a lot of psi4numpy scripts, so that they are safe to run in a laptop’s memory.)

All of that assumes the best matrix-multiply pattern has already been worked out and coded as calls to dgemm, etc. In QC, it’s usually worth having a person figure out a good pattern, unless the number of terms is outrageous. For runtime optimization of tensor contractions, you might find the opt_einsum project of interest. It’s a drop-in replacement for einsum.
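
To make the "matrix multiply pattern" point concrete, here is a minimal sketch (not taken from psi4numpy) of one contraction written three ways with NumPy alone. The array names, the dimensions, and the particular contraction are made up for illustration. Whether the matrix-multiply forms actually reach an optimized BLAS depends on how your NumPy was built, so time it on your own machine rather than trusting any single benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
n_occ, n_virt = 4, 6  # hypothetical, deliberately tiny dimensions

# Hypothetical amplitude-like and integral-like tensors.
t2 = rng.standard_normal((n_occ, n_occ, n_virt, n_virt))
v = rng.standard_normal((n_virt, n_virt, n_virt, n_virt))

# 1) Plain einsum: the index pattern is spelled out, NumPy picks the loops.
ref = np.einsum("ijef,efab->ijab", t2, v)

# 2) The same contraction as one matrix multiply, (ij, ef) @ (ef, ab).
#    This is the hand-worked "pattern" a compiled code would pass to dgemm.
gemm = (
    t2.reshape(n_occ * n_occ, n_virt * n_virt)
    @ v.reshape(n_virt * n_virt, n_virt * n_virt)
).reshape(n_occ, n_occ, n_virt, n_virt)

# 3) tensordot does the reshape-and-multiply bookkeeping for you.
td = np.tensordot(t2, v, axes=([2, 3], [0, 1]))

# 4) Let einsum search for a contraction order itself.
opt = np.einsum("ijef,efab->ijab", t2, v, optimize=True)

# For several operands the order of pairwise contractions matters;
# einsum_path reports the order it would choose and a cost estimate.
a = rng.standard_normal((8, 50))
b = rng.standard_normal((50, 50))
c = rng.standard_normal((50, 3))
path, report = np.einsum_path("ij,jk,kl->il", a, b, c, optimize="optimal")
chain = np.einsum("ij,jk,kl->il", a, b, c, optimize=path)
```

All four forms of the first contraction give the same numbers; only the cost differs. opt_einsum's `contract` function takes the same subscripts-and-operands arguments as `np.einsum`, so swapping it in for a call like the ones above should need no other changes.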

I’m marking the topic as solved. Feel free to re-ask if you think the question isn’t answered in full.