Efficiency concerns in quantum chemistry are usually so overwhelming that methods are implemented as calls to BLAS/LAPACK from a compiled language. Moreover, QC packages usually strongly recommend being linked against an optimized BLAS (e.g., MKL). When the arrays fit in memory, NumPy's dot is roughly equivalent, provided NumPy is itself linked to a good optimized BLAS (like the NumPy in the Anaconda Python distribution). (You'll see a maximum memory set in many psi4numpy scripts so that they are safe to run within a laptop's memory.) All of that assumes the best matrix-multiply pattern has already been worked out and coded into dgemm and friends. In QC, it's usually worth having a person work out a good pattern by hand, unless the number of terms is outrageous. For runtime optimization of tensor contractions, you might find the opt_einsum project of interest; it's a drop-in replacement for einsum.
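To make the "matrix-multiply pattern" point concrete, here is a minimal sketch using only NumPy. The function names (coulomb_einsum, coulomb_blas, ao_to_mo) are hypothetical, and the arrays are random stand-ins rather than real integrals, density, Fock, or MO coefficient matrices. The first pair shows the same Coulomb-like contraction J_pq = sum_rs (pq|rs) D_rs written naively with einsum and recast by hand as a single BLAS matrix-vector product. The second lets NumPy's einsum_path pick the pairwise order for a three-operand AO-to-MO transformation. Assuming opt_einsum is installed, its contract function should be swappable for np.einsum in the same calls.

```python
import numpy as np


def coulomb_einsum(eri, D):
    """J_pq = sum_rs (pq|rs) D_rs, written directly with einsum."""
    return np.einsum("pqrs,rs->pq", eri, D)


def coulomb_blas(eri, D):
    """Same contraction recast as one matrix-vector product that NumPy hands to BLAS.

    Flattening the (pq) and (rs) index pairs turns the 4-index tensor into an
    n^2 x n^2 matrix and D into a length-n^2 vector.
    """
    n = D.shape[0]
    return (eri.reshape(n * n, n * n) @ D.reshape(n * n)).reshape(n, n)


def ao_to_mo(F, C):
    """F_ij = sum_pq C_pi F_pq C_qj, with einsum_path choosing the pairwise order."""
    path, _report = np.einsum_path("pi,pq,qj->ij", C, F, C, optimize="optimal")
    return np.einsum("pi,pq,qj->ij", C, F, C, optimize=path)


if __name__ == "__main__":
    n = 10  # toy size; real basis sets are far larger
    rng = np.random.default_rng(0)
    eri = rng.standard_normal((n, n, n, n))  # random stand-in for (pq|rs)
    D = rng.standard_normal((n, n))          # random stand-in for a density matrix
    C = rng.standard_normal((n, n))          # random stand-in for MO coefficients
    F = rng.standard_normal((n, n))          # random stand-in for a Fock matrix

    # Both formulations of J should agree to floating-point tolerance.
    print(np.allclose(coulomb_einsum(eri, D), coulomb_blas(eri, D)))
    # The optimized einsum should match the explicit C^T F C product.
    print(np.allclose(ao_to_mo(F, C), C.T @ F @ C))
    # Show which contraction order NumPy picked.
    print(np.einsum_path("pi,pq,qj->ij", C, F, C, optimize="optimal")[1])
```

The hand-flattened version is the kind of pattern a person works out once so that the heavy lifting lands in a single BLAS call. The einsum_path route is the automated alternative when working that out by hand isn't worth it.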