I’m running CCSD and CCSD(T) geometry optimizations and, although I have allocated several CPUs to each calculation (via set_num_threads(#CPUs)) and a ton of memory, the jobs progress very slowly and seem to spend little or no time in multi-thread mode.
I am running version 1.3.2, which was installed via Conda under anaconda3.
Yes, they are threaded, but to varying degrees. The conventional (T) gradients understand symmetry, but as far as I understand they are threaded only through BLAS operations and are not yet fully optimized for speed. The energy calculations are optimized; the gradients are not. The integral generation step (after the SCF) can also slow the calculation down considerably.
The DF-CCSD(T) gradients in PSI4 are faster, better threaded, and understand frozen core. I highly recommend them for larger calculations. The trade-off is that they do not use symmetry and are available only for RHF references.
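For illustration, here is a minimal sketch of how a DF-CCSD(T) optimization might be requested through the Psi4 Python API. The water geometry, the cc-pVDZ basis, the thread count, and the memory value are placeholders I made up, and the helper names `df_ccsdt_options` and `optimize_water` are hypothetical. The option keywords (`scf_type`, `cc_type`, `freeze_core`, `reference`) are the ones I would expect from the Psi4 manual, so please check them against the documentation for your version.

```python
# Sketch only: the molecule, basis set, memory, and thread count are
# placeholder assumptions, not values taken from the original question.


def df_ccsdt_options(basis="cc-pvdz"):
    """Return options that request density-fitted CCSD(T) with a frozen core."""
    return {
        "basis": basis,
        "reference": "rhf",    # the DF gradients support RHF references only
        "scf_type": "df",      # density-fitted SCF
        "cc_type": "df",       # density-fitted coupled cluster
        "freeze_core": True,   # frozen core is supported by the DF gradients
    }


def optimize_water(n_threads=4, memory="8 GB"):
    """Run a DF-CCSD(T) geometry optimization on a placeholder water molecule."""
    import psi4  # imported here so the options helper works without Psi4

    psi4.set_num_threads(n_threads)
    psi4.set_memory(memory)
    psi4.geometry("""
    O
    H 1 0.96
    H 1 0.96 2 104.5
    """)
    psi4.set_options(df_ccsdt_options())
    return psi4.optimize("ccsd(t)")  # returns the final energy


# Example call (requires a working Psi4 installation):
# final_energy = optimize_water(n_threads=4)
```

Keeping the options in a plain dictionary makes it easy to compare a conventional run against a density-fitted one by changing only `cc_type` and `scf_type`.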
Alternatively, running CFOUR through the PSI4 interface is nice and simple.