I am using a conda-installed Psi4. I searched this forum but am still confused. I found a very old post, "Multithreading in downloaded binary distribution", but it was for version 1.3, and I am not sure whether things have changed in 1.9.
My question: is this the expected behavior? With 4 threads, the full Psi4 run speeds up only 2.2 times. With 8 threads, only 2.6 times.
I ran psi4/share/psi4/scripts/test_threading.py to measure the speed. The output for 4 threads and then for 8 threads is below.
Time for threads 1, size 200: Psi4: 0.000388 NumPy: 0.000405
Time for threads 1, size 500: Psi4: 0.005445 NumPy: 0.005349
Time for threads 1, size 2000: Psi4: 0.318029 NumPy: 0.308266
Time for threads 1, size 4000: Psi4: 2.529851 NumPy: 2.515574
Time for threads 4, size 200: Psi4: 0.000124 NumPy: 0.000144
Time for threads 4, size 500: Psi4: 0.001536 NumPy: 0.001613
Time for threads 4, size 2000: Psi4: 0.093142 NumPy: 0.090224
Time for threads 4, size 4000: Psi4: 0.670489 NumPy: 0.667352
NumPy@n4 : Psi4@n4 ratio (want ~1): 1.00
Psi4@n1 : Psi4@n4 ratio (want ~4): 3.77
Running psi4 -i _thread_test_input_psi4_yo.in -o _thread_test_input_psi4_yo_n1.out -n1 …
Time for threads 1: Psi4: 85.772461
Running psi4 -i _thread_test_input_psi4_yo.in -o _thread_test_input_psi4_yo_n4.out -n4 …
Time for threads 4: Psi4: 38.372297
Psi4@n1 : Psi4@n4 ratio (want ~4): 2.24
Time for threads 1, size 200: Psi4: 0.000425 NumPy: 0.000443
Time for threads 1, size 500: Psi4: 0.005931 NumPy: 0.005959
Time for threads 1, size 2000: Psi4: 0.359246 NumPy: 0.348622
Time for threads 1, size 4000: Psi4: 2.764390 NumPy: 2.720615
Time for threads 8, size 200: Psi4: 0.000081 NumPy: 0.000106
Time for threads 8, size 500: Psi4: 0.000886 NumPy: 0.000946
Time for threads 8, size 2000: Psi4: 0.062190 NumPy: 0.052501
Time for threads 8, size 4000: Psi4: 0.378442 NumPy: 0.377958
NumPy@n8 : Psi4@n8 ratio (want ~1): 1.00
Psi4@n1 : Psi4@n8 ratio (want ~8): 7.30
Running psi4 -i _thread_test_input_psi4_yo.in -o _thread_test_input_psi4_yo_n1.out -n1 …
Time for threads 1: Psi4: 89.553097
Running psi4 -i _thread_test_input_psi4_yo.in -o _thread_test_input_psi4_yo_n8.out -n8 …
Time for threads 8: Psi4: 34.369661
Psi4@n1 : Psi4@n8 ratio (want ~8): 2.61
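
To put the ratios above in context, here is a minimal sketch that recomputes the speedup and parallel efficiency from the full-calculation timings, and then estimates what fraction of the run would have to be parallel if Amdahl's law held exactly. Applying Amdahl's law here is an assumption for illustration only, not something the test script does. The function names are hypothetical, and the only data used are the timings printed above.

```python
# Wall times in seconds, copied from the "Time for threads N: Psi4: ..." lines
# of the full-calculation part of the output above.
runs = {
    4: (85.772461, 38.372297),  # (1-thread time, 4-thread time)
    8: (89.553097, 34.369661),  # (1-thread time, 8-thread time)
}


def speedup(t1, tn):
    """Ratio of single-thread time to n-thread time."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Speedup divided by thread count (1.0 would be ideal scaling)."""
    return speedup(t1, tn) / n


def amdahl_parallel_fraction(t1, tn, n):
    """Solve S = 1 / ((1 - p) + p / n) for p.

    Assumption: the run follows Amdahl's model, i.e. a fixed serial part
    plus a part that scales perfectly with the number of threads.
    """
    s = speedup(t1, tn)
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


for n, (t1, tn) in runs.items():
    print(
        f"threads={n}: speedup={speedup(t1, tn):.2f}, "
        f"efficiency={efficiency(t1, tn, n):.2f}, "
        f"implied parallel fraction={amdahl_parallel_fraction(t1, tn, n):.2f}"
    )
```

Under that assumption, both runs imply a parallel fraction of roughly 0.7. The matrix-multiply part of the same output scales much closer to ideal (3.77 and 7.30), so one possible reading is that the limited speedup comes from portions of the full calculation that do not thread well, rather than from the BLAS threading itself.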