Those are some interesting results. CPU time > wall time suggests that either multithreading or hyperthreading occurred during the calculation.
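To make that reasoning concrete: CPU time is summed over all threads of a process, so (user + system) / wall gives a rough estimate of how many threads were busy on average. Here is a minimal sketch. The helper name is my own and the numbers are made up, not taken from your job.

```python
def average_busy_threads(user_s, system_s, wall_s):
    """Return (user + system) / wall, a rough estimate of how many
    threads were busy on average during the run."""
    if wall_s <= 0:
        raise ValueError("wall time must be positive")
    return (user_s + system_s) / wall_s


# Made-up timings for illustration only (seconds).
ratio = average_busy_threads(user_s=19.0, system_s=1.0, wall_s=10.0)
print(f"average busy threads: {ratio:.2f}")
```

A ratio close to 1 is what a strictly single-threaded run should give. A ratio close to 2 would match the "user time is about double the wall time" pattern.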
It probably wouldn’t hurt to confirm that Psi4 really ran on only 1 core (no multithreading), even though you specified this. Do you still have the output file generated by Psi4? You can run
grep -i "Threads" on it to make sure this is the case.
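If grep isn't handy, the same check can be sketched in Python. The function name is hypothetical, and the sample text is an invented excerpt, so the real Psi4 output may be laid out differently.

```python
def lines_mentioning_threads(text):
    """Return every line that contains 'threads', ignoring case
    (roughly what grep -i "Threads" would print)."""
    return [line for line in text.splitlines() if "threads" in line.lower()]


# Invented excerpt for illustration; not copied from a real Psi4 output file.
sample = "Memory: 500 MiB\nThreads: 1\nSCF converged\n"
print(lines_mentioning_threads(sample))
```

To use it on a real run, read the contents of your output file into a string and pass that in instead of `sample`.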
My best guess is that the CPU you run on supports hyperthreading. You said that the node is configured to run one thread per core. Does this refer to physical or virtual cores? It’s possible that the job runs on 1 physical core which is treated by the operating system as 2 virtual cores. This would be in line with your observation that user times are approximately double the length of wall times.
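One way to tell physical from virtual cores on an x86 Linux machine is to compare the "siblings" field (logical CPUs per package) with the "cpu cores" field (physical cores per package) in /proc/cpuinfo. Here is a rough sketch, assuming those two fields are present. The function name is my own and the sample text is made up.

```python
def smt_active(cpuinfo_text):
    """Compare 'siblings' with 'cpu cores' in /proc/cpuinfo-style text.

    Returns True if siblings > cpu cores (logical CPUs outnumber physical
    cores), False otherwise, and None if either field is missing.
    If the fields appear several times, the last occurrence is used.
    """
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
    if siblings is None or cores is None:
        return None
    return siblings > cores


# Made-up excerpt: 8 logical CPUs on 4 physical cores.
sample = "processor\t: 0\nsiblings\t: 8\ncpu cores\t: 4\n"
print(smt_active(sample))
```

On a real machine you would pass in the contents of /proc/cpuinfo. Some architectures don't report these fields, in which case the function returns None and the sysadmin is probably the better source.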
You could check whether hyperthreading is enabled on your CPU. On my computer (Linux / Intel CPU) I can do this with
"grep -i 'ht' /proc/cpuinfo". If you’re submitting jobs to a cluster, you might have to ask the sysadmin about it.
Your second observation about the difference between the timer-file and the output-file is a little puzzling. It makes sense that the wall time is slightly shorter when the subroutine times are added up, since not every operation within Psi4 is explicitly timed and displayed in the timer-file. However, the difference between the two user/system times suggests that the timer-file accounts for hyperthreading while the output-file doesn’t. My suspicion is that the timer-file is the more accurate of the two, but hopefully a developer with a little more knowledge about how timing works in Psi4 can chime in.