Strange crash on Psi4

Hi, I installed Psi4 using conda and run it on AMD64. When I try to run it I get the following problem:

Traceback (most recent call last):
  File "/home/sem/miniconda/bin/psi4", line 248, in <module>
    exec(content)
  File "<string>", line 56, in <module>
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/driver/driver.py", line 1050, in optimize
    G, wfn = gradient(lowername, return_wfn=True, molecule=moleculeclone, **kwargs)
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/driver/driver.py", line 606, in gradient
    wfn = procedures['gradient'][lowername](lowername, molecule=molecule, **kwargs)
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/driver/procrouting/proc.py", line 1958, in run_scf_gradient
    ref_wfn = run_scf(name, **kwargs)
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/driver/procrouting/proc.py", line 1942, in run_scf
    scf_wfn = scf_helper(name, **kwargs)
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/driver/procrouting/proc.py", line 1330, in scf_helper
    e_scf = scf_wfn.compute_energy()

RuntimeError:
Fatal Error: PSIO Error
Error occurred in file: /scratch/psilocaluser/conda-builds/psi4_1495009270718/work/psi4/src/psi4/libpsio/toclen.cc on line: 105
The most recent 5 function calls were:

psi::PsiException::PsiException(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char const*, int)
psi::PSIO::wt_toclen(unsigned int, unsigned long)
psi::PSIO::write(unsigned int, char const*, char*, unsigned long, psi::psio_address, psi::psio_address*)
psi::PSIO::write_entry(unsigned int, char const*, char*, unsigned long)
psi::Matrix::save(psi::PSIO*, unsigned int, psi::Matrix::SaveType)

Can someone help?

Thanks

Did psi4 --test or python -c "import psi4; psi4.test()" (equivalent) work? It doesn't look like you installed psi4 into a separate conda env. That does no particular harm to psi4, but psi4 does install newer gcc packages that might interfere with other software you run in your main conda env. It looks like psi4 is having trouble writing to disk. Is this a particularly large job? Did you set PSI_SCRATCH and check that it's writable?
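
For concreteness, here is roughly the kind of writability check you can run yourself before starting psi4. This is only a sketch of mine, not psi4's actual code, and the messages are illustrative:

import os

# Sketch: verify that PSI_SCRATCH points at an existing, writable directory,
# similar in spirit to the check psi4 performs when it starts up.
scratch = os.environ.get("PSI_SCRATCH")
if scratch is None:
    print("PSI_SCRATCH is not set")
elif not os.path.isdir(scratch):
    print("PSI_SCRATCH is not a directory: %s" % scratch)
elif not os.access(scratch, os.W_OK):
    print("PSI_SCRATCH is not writable: %s" % scratch)
else:
    print("PSI_SCRATCH looks usable: %s" % scratch)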

The scratch dir is set in my .bashrc as:

export PSI_SCRATCH=/media/sem/6b811154-bddc-4e01-9235-b0e23fb61a03/PSI4_scratch/

It exists; however, I get this error when running on 1 core:

  File "/home/sem/miniconda/bin/psi4", line 158, in <module>
    import psi4
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/__init__.py", line 69, in <module>
    raise Exception("Passed in scratch is not a directory (%s)." % envvar_scratch)
Exception: Passed in scratch is not a directory (/media/sem/954f05a8-6012-403e-952b-57b1c5df53de/PSI4_scratch).

Then, trying on two cores, I get:


Primary job terminated normally, but 1 process returned
a non-zero exit code… Per user-direction, the job has been aborted.

Traceback (most recent call last):
  File "/home/sem/miniconda/bin/psi4", line 158, in <module>
    import psi4
  File "/home/sem/miniconda/lib//python2.7/site-packages/psi4/__init__.py", line 69, in <module>
    raise Exception("Passed in scratch is not a directory (%s)." % envvar_scratch)
Exception: Passed in scratch is not a directory (/media/sem/954f05a8-6012-403e-952b-57b1c5df53de/PSI4_scratch).

mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[10880,1],1]
Exit code: 1

Your job is running afoul of this bit. You should check that the directory exists and can be written to with the necessary permissions.

Also, Psi4 won’t benefit from running under MPI. Its strength is intranode parallelism.


Thanks Lori. What should I do about the “this bit” link? I am not familiar with this.

About using intranode parallelism, which command (and software) should I use?

Best wishes

The “this bit” link points to the check that shows Python is not recognizing the path you exported as PSI_SCRATCH as an existing, writable directory. Therefore, you should use commands like ls -l ${PSI_SCRATCH} and echo asdf > ${PSI_SCRATCH}/testfile to make sure that dir exists and is writable. If you're running remotely, you may want to put such commands in your queue script to make sure the indirectly run job has sufficient permissions.

Assuming Linux, not Mac, the conda binary you're running has been compiled with OpenMP and linked to threaded BLAS libraries, so the plain executable invocation bin/psi4 -n4 will parallelize across 4 threads.
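
If you prefer to drive psi4 from Python instead of the psi4 executable, the same threading request looks roughly like the sketch below; the molecule, method, and memory values are placeholders of mine, not anything from your job:

import psi4

psi4.set_num_threads(4)       # same intent as the -n4 command-line flag
psi4.set_memory("2 GB")       # placeholder memory setting

# Placeholder molecule (water, Z-matrix input).
psi4.geometry("""
O
H 1 0.96
H 1 0.96 2 104.5
""")

# Placeholder method/basis; your input would use whatever you already run.
psi4.optimize("scf/cc-pvdz")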

Dear Lori, something is strange here. In my .bashrc I have at the end:

export PATH=/home/sem/miniconda/bin:$PATH
export PSI_SCRATCH=/media/sem/6b811154-bddc-4e01-9235-b0e23fb61a03/PSI4_scratch/

and when I issue the commands you wrote:

sem@Fjordforsk:~$ ls -l ${PSI_SCRATCH}
total 0
sem@Fjordforsk:~$ echo asdf > ${PSI_SCRATCH}/testfile
sem@Fjordforsk:~$

Nothing appears.

Yes, I am saying that psi4 isn’t running because it’s hitting a code block that checks if it can write to scratch. You’ll have to use standard Linux detective skills to figure out what’s wrong with your runtime environment, and those snippets may help. There’s nothing immediately wrong with the output you posted, but maybe that dir doesn’t exist on a remote node or rwx permissions are different for the process executed through the queue.

Thanks Lori. I have changed the scratch directory to the running HD where the OS is, so now it's not looking for a mounted HD in the system. Now it appears to work, using the command psi4 -n4 1.inp.

How can I limit scratch size, in case it overfills the HD?

Glad you found an arrangement that works. There is no command to limit scratch size. Now that you have the software working, you may want to experiment with getting scratch where your cluster sysadmin wants it. Perhaps the PSI4_scratch dir needs to be created in the queuing script before psi4 runs.
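
As a sketch of what a queue-time setup could look like from the Python side (the path is a placeholder, and you should follow whatever conventions your cluster uses), something like this creates the scratch directory before psi4 runs, points psi4 at it, and reports how much space is free, since there is no built-in cap:

import os
import psi4

scratch = "/scratch/sem/psi4"          # placeholder path, not the one from this thread
if not os.path.isdir(scratch):
    os.makedirs(scratch)               # create it if the compute node does not have it

# Point psi4 at the scratch location from Python.
psi4.core.IOManager.shared_object().set_default_path(scratch)

# There is no built-in limit on scratch usage, but you can at least report free space.
st = os.statvfs(scratch)
print("Free space in scratch: %.1f GB" % (st.f_bavail * st.f_frsize / 1e9))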

Thanks Lori. It appears that being on another HD is best. However, this HD is mounted only when the file manager opens it first, not automatically, so we have to figure out some way to get around this.

Thanks!

All the best