Concurrent calculations using the python API

GHill · August 23, 2018, 11:47am

I’d like to be able to run concurrent psi4 calculations from a Python script (reading in molecules from a database and computing the energy for each molecule). I thought I might be able to do this easily using concurrent.futures and a simple function that runs psi4, along the lines of this snippet:

executor.map(runjob, molecules)

where runjob runs a simple SCF calculation and you can probably take a good guess as to what molecules is. If I just run this sequentially (with map rather than executor.map) I get the results I expect. When I run it concurrently, I get many errors of the following type:

PSIO_ERROR: Attempt to write into next entry: 35, SO-basis Overlap Ints

This suggests to me that the same scratch file is being used for all of the jobs (or that there is some other conflict between the jobs). Is there a way to specify separate scratch files via the API, or some other process-safe way of running psi4 in this fashion?

dgasmith · August 23, 2018, 10:35pm

Yes, the file I/O still uses the PID as the basis for the I/O file names, and there may be other globals still floating around that could give you issues as well. Anything that runs the jobs in processes with different PIDs will work. For example, with dask.distributed:

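import psi4
from distributed import Client

def compute(x):
    # Build a helium dimer with the He-He distance set to x
    mol = psi4.geometry("""
    He
    He 1 *R*
    """.replace("*R*", str(x)))

    # Runs on a dask worker, in a separate process from the client
    return psi4.energy("MP2/cc-pVDZ", molecule=mol)


# Connect to an already-running dask scheduler (the address is site-specific)
c = Client("192.168.1.6:8786")
# Submit one task per distance; map returns a list of futures
ret = c.map(compute, [2, 3, 4, 5, 6])
print([x.result() for x in ret])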

MolSSI is building the QCArchive project which will support this functionality out of the box. See here for an overview.
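
As a rough illustration of the "different PIDs" point, here is a minimal standard-library sketch that uses a process pool. It is not a tested psi4 recipe: run_job is a hypothetical stand-in that only reports which process handled each input, and the assumption is that a real version would call psi4.energy in its place. Whether the original errors came from a thread pool (where every job shares one PID) is also a guess.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def run_job(distance):
    """Hypothetical stand-in for a function that would run a psi4 calculation."""
    # Report the PID so it is visible which process handled this input.
    return {"distance": distance, "pid": os.getpid()}


if __name__ == "__main__":
    distances = [2, 3, 4, 5, 6]

    # Each worker in a process pool is a separate OS process with its own PID,
    # unlike the threads of a ThreadPoolExecutor, which all share one PID.
    with ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(run_job, distances))

    parent_pid = os.getpid()
    for r in results:
        # The last column shows whether the job ran outside the parent process.
        print(r["distance"], r["pid"], r["pid"] != parent_pid)
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the script. Without it, each worker would try to create its own pool.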

GHill · August 24, 2018, 12:34pm

dask.distributed seems to do exactly what I was looking for - thanks!

I’ll keep an eye on the QCArchive project too.

GHill · August 29, 2018, 1:27pm

This might help someone in the future, so here is a note that I will (hopefully) update later.

Using dask.distributed as installed via conda didn’t work for me:

RuntimeError: The current Numpy installation ('/Users/jgh/psi4conda/lib/python3.6/site-packages/numpy/__init__.py') fails to pass simple sanity checks. This can be caused for example by incorrect BLAS library being linked in.

I’ve not yet found a solution for this, but things to try include building dask.distributed from source and investigating other distributed libs.

GHill · September 21, 2018, 12:45pm

And another follow-up…

The numpy/dask problem was related to a Mac where the Python environment is, err, complicated (see this xkcd). I didn’t manage to resolve that, but it worked just fine on a Linux machine with a fresh conda env.

If anyone cares about the macOS issue, I suspect that numpy was originally linked against MKL, but was somehow picking up the Accelerate framework when it tried to do any linear algebra.