How can I build an array containing a specific molecular orbital?

MaxParadiz · October 30, 2021, 4:41pm

Dear community,

I am writing some scripts to make visualizations of molecular orbitals with python/blender (example: n to pi* transition, HOMO to LUMO, s-tetrazine - YouTube).

To make these, I am first generating the cube file of the orbitals I want and then re-load them as numpy arrays. While this works just fine, I am wondering whether there is a more direct way to obtain the numpy array containing an MO without having to write a cube file first.

Thanks!
Max

ajs99778 · October 31, 2021, 4:12am

I’ll start off by saying I don’t know how to get the MO array directly from Psi4. Hopefully this is easy to do with Psi4, but I haven’t seen it in the documentation.

However, I help develop a python module called AaronTools which can read FCHK files and give you MO values. I see no benefit to doing this with AaronTools if Psi4 can just give you the array. Here is a page describing how to get an FCHK file from Psi4. Here is the installation guide for AaronTools.

The version of AaronTools on PyPI is missing the convenience functions for constructing a reasonable grid of points, but these functions are present on the version currently on GitHub. The version on PyPI is still capable of giving you the MO values for an array of points, you will just have to construct the grid yourself.

The code with the GitHub version would look something like:

from AaronTools.fileIO import FileReader
from AaronTools.geometry import Geometry

# read the FCHK file and grab orbital stuff (not just the structure)
fr = FileReader("test.fchk", just_geom=False)
# Geometry is basically a molecule object
geom = Geometry(fr)
# get the Orbitals object from the parsed data
orbitals = fr.other["orbitals"]

# number of threads that will be used when evaluating an MO
# more threads = faster
# probably best to not exceed the number of cores on your CPU
n_threads = 8

# get grid primitives with the specified resolution
# the grid is centered on the molecule's centroid
# and oriented to minimize the volume
# n_ptsN - number of points along each axis
# vN - vector along each axis
# com - origin of the cube
# u - rotation matrix for vN relative to x, y, and z
# this is not on PyPI right now
n_pts1, n_pts2, n_pts3, v1, v2, v3, com, u = orbitals.get_cube_array(
    geom,
    padding=4, # extra space in angstroms around atoms
    spacing=0.1, # target resolution in angstroms
    standard_axes=False, # use True to get vN to be x, y, and z
)
# optional - check the max. memory that will be used when
# evaluating the MO if the result higher than your computer's
# free memory, maybe don't continue
# this is not on PyPI right now
n_val = n_pts1 * n_pts2 * n_pts3
mem_used = orbitals.memory_estimate(
    "mo_value",
    n_points=n_val,
    n_atoms=len(fr.atoms),
    n_jobs=n_threads,
)
# using 8 GB as an example
if mem_used > 8:
    raise RuntimeError("insufficient memory: %.1fGB requested" % mem_used)

# get the actual points in the box
# points - array of coordinates (N_points x 3)
# n_list - number of points along each axis (some permutation
# of [n_pts1, n_pts2, n_pts3])
# this is mainly an accounting thing for writing .cube files
# sort=True is an accounting thing for writing cube file
# to get the points to increment 
# along a particular axis in a predictable order 
# sort=False means n_list will be [n_pts1, n_pts2, n_pts3]
# this is not on PyPI
points, n_list = orbitals.get_cube_points(
    n_pts1, n_pts2, n_pts3, v1, v2, v3, com, sort=False
)

# get orbital values
# this is on PyPI
# first argument is index of MO (0-indexed) or an array
# of AO coefficients
# points is the (N x 3) array
# assuming closed shell...
homo_values = orbitals.mo_value(
    orbitals.n_alpha - 1,
    points,
    n_jobs=n_threads
)
lumo_values = orbitals.mo_value(
    orbitals.n_alpha,
    points,
    n_jobs=n_threads
)

AaronTools will print an error message if it encounters a Gaussian shell it can’t handle. It should be able to work with up to i functions for pure basis sets, and I think only up to f or g shells for Cartesian functions.

jmisiewicz · October 31, 2021, 12:34pm

energy, wavefunction = energy('scf', return_wfn=True)
alpha_orbitals = wavefunction.Ca()
# Related and useful for some applications: wavefunction.Ca_subset
# alpha_orbitals is a Psi4 Matrix object, not a NumPy array.
# For a c1 wavefunction, cast the matrix to an array with:
numpy_form = alpha_orbitals.np
# Otherwise, cast the matrix to a list of arrays, one for each symmetry with
numpy_forms = alpha_orbitals.nph

The Wavefunction object is documented here. These “np” manipulations are documented here.

The technical documentation is adequate on this one. What is not adequate is the documentation that gives a high-level overview of Psi. A common theme in Psi is that information about a computation is stored on the wavefunction, but this isn’t explicitly said anywhere that I’m aware of.

MaxParadiz · October 31, 2021, 1:51pm

Thanks a lot for sharing this source code! I will look into it

MaxParadiz · October 31, 2021, 2:01pm

Thank you, I believe that I know how to proceed from here. I can make a loop to evaluate wavefunction.basisset().compute_phi(x,y,z) at each point on my grid, and then build the orbitals using the normalized Ca()'s.

I do observe some behavior that I don’t think is the intended behavior of Ca().nph. I am not sure if I am misunderstanding something, if there is an error in the code, or something is off with my compilation (version: 1.5a1.dev151).

Ca() returns a tuple containing the length of the Ca vectors and empty arrays. I need to use Ca_subset(“AO”,“ALL”) to get the arrays with coefficients.

This is a minimal example:

from psi4 import *                                                                                                                                                          
import numpy as np                                                                                                                                                          
                                                                                                                                                                            
H2 = geometry("""                                                                                                                                                           
H  0.37 0.0 0.0                                                                                                                                                             
H -0.37 0.0 0.0                                                                                                                                                             
""")                                                                                                                                                                        
                                                                                                                                                                            
E, wfn = energy('scf/STO-3G', return_wfn=True)                                                                                                                              
                                                                                                                                                                            
                                                                                                                                                                            
print(f'The output of wfn.Ca().nph is: {wfn.Ca().nph}')                                                                                                                     
print(f'The output of wfn.Ca_subset("AO","ALL").np.T is: {wfn.Ca_subset("AO","ALL").np.T}')

The output of wfn.Ca().nph is: (array([[0.77618019]]), array([], shape=(0, 0), dtype=float64), array([], shape=(0, 0), dtype=float64), array([], shape=(0, 0), dtype=float64), array([], shape=(0, 0), dtype=float64), array([], shape=(0, 0), dtype=float64), array([], shape=(0, 0), dtype=float64), array([[1.71466595]]))

The output of wfn.Ca_subset(“AO”,“ALL”).np.T is: [[ 0.54884228 0.54884228]
[ 1.21245192 -1.21245192]]

jmisiewicz · October 31, 2021, 6:15pm

The behavior is correct. There are eight irreps of H2 in its maximum symmetry subgroup, so wfn.Ca().nph returns a tuple of eight arrays. Because you’re using a minimal basis set, only two of these arrays are non-zero.

Also, wfn.Ca() is naturally in the SO basis, not the AO basis.

MaxParadiz · October 31, 2021, 7:31pm

Thank you very much!

ajs99778 · October 31, 2021, 9:22pm

Are you aware of a more performant method than BasisSet.compute_phi for evaluating the MO’s at a large number of points? Whatever cubeprop is using seems pretty fast, but looping through points and calling compute_phi on each seems rather slow.

I tested this on a relatively small molecule (methyl lactate). Calling compute_phi on a bunch of points looks like it would be much slower than just writing a cube file and then reading it.

set_num_threads(4)

from time import perf_counter
import sys

# If I don't add this to my path, I can't import
# scipy (AaronTools uses scipy)
# probably just didn't set up my conda environment right
sys.path.append("D:\\Anaconda3\\lib\\site-packages")

from AaronTools.fileIO import FileReader
from AaronTools.geometry import Geometry

import numpy as np

basis {
    assign    def2-SVP
}

molecule {
     0 1
     C       -1.90843        0.69657        0.81121
     H       -2.98168        0.52540        0.71634
     H       -1.73415        1.77169        0.80831
     H       -1.57349        0.28145        1.76161
     C       -1.17367        0.02870       -0.35323
     C        0.31301        0.35280       -0.27240
     O        0.75263        1.43131       -0.59854
     O        1.05137       -0.63964        0.22500
     C        2.46278       -0.37870        0.35374
     H        2.63281        0.46432        1.02010
     H        2.89677       -0.16269       -0.62017
     H        2.88728       -1.28690        0.76828
     O       -1.36273       -1.37995       -0.37330
     H       -2.30784       -1.55559       -0.42155
     H       -1.51202        0.47234       -1.29416
}

set {
    cubeprop_tasks = ["FRONTIER_ORBITALS"]
}

nrg, wfn = energy('B3LYP', return_wfn=True)
fchk_writer = psi4.core.FCHKWriter(wfn)
fchk_writer.write('test2.fchk')

# letting Psi4 write cube files
cube_start = perf_counter()
cubeprop(wfn)
cube_stop = perf_counter()
print("writing cubes took %.2fs" % (cube_stop - cube_start))



# get points on a grid using AaronTools
n_threads = 4
fr = FileReader("test2.fchk", just_geom=False)
geom = Geometry(fr)
orbitals = fr.other["orbitals"]

n_pts1, n_pts2, n_pts3, v1, v2, v3, com, u = orbitals.get_cube_array(
    geom,
    padding=4,
    spacing=0.1,
    standard_axes=False,
)

n_val = n_pts1 * n_pts2 * n_pts3
mem_used = orbitals.memory_estimate(
    "mo_value",
    n_points=n_val,
    n_atoms=len(fr.atoms),
    n_jobs=n_threads,
)
if mem_used > 8:
    raise RuntimeError("insufficient memory: %.1fGB requested" % mem_used)

print("AaronTools will use about %.1fGB" % mem_used)

points, n_list = orbitals.get_cube_points(
    n_pts1, n_pts2, n_pts3, v1, v2, v3, com, sort=False
)

print("will calculate MO's on a grid with %i points" % n_val)

# using AaronTools to get the MO's at these points
at_start = perf_counter()
homo_values = orbitals.mo_value(
    orbitals.n_alpha - 1,
    points,
    n_jobs=n_threads
)
lumo_values = orbitals.mo_value(
    orbitals.n_alpha,
    points,
    n_jobs=n_threads
)
at_stop = perf_counter()
print("AaronTools took %.2fs to get homo and lumo" % (at_stop - at_start))




# using Psi4's BasisSet to get the MO's
basis = wfn.basisset()
homo_vals = np.zeros(len(points))
lumo_vals = np.zeros(len(points))
coeff = wfn.Ca_subset("AO", "ALL").np.T
homo_coeff = coeff[wfn.nalpha() - 1]
lumo_coeff = coeff[wfn.nalpha()]
psi_start = perf_counter()
for i, pt in enumerate(points):
    # points might not have the units compute_phi
    # expects (Angstrom vs Bohr), but this still does the
    # same number of points as AaronTools for comparison
    ao_vals = basis.compute_phi(*pt)
    homo_vals[i] = np.dot(ao_vals, homo_coeff)
    lumo_vals[i] = np.dot(ao_vals, lumo_coeff)

psi_stop = perf_counter()
print("Psi4 took %.2fs to get homo and lumo" % (psi_stop - psi_start))

The output:

writing cubes took 0.87s
AaronTools will use about 0.4GB
will calculate MO's on a grid with 1782804 points
AaronTools took 6.53s to get homo and lumo
Psi4 took 288.66s to get homo and lumo

I ran this a couple of times and these results seem pretty typical. I will note that the cube files generated by cubeprop contain about 3.5x fewer points than I threw at compute_phi, though setting a higher resolution for the cube files doesn’t seem to slow cubeprop down that much.

If a user is doing this for moderate/large molecules with good resolution, compute_phi may be unnecessarily slow. Psi4 can get these values more quickly, as cubeprop shows.

hokru · November 1, 2021, 12:19pm

The exported C++ compute_phi function to python allocates a vector everytime it is called: psi4/export_mints.cc at master · psi4/psi4 · GitHub
Maybe this is the main speed bump?

system · December 31, 2021, 12:20pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.