GPU_DFCC not running due to error

Hi all,

I am interested in running the gpu_dfcc plugin on a GPU, but I am unable to get my installation to work. I created a brand-new conda environment with the following commands:

conda create -n psi4gpu  python=3.6
conda activate psi4gpu
conda install -c openeye -c psi4 matplotlib scipy openeye-toolkits psi4 dftd3 gcp gpu_dfcc

Then I tried to test Psi4 on this sample input file using the command `psi4 input.dat -n 2`.
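For context, the input was along these lines. This is a sketch only, not the exact file linked above: the molecule, basis choices, and the `energy('gpu-ccsd(t)')` entry point are assumptions based on the plugin's README.

```
molecule h2o {
  0 1
  O
  H 1 0.96
  H 1 0.96 2 104.5
}

set basis aug-cc-pvdz
set df_basis_cc aug-cc-pvdz-ri
set freeze_core true

import gpu_dfcc
energy('gpu-ccsd(t)')
```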

My terminal then shows the error `ERROR: cpu tmp: invalid argument`.

The bottom of my output file shows:

 => Auxiliary Basis Set <=

  Basis Set: (AUG-CC-PVDZ AUX)
    Blend: AUG-CC-PVDZ-RI
    Number of shells: 40
    Number of basis functions: 118
    Number of Cartesian functions: 136
    Spherical Harmonics?: true
    Max angular momentum: 3

    Number of auxiliary functions:         118

  CUDA device properties:
  name:                  GeForce GTX TITAN X
  major version:                           5
  minor version:                           2
  canMapHostMemory:                        1
  totalGlobalMem:                      12212 mb
  sharedMemPerBlock:                   49152
  clockRate:                           1.076 ghz
  regsPerBlock:                        65536
  warpSize:                               32
  maxThreadsPerBlock:                   1024

  allocating gpu memory...

I imagine it has something to do with this line in the code. Do you have any suggestions on how to proceed?

This sounds like a question for @deprince.

Also, if you’re familiar with building software for the GPU and have the nvcc compiler toolchain set up on your machine, it may be worth building the plugin yourself and using your own runtime libs. Sample compile line here. I don’t get much feedback on the gpu_dfcc plugin and can’t test it myself.


This error means something went wrong when trying to allocate host memory two lines above the one you referenced. Without more information, I can’t tell what exactly went wrong.

If I recall correctly, the default behavior will be that the code will try to allocate as much CPU memory as there is global memory on the GPU. How much RAM do you have available on the CPU?
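As a rough sanity check, you can compare the GPU's global memory against the host RAM currently available before submitting the job. This is a sketch, not part of gpu_dfcc; the `MemAvailable` parsing and the 12212 MB figure (taken from the CUDA device properties in the output above) are assumptions about your node.

```python
# Rough sanity check (not gpu_dfcc code): if the plugin mirrors the
# GPU's global memory in host memory by default, the node needs at
# least that much free RAM, plus overhead for the rest of the job.

GPU_GLOBAL_MEM_MB = 12212  # totalGlobalMem reported in the output above

def mem_available_mb(meminfo_text):
    """Parse the MemAvailable field (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024
    raise RuntimeError("MemAvailable not found")

def enough_host_ram(meminfo_text, needed_mb=GPU_GLOBAL_MEM_MB):
    """True if the host currently has at least needed_mb of available RAM."""
    return mem_available_mb(meminfo_text) >= needed_mb

# Example: a node with ~6 GB available cannot back a 12 GB GPU.
sample = "MemTotal: 16000000 kB\nMemAvailable: 6144000 kB\n"
print(enough_host_ram(sample))  # False: 6000 MB < 12212 MB
```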

Thanks for the input @deprince. It turned out to be a memory error after all. I tried an interactive job on SLURM (srun). Specifying different values for --mem led to the following results:

  • --mem 6000 ERROR: cpu tmp: invalid argument
  • --mem 12000 Killed
  • --mem 15000 CCSD total energy…PASSED

This is my output of nvidia-smi:

| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX TIT...  On   | 00000000:04:00.0 Off |                  N/A |
| 22%   35C    P8    15W / 250W |      0MiB / 12212MiB |      0%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

OK, I’m glad to hear that it works with more memory. If you want more control over the CPU memory footprint, you can use the input parameter


where XXX is given in MB. Note that some of the tiles passed to and manipulated by the GPU will be smaller in this case, though, which may affect performance.
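To illustrate why performance may drop, here is a toy sketch (not the plugin's actual tiling code): with a smaller host-memory budget, the same data must be split into more, smaller tiles, each requiring its own host-device transfer.

```python
# Toy illustration (not gpu_dfcc's actual code): splitting N elements
# into contiguous tiles that each fit within a memory budget. Halving
# the budget roughly doubles the number of tiles, and hence the number
# of host<->device transfers.

def tile_sizes(n_elements, budget_elements):
    """Split n_elements into contiguous tiles of at most budget_elements."""
    tiles = []
    start = 0
    while start < n_elements:
        end = min(start + budget_elements, n_elements)
        tiles.append(end - start)
        start = end
    return tiles

print(tile_sizes(1000, 400))  # [400, 400, 200] -> 3 transfers
print(tile_sizes(1000, 200))  # five tiles of 200 -> 5 transfers
```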
