Reduce size of Docker image

Hi, I have built a Docker image to run psi4 on Ubuntu 16.04. It works fine, but it is pretty large (1.5 GB).
See my Dockerfile: https://github.com/paesanilab/docker-images/blob/master/psi4conda/Dockerfile

  • Do you happen to have a smaller image already available?
  • Or do you have suggestions to reduce its size? For example, I had to install build-essential; otherwise the conda gcc package wouldn’t install.

Thanks!

Hi, I haven’t played around with Docker myself, so I don’t have a smaller image. However, the conda packages have been changing quite a bit since psi4conda v1.1 with all the new conda build tools, and I don’t have a psi4conda installer ready. Would you want to try Docker from a Miniconda + conda create -n p4env psi4 ... line that I can give you?

Sure! I made a first attempt here and had some issues installing gcc from Anaconda, but I should be able to fix it: https://github.com/paesanilab/docker-images/blob/master/psi4/Dockerfile

Can you please check that the conda line is fine (or send yours)?

So you don’t want to be using the plain -c psi4 (aka -c psi4/label/main) channel, b/c that’s circa the last 1.1 release from May 2017. The next step beyond that is -c psi4/label/dev, which is reputedly the nightly build from master. That downloads full gcc 5.2 compilers (I’ll skip why unless you’re interested). There’s a big gap in the dev subchannel from when psi and conda weren’t in sync on build tools. The most modern psi is actually in -c psi4/label/agg, and that should bring along only gcc 7.2 libraries, not compilers, so it may save a little space and spare you installing Linux fundamentals. I’m about to start migrating packages from agg to dev. But what you’re after is something like conda create -n p4env psi4 -c psi4/label/agg -c psi4. Because it’s just been me on that channel, not even the core devs, no instructions are battle-hardened, so unless it’s obvious, ping me when something goes wrong.

Lori, can we build with OpenBLAS and GCC? That should save 600 MB+.

@dgasmith, if this is something I can do in the Dockerfile, send me the details and I can try it out. Thanks!

thanks @loriab, I’ll test it and report back

@loriab, I tested the agg label. The good news is that it installs and works fine without installing any other package.
However, the resulting image is still big: 1.58 GB.

The base image miniconda3 is just 409 MB, then conda installs:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mkl_random-1.0.1           |   py36h629b387_0         373 KB
    libefp-1.5b2               |       h470d631_0         4.2 MB  psi4/label/agg
    psi4-1.2a1.dev999+7086126  |   py36h4680546_0        24.8 MB  psi4/label/agg
    numpy-1.14.2               |   py36hdbf6ddf_1         4.1 MB
    mkl-2018.0.2               |                1       205.2 MB
    gau2grid-0.1               |                3         735 KB  psi4/label/agg
    certifi-2018.4.16          |           py36_0         142 KB
    py-1.5.3                   |           py36_0         135 KB
    libxc-3.0.0                |       h5eb71ee_3         814 KB  psi4/label/agg
    pytest-3.5.1               |           py36_0         292 KB
    more-itertools-4.1.0       |           py36_0          76 KB
    libgfortran-ng-7.2.0       |       hdf63c60_3         1.2 MB
    mkl_fft-1.0.1              |   py36h3010b51_0         140 KB
    gdma-2.2.6                 |       hf4d0741_5         763 KB  psi4/label/agg
    libint-1.2.1               |       h73b9bb0_3        23.9 MB  psi4/label/agg
    intel-openmp-2018.0.0      |                8         620 KB
    attrs-17.4.0               |           py36_0          41 KB
    openssl-1.0.2o             |       h20670df_0         3.4 MB
    pluggy-0.6.0               |   py36hb689045_0          23 KB
    ca-certificates-2018.03.07 |                0         124 KB
    ------------------------------------------------------------
                                           Total:       271.0 MB
The following NEW packages will be INSTALLED:

    attrs:           17.4.0-py36_0                                     
    gau2grid:        0.1-3                               psi4/label/agg
    gdma:            2.2.6-hf4d0741_5                    psi4/label/agg
    intel-openmp:    2018.0.0-8                                        
    libefp:          1.5b2-h470d631_0                    psi4/label/agg
    libgfortran-ng:  7.2.0-hdf63c60_3                                  
    libint:          1.2.1-h73b9bb0_3                    psi4/label/agg
    libxc:           3.0.0-h5eb71ee_3                    psi4/label/agg
    mkl:             2018.0.2-1                                        
    mkl_fft:         1.0.1-py36h3010b51_0                              
    mkl_random:      1.0.1-py36h629b387_0                              
    more-itertools:  4.1.0-py36_0                                      
    numpy:           1.14.2-py36hdbf6ddf_1                             
    pluggy:          0.6.0-py36hb689045_0                              
    psi4:            1.2a1.dev999+7086126-py36h4680546_0 psi4/label/agg
    py:              1.5.3-py36_0                                      
    pytest:          3.5.1-py36_0                                      

The following packages will be UPDATED:

    ca-certificates: 2017.08.26-h1d4fec5_0                              --> 2018.03.07-0     
    certifi:         2018.1.18-py36_0                                   --> 2018.4.16-py36_0 
    openssl:         1.0.2n-hb7f436b_0                                  --> 1.0.2o-h20670df_0

However, once they are uncompressed they are a lot larger: the /opt/conda/lib folder is 1.2 GB, mostly due to mkl (the 38 *mkl* files total 771 MB).
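
A measurement like the one above could be reproduced with a small helper along these lines. This is only a sketch: the name mkl_footprint is made up for illustration, the /opt/conda/lib path is the one quoted in the post, and -printf assumes GNU find (as found on most Linux base images).

```shell
# Hypothetical helper: count the *mkl* files directly under a lib directory
# and sum their sizes in bytes (GNU find is assumed for -printf).
mkl_footprint() {
    dir=$1
    find "$dir" -maxdepth 1 -type f -name '*mkl*' -printf '%s\n' |
        awk '{ n += 1; total += $1 } END { printf "%d files, %.0f bytes\n", n, total }'
}

# Example call (path taken from the post above; adjust to your image):
# mkl_footprint /opt/conda/lib
```

Running it before and after any pruning gives a direct view of how much of the lib folder is still MKL.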

I also created an auto-build on Dockerhub: https://hub.docker.com/r/paesanilab/psi4/

@zonca The process would be to pull down a special conda base and compile the code in-place. This is a bit tricky as you need to pull the same BLAS and OpenMP as the other Python packages used. I think @loriab may have a few magical lines to do this lying around.

To use openblas, you’d have to grab an openblas numpy too (at least mkl is getting multiple uses now) and rebuild any addons with openblas, too. With docker you can build software up, then delete files before forming the final package, right? I bet one could cut out a good swath of mkl and keep only the needed files. On the surface, psi4’s just using libmkl_rt.so. So thread, core, and lp64 can go. Here’s the list for mkl-2018.0.2-1. Do you want to give pruning a try or the openblas route? I’d just rather see psi4+mkl used for production runs.

lib/libmkl_ao_worker.so
lib/libmkl_avx.so
lib/libmkl_avx2.so
lib/libmkl_avx512.so
lib/libmkl_avx512_mic.so
lib/libmkl_blacs_intelmpi_ilp64.so
lib/libmkl_blacs_intelmpi_lp64.so
lib/libmkl_blacs_openmpi_ilp64.so
lib/libmkl_blacs_openmpi_lp64.so
lib/libmkl_blacs_sgimpt_ilp64.so
lib/libmkl_blacs_sgimpt_lp64.so
lib/libmkl_cdft_core.so
lib/libmkl_core.so
lib/libmkl_def.so
lib/libmkl_gf_ilp64.so
lib/libmkl_gf_lp64.so
lib/libmkl_gnu_thread.so
lib/libmkl_intel_ilp64.so
lib/libmkl_intel_lp64.so
lib/libmkl_intel_thread.so
lib/libmkl_mc.so
lib/libmkl_mc3.so
lib/libmkl_pgi_thread.so
lib/libmkl_rt.so
lib/libmkl_scalapack_ilp64.so
lib/libmkl_scalapack_lp64.so
lib/libmkl_sequential.so
lib/libmkl_tbb_thread.so
lib/libmkl_vml_avx.so
lib/libmkl_vml_avx2.so
lib/libmkl_vml_avx512.so
lib/libmkl_vml_avx512_mic.so
lib/libmkl_vml_cmpt.so
lib/libmkl_vml_def.so
lib/libmkl_vml_mc.so
lib/libmkl_vml_mc2.so
lib/libmkl_vml_mc3.so
lib/mkl_msg.cat

Thanks,
I tried pruning MKL independently for lp64, thread and core, but all give an error similar to this:

Intel MKL FATAL ERROR: Cannot load libmkl_core.so.

After some attempts, I got down to 169M and my test case still runs:

65M     libmkl_core.so
9.4M    libmkl_intel_ilp64.so
11M     libmkl_intel_lp64.so
37M     libmkl_intel_thread.so
43M     libmkl_mc3.so
5.8M    libmkl_rt.so
112K    mkl_msg.cat
169M    total
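
The manual pruning described above could be scripted along these lines. This is only a sketch: prune_mkl is a made-up helper name, and the keep-list is simply the set of libmkl_* files from the size listing above. Note that libmkl_mc3.so looks like a CPU-specific kernel library, so a keep-list validated on one machine may fail to load on a different CPU; treat the list as an assumption to re-test, not as a general recipe.

```shell
# Hypothetical helper: delete every libmkl_* file in a lib directory except
# the ones the test case above still needed. mkl_msg.cat does not match the
# libmkl_* pattern, so it is left alone.
prune_mkl() {
    dir=$1
    # Keep-list copied from the listing above (ilp64 is dropped later in the thread).
    keep=" libmkl_core.so libmkl_intel_ilp64.so libmkl_intel_lp64.so libmkl_intel_thread.so libmkl_mc3.so libmkl_rt.so "
    for path in "$dir"/libmkl_*; do
        [ -e "$path" ] || continue      # the glob matched nothing
        name=${path##*/}
        case "$keep" in
            *" $name "*) ;;             # on the keep-list: leave it in place
            *) rm -f -- "$path" ;;      # everything else is removed
        esac
    done
}
```

In a Dockerfile, a call such as prune_mkl /opt/conda/lib would need to run in the same RUN instruction as the conda install, because files deleted in a later layer still take up space in the earlier one.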

That’s a pretty good gain: the image is now 934 MB. Would you have any suggestions to go lower?
Ideally, 600-700 MB would be fine.

Can you give a conda list? We may be able to eject a number of addons that you don’t use.

thanks, here is conda list:

asn1crypto                0.24.0                   py36_0  
attrs                     17.4.0                   py36_0  
ca-certificates           2018.03.07                    0  
certifi                   2018.4.16                py36_0  
cffi                      1.11.4           py36h9745a5d_0  
chardet                   3.0.4            py36h0f667ec_1  
conda                     4.4.10                   py36_0  
conda-env                 2.6.0                h36134e3_1  
cryptography              2.1.4            py36hd09be54_0  
gau2grid                  0.1                           3    psi4/label/agg
gdma                      2.2.6                hf4d0741_5    psi4/label/agg
idna                      2.6              py36h82fb2a8_1  
intel-openmp              2018.0.0                      8  
libedit                   3.1                  heed3624_0  
libefp                    1.5b2                h470d631_0    psi4/label/agg
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 7.2.0                h7cc24e2_2  
libgfortran-ng            7.2.0                hdf63c60_3  
libint                    1.2.1                h73b9bb0_3    psi4/label/agg
libstdcxx-ng              7.2.0                h7a57d05_2  
libxc                     3.0.0                h5eb71ee_3    psi4/label/agg
mkl                       2018.0.2                      1  
mkl_fft                   1.0.1            py36h3010b51_0  
mkl_random                1.0.1            py36h629b387_0  
more-itertools            4.1.0                    py36_0  
ncurses                   6.0                  h9df7e31_2  
numpy                     1.14.2           py36hdbf6ddf_1  
openssl                   1.0.2o               h20670df_0  
pip                       9.0.1            py36h6c6f9ce_4  
pluggy                    0.6.0            py36hb689045_0  
psi4                      1.2a1.dev999+7086126  py36h4680546_0    psi4/label/agg
py                        1.5.3                    py36_0  
pycosat                   0.6.3            py36h0a5515d_0  
pycparser                 2.18             py36hf9f622e_1  
pyopenssl                 17.5.0           py36h20ba746_0  
pysocks                   1.6.7            py36hd97a5b1_1  
pytest                    3.5.1                    py36_0  
python                    3.6.4                hc3d631a_1  
readline                  7.0                  ha6073c6_4  
requests                  2.18.4           py36he2e5f8d_1  
ruamel_yaml               0.15.35          py36h14c3975_1  
setuptools                38.4.0                   py36_0  
six                       1.11.0           py36h372c433_1  
sqlite                    3.22.0               h1bed415_0  
tk                        8.6.7                hc745277_3  
urllib3                   1.22             py36hbe7ace6_0  
wheel                     0.30.0           py36hfd4bba0_1  
xz                        5.2.3                h55aa19d_2  
yaml                      0.1.7                had09818_2  
zlib                      1.2.11               ha838bed_2 

Nothing from psi will use this one. Should be droppable unless numpy or something needs it.

If you’re planning on running dft, you’ll need to add dftd3 and gcp (psi4 channel)

I’m concerned about 1.2a1.dev999+7086126, as that’s about 200 commits ago. Looks like this line is registering only psi4/label/agg. It needs to be the equiv of [psi4/label/agg, psi4, defaults] in order of greatest to least priority. For packages that don’t change (like dftd3), we keep them on the main channel (hmm, just a label, I guess wouldn’t hurt to duplicate in future). I think the conda solver couldn’t find everything it needed in just [psi4/label/agg, defaults] (which is what your line’s accessing) so it fell back on an old psi4 package.
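
As a sketch of that channel ordering: conda searches -c channels in the order they are given, so the priority list above maps onto the command line as follows. The p4env name comes from earlier in the thread; the -y flag (skip the confirmation prompt, useful in a Docker build) and the explicit -c defaults are assumptions added for illustration. The snippet only prints the command so it can be checked before running it.

```shell
# Channels from highest to lowest priority, as described above.
channels="psi4/label/agg psi4 defaults"

# Assemble the command as a string so it can be inspected before running.
cmd="conda create -y -n p4env psi4"
for ch in $channels; do
    cmd="$cmd -c $ch"
done
echo "$cmd"
```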

I tried:

python /psi4_test.py 
Intel MKL FATAL ERROR: Cannot load libmkl_intel_lp64.so.

No, we don’t need dft.
I fixed the installation; now the image is 1 GB, and here is conda list:

asn1crypto                0.24.0                   py36_0   
attrs                     17.4.0                   py36_0   
ca-certificates           2018.03.07                    0  
certifi                   2018.4.16                py36_0  
cffi                      1.11.4           py36h9745a5d_0  
chardet                   3.0.4            py36h0f667ec_1  
chemps2                   1.8.7                ha40901b_0    psi4/label/agg
conda                     4.4.10                   py36_0   
conda-env                 2.6.0                h36134e3_1   
cryptography              2.1.4            py36hd09be54_0  
decorator                 4.3.0                    py36_0   
deepdiff                  3.3.0                      py_0    psi4
dkh                       1.2                  h173d85e_2    psi4/label/agg
gau2grid                  1.0.1                h14c3975_0    psi4/label/agg
gdma                      2.2.6                hf4d0741_5    psi4/label/agg
hdf5                      1.10.1               h9caa474_1  
idna                      2.6              py36h82fb2a8_1   
intel-openmp              2018.0.0                      8  
jsonpickle                0.9.6                      py_0    psi4
libedit                   3.1                  heed3624_0   
libefp                    1.5b2                h470d631_0    psi4/label/agg
libffi                    3.2.1                hd88cf55_4   
libgcc-ng                 7.2.0                h7cc24e2_2   
libgfortran-ng            7.2.0                hdf63c60_3   
libint                    1.2.1                h73b9bb0_3    psi4/label/agg
libstdcxx-ng              7.2.0                h7a57d05_2   
libxc                     4.0.2                h14c3975_0    psi4/label/agg 
mkl                       2018.0.2                      1   
mkl_fft                   1.0.1            py36h3010b51_0   
mkl_random                1.0.1            py36h629b387_0  
more-itertools            4.1.0                    py36_0   
ncurses                   6.0                  h9df7e31_2   
networkx                  2.1                      py36_0   
numpy                     1.14.2           py36hdbf6ddf_1   
openssl                   1.0.2o               h20670df_0   
pcmsolver                 1.2.0rc2         py36h8733eb9_0    psi4/label/agg
pip                       9.0.1            py36h6c6f9ce_4   
pluggy                    0.6.0            py36hb689045_0   
psi4                      1.2a1.dev1255+a9aa2db  py36hd70b514_0    psi4/label/agg
py                        1.5.3                    py36_0   
pycosat                   0.6.3            py36h0a5515d_0   
pycparser                 2.18             py36hf9f622e_1   
pyopenssl                 17.5.0           py36h20ba746_0   
pysocks                   1.6.7            py36hd97a5b1_1   
pytest                    3.5.1                    py36_0   
python                    3.6.4                hc3d631a_1   
readline                  7.0                  ha6073c6_4   
requests                  2.18.4           py36he2e5f8d_1   
ruamel_yaml               0.15.35          py36h14c3975_1   
setuptools                38.4.0                   py36_0  
six                       1.11.0           py36h372c433_1  
sqlite                    3.22.0               h1bed415_0  
tk                        8.6.7                hc745277_3  
urllib3                   1.22             py36hbe7ace6_0  
wheel                     0.30.0           py36hfd4bba0_1  
xz                        5.2.3                h55aa19d_2  
yaml                      0.1.7                had09818_2  
zlib                      1.2.11               ha838bed_2  

It’s ilp64 that I think you can drop; psi needs lp64.

Among the psi-supplied pkgs, looks like only libint and psi4 itself break 5 MB, and those are vital, so any savings will have to come from other pkgs (some of which may be deps of psi optional deps).

Dropping all the lib*-ng packages won’t save much:

-rw-rw-r--. 1 psilocaluser psilocaluser 6.1M Apr  6 18:01 /home/psilocaluser/toolchainconda/pkgs/libgcc-ng-7.2.0-hdf63c60_3.tar.bz2
-rw-rw-r--. 1 psilocaluser psilocaluser 1.3M Apr  6 18:34 /home/psilocaluser/toolchainconda/pkgs/libgfortran-ng-7.2.0-hdf63c60_3.tar.bz2
-rw-rw-r--. 1 psilocaluser psilocaluser 2.6M Apr  6 18:01 /home/psilocaluser/toolchainconda/pkgs/libstdcxx-ng-7.2.0-hdf63c60_3.tar.bz2

Even expanded they’re not too big, so there’s no use just cutting out the fortran addons. (Everything in lib >5 MB is below.)

-rwxrwxr-x. 1 psilocaluser psilocaluser  5.6M May  1 16:47 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libchemps2.so.3
-rwxrwxr-x. 4 psilocaluser psilocaluser  5.6M Mar  8 13:41 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libgfortran.so.4.0.0
-rwxrwxr-x. 5 psilocaluser psilocaluser  5.8M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_rt.so
-rwxrwxr-x. 2 psilocaluser psilocaluser  6.1M Apr 25 15:38 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libpcm.so.1
-rwxrwxr-x. 5 psilocaluser psilocaluser  6.2M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_cmpt.so
-rwxrwxr-x. 5 psilocaluser psilocaluser  6.2M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_def.so
-rwxrwxr-x. 6 psilocaluser psilocaluser  7.0M Mar  8 13:41 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libasan.so.4.0.0
-rwxrwxr-x. 5 psilocaluser psilocaluser  7.2M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_scalapack_ilp64.so
-rwxrwxr-x. 5 psilocaluser psilocaluser  7.2M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_scalapack_lp64.so
-rw-rw-r--. 1 psilocaluser psilocaluser  7.7M May  2 00:47 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libhdf5.a
-rwxrwxr-x. 6 psilocaluser psilocaluser  7.8M Mar  8 13:42 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libstdc++.so.6.0.24
-rwxrwxr-x. 5 psilocaluser psilocaluser  9.3M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_gf_ilp64.so
-rwxrwxr-x. 5 psilocaluser psilocaluser  9.4M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_intel_ilp64.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   11M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_gf_lp64.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   11M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_intel_lp64.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   11M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_mc2.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   11M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_mc.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   11M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_mc3.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   12M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_avx512.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   13M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_avx2.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   13M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_avx.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   14M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_vml_avx512_mic.so
-rwxrwxr-x. 3 psilocaluser psilocaluser   19M Jul  6  2017 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libint.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   19M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_sequential.so
-rwxrwxr-x. 1 psilocaluser psilocaluser   22M Apr 19 15:19 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libpython3.6m.a
-rwxrwxr-x. 5 psilocaluser psilocaluser   23M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_gnu_thread.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   23M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_tbb_thread.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   36M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_pgi_thread.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   36M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_def.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   37M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_intel_thread.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   42M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_mc.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   43M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_mc3.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   46M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_avx.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   51M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_ao_worker.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   57M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_avx2.so
-rwxrwxr-x. 3 psilocaluser psilocaluser   62M Jul  6  2017 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libderiv.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   65M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_core.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   68M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_avx512.so
-rwxrwxr-x. 5 psilocaluser psilocaluser   77M Mar 22 12:20 /home/psilocaluser/toolchainconda/envs/p4dev36/lib/libmkl_avx512_mic.so
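
A listing like the one above can be produced with something like the following. It is a sketch under assumptions: big_files is a made-up helper name, the 5 MiB default mirrors the ">5mb" cutoff used in the post, and -printf requires GNU find.

```shell
# Hypothetical helper: list regular files directly under a directory that are
# larger than a given number of MiB, smallest first, as "bytes<TAB>name".
big_files() {
    dir=$1
    min_mib=${2:-5}     # threshold in MiB; defaults to 5
    find "$dir" -maxdepth 1 -type f -size +"${min_mib}"M -printf '%s\t%f\n' |
        sort -n
}

# Example call (hypothetical env path; adjust to your installation):
# big_files /opt/conda/lib 5
```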

Ah sorry, I dropped ilp64 already after I posted the list

Looks like the only substantial savings left will be mkl --> openblas. Do you need libefp or chemps2?

We won’t need libefp or chemps2 (the latter is for dmrg, right?). We need dft, mp2 and coupled cluster energies.

For dft it would be nice to have access to wb97m-v (requires vv10). My understanding is that this should be available if our build is based on the latest development branch.

Correct on chemps2=dmrg and that wb97m-v is in master. Though I’d think you’d want dftd3 also to have -d3 available.

I looked into doing an openblas conda pkg, and it looks straightforward (except for actually selecting it once built). Haven’t attempted yet, as I’ve been busy with the v1.2rc1 release, but I’ll do so soon.