Interesting Open-Source vs. Commercial Article

ryanmrichard · July 17, 2015, 2:04pm

Several bigwigs who work on commercial electronic structure packages put together an article arguing that they shouldn’t have to go open source. It’s an interesting read for anyone following the debate.

http://pubs.acs.org/doi/full/10.1021/acs.jpclett.5b01258

EDIT: grammar

mernst · July 24, 2015, 1:47am

I want to be charitable, so I will start by acknowledging the arguments with the most merit:

1) Documentation, packaging/distribution, and support are generally stronger with commercial software. Packaging in particular: chemists shouldn’t have to know about svn/git, compilers, the differences between cmake and gnumake, and the pitfalls of suboptimal BLAS routines to start predicting IR spectra. It’s really convenient to do sudo apt-get install whatever when the program is packaged, but the performance can be much poorer than building from scratch because of a generic BLAS and the lack of architecture-specific optimizations.

1a) I almost missed this one because I don’t use Windows: most chemists and chemistry students probably aren’t running Linux, so apt-get doesn’t help. On OS X there’s homebrew for the somewhat adventurous. If you use the world’s most popular desktop OS, I hope you like GAMESS, because the newer free quantum chemistry programs don’t have Windows binaries or even source code that can easily compile under Windows.

2) Commercial software may offer more features and/or higher-performance implementations of methods. You aren’t really saving money with free software if you end up spending a lot more on hardware to complete jobs in a reasonable amount of time, or if you spend a lot of extra time trying to work around missing features. Jaguar is supposed to be much stronger than most programs at handling transition metal systems; I don’t know if that is true, but if it is, and you need to run such calculations very often, it’s probably worth the money. Free software is expensive if it takes an extra week for a researcher to figure out how to get reliable convergence.

The rest of the objections to non-commercial software are not very good. There are a bunch of recycled old-and-dead arguments against free software in general. “Open-source requirements potentially force a scientist to choose between pursuing a funding opportunity versus implementing an idea in the quickest, most efficient, and highest-impact way” is particularly suspicious. Think of all the researchers, all three of them, who have source code licenses to multiple commercial quantum chemistry programs but are forced instead to build on a foundation of inefficient and crufty open source.

The argument that we don’t need source code because publication discloses algorithms misses that, without source code or a second implementation from spec, people can’t know whether the publication has disclosed all the important information. There are a lot of defaults in e.g. Gaussian that can affect reproducibility and that you won’t learn from the high-level information disclosed in papers. Did you know that Gaussian’s compiled-in basis set library has modified Truhlar’s “calendar” basis sets from their original publication? I didn’t, until the differences with a calculation in a second program bothered me. The modifications are mentioned in the manual but not documented sufficiently to reproduce them. If you have identified a discrepancy, AND you are reasonably fluent in a scripting language, in only a couple of days it’s possible to get Gaussian’s modified basis sets extracted in a form you can use in a second program.
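To make the “extract and compare” step concrete, here is a minimal sketch of the kind of script that could do it, assuming the basis sets are available as text in the Gaussian-style layout (element line, shell lines, primitives with Fortran D exponents, **** separators). The function names and the sample numbers are made up for illustration; they are not taken from Gaussian’s library or from any published basis set.

```python
import re


def parse_gaussian_basis(text):
    """Parse Gaussian-style basis text into {element: [(shell_type, [primitive tuples])]}."""
    basis = {}
    element = None
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "****":            # separator between elements
            element = None
            i += 1
            continue
        if element is None:           # element header, e.g. "H 0"
            element = line.split()[0]
            basis[element] = []
            i += 1
            continue
        shell_type, nprim, _scale = line.split()   # e.g. "S 2 1.00"
        count = int(nprim)
        prims = []
        for row in lines[i + 1:i + 1 + count]:
            # Fortran writes 0.5D+01; Python's float() wants 0.5E+01
            prims.append(tuple(float(re.sub(r"[dD]", "E", x)) for x in row.split()))
        basis[element].append((shell_type, prims))
        i += 1 + count
    return basis


def _same_shell(sa, sb, tol):
    """True if two parsed shells have the same type and matching primitives."""
    if sa[0] != sb[0] or len(sa[1]) != len(sb[1]):
        return False
    for pa, pb in zip(sa[1], sb[1]):
        if len(pa) != len(pb) or any(abs(x - y) > tol for x, y in zip(pa, pb)):
            return False
    return True


def differing_shells(a, b, tol=1e-8):
    """Return (element, shell_index) pairs that differ between two parsed basis sets."""
    diffs = []
    for element in sorted(set(a) | set(b)):
        shells_a, shells_b = a.get(element, []), b.get(element, [])
        for idx in range(max(len(shells_a), len(shells_b))):
            sa = shells_a[idx] if idx < len(shells_a) else None
            sb = shells_b[idx] if idx < len(shells_b) else None
            if sa is None or sb is None or not _same_shell(sa, sb, tol):
                diffs.append((element, idx))
    return diffs


# Made-up sample data: the second set changes one exponent in the last shell.
PUBLISHED = """
H 0
S 2 1.00
  0.5000000000D+01  0.2000000000D+00
  0.1000000000D+01  0.8000000000D+00
S 1 1.00
  0.2000000000D+00  0.1000000000D+01
****
"""

MODIFIED = """
H 0
S 2 1.00
  0.5000000000D+01  0.2000000000D+00
  0.1000000000D+01  0.8000000000D+00
S 1 1.00
  0.1500000000D+00  0.1000000000D+01
****
"""

print(differing_shells(parse_gaussian_basis(PUBLISHED), parse_gaussian_basis(MODIFIED)))
```

The comparison only flags which shells differ; deciding which version is “right” still means going back to the original publication.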

Other differences that are hard to spot in papers, easy to find in source code: differing conventions for what “B3LYP” means, the orbitals that are included in the core of a frozen core calculation, DFT grid generation, numerical constants used for e.g. unit conversion, the detection/inclusion or neglect of symmetry in calculated entropy, basis sets containing more elements than described in their original publication (check out the transition metals in the Gaussian version of “6-311G”!)…
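As a rough illustration of why the symmetry item matters, the rigid-rotor entropy contains a term -R ln(σ), where σ is the rotational symmetry number. The sketch below estimates how much a free energy shifts if one program detects the symmetry and another silently uses σ = 1. The function names are hypothetical and the gas constant is rounded.

```python
import math

R_CAL = 1.9872  # gas constant in cal/(mol K), rounded


def symmetry_entropy_term(sigma):
    """Contribution of the symmetry number to rotational entropy, in cal/(mol K)."""
    return -R_CAL * math.log(sigma)


def free_energy_shift(sigma, temperature=298.15):
    """Change in G (kcal/mol) from using sigma instead of assuming sigma = 1.

    G = H - T*S, so lowering S by R*ln(sigma) raises G by T*R*ln(sigma).
    """
    return -temperature * symmetry_entropy_term(sigma) / 1000.0


# A D6h molecule such as benzene has sigma = 12.
print(round(free_energy_shift(12), 2))
```

For σ = 12 at room temperature this comes to roughly 1.5 kcal/mol, which is large enough to matter when comparing computed free energies between two programs.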

ryanmrichard · July 24, 2015, 2:16pm

@mernst Judging by the author list of the paper, the Gaussian comments, although accurate, are not really fair. The major authors are associated with Q-Chem, which was founded in part in objection to the Gaussian practices you mention. Speaking from experience, Q-Chem is quite open about how their algorithms work and will readily work with you if you’re having trouble replicating any of them or simply have questions about them.

Also, open-source or commercial, source code should never be a substitute for thorough documentation. I don’t know how many quantum chemistry packages’ source codes you’ve seen, but I have yet to find one that I would rather read than the manual…

mernst · July 24, 2015, 11:06pm

Fair enough: I have not used Q-Chem, and it appears to have excellent features and documentation. I see that Q-Chem offers its licensed users source code too, after signing an agreement, and it incorporates many non-employee contributions.

I have made extensive use of four non-free chemistry programs: Gaussian, HyperChem, Amber, and MOPAC. I had a love-hate relationship with Gaussian; HyperChem was more hate than love; MOPAC has a very good manual and a responsive developer; and Amber came with source code, so although we had to pay for it, there was no mystery about the inner workings. Of academic/open source software, I’ve used GAMESS, NWChem, Psi, public-domain MOPAC, and Gromacs to a non-trivial extent. GAMESS and the PD MOPAC seem to have the most serious problems: broken features, poor maintenance, poor testing, code and docs out of sync.

I too prefer good documentation to reading source code. Still, source code access has been very helpful at times in clarifying what a paper or manual meant, or in filling in details that were ambiguous from the human-readable documents. I admit that source access is not likely to help the average user much.

About financial calculations and the value of time: wasting your grad student or postdoc’s time doesn’t require a signed purchase order. Spending a non-trivial amount of money on software does, or did when I was a grad student. “Can we predict what this product’s NMR spectrum would look like?” is the sort of question that may prompt a software investigation but (at least initially) is not enough to justify a substantial software purchase. If commercial quantum chemistry packages and good UIs like Spartan offered an hourly or CPU-hourly cloud rental option, I could see more computational investigations starting out with the tools that look best suited to the job instead of whatever presents a low financial/procedural barrier to entry. If the initial results are encouraging, maybe there’s enough evidence to justify a permanent license purchase. I didn’t personally pick HyperChem and Gaussian as tools back in the day; they were just what happened to already have a license in the group.

ryanmrichard · July 27, 2015, 3:54pm

Seeing as how this is the Psi4 forum, I feel obligated to say that if this comment:

applies to Psi4 (in your opinion), we’d love to hear feedback on which sections of the manual need attention/clarification. Manual entries are typically written by that code’s developer, who has intimate knowledge of the underlying code and therefore a very different mindset from the typical user. Ideally no user should ever have to read the source code of Psi4 to figure out what’s going on under the hood; if they do, then we’re doing something wrong.

mernst · July 27, 2015, 10:30pm

One example is the DFT documentation; see “B3LYP” here: http://www.psicode.org/psi4manual/master/dft_byfunctional.html

There are different conventions for what VWN functional B3LYP uses by default in different packages: http://scicomp.stackexchange.com/questions/21/how-is-b3lyp-implemented-in-gaussin-0-gamess-us-molpro-etc

I can’t tell from the Psi4 docs which VWN functional is used. If I am trying to compare results with a colleague who is using B3LYP in GAMESS, will the convention used in Psi4 match? What about results from Gaussian?

Fortunately, a grep through the source directs me to lib/python/functional.py where I can read the code for build_b3lyp_superfunctional and see that it uses VWN3, following the Gaussian convention. Maybe following the Gaussian convention, that is; it’s using VWN3RPA_C rather than VWN3_C. I don’t have ready access to the original Vosko, Wilk, and Nusair publication to explain the differences. I would probably just try the default first and see if swapping in VWN3_C improved agreement, were I trying to match Gaussian and the initial attempts went poorly.
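To spell out what differs between the conventions, here is a small sketch that writes the usual three-parameter B3LYP mixing as a dictionary of weights and swaps only the local-correlation piece. The component labels are plain strings chosen for illustration, not Psi4 or GAMESS identifiers, and pairing the “GAMESS-like” recipe with a VWN5 label is an assumption based on the convention commonly reported for that program.

```python
def build_b3lyp_recipe(vwn_label):
    """Return B3LYP component weights with the given label for the local correlation part."""
    return {
        "HF_exchange": 0.20,
        "LSDA_exchange": 0.80,
        "B88_gradient_correction": 0.72,
        vwn_label: 0.19,              # the piece that varies between programs
        "LYP_correlation": 0.81,
    }


def recipe_differences(a, b):
    """List component labels that are missing from, or weighted differently in, one recipe."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


gaussian_like = build_b3lyp_recipe("VWN3RPA_C")   # label as read from the Psi4 source
gamess_like = build_b3lyp_recipe("VWN5_C")        # assumed label for the "b3lyp5" variant

print(recipe_differences(gaussian_like, gamess_like))
```

Everything except the local correlation term is identical, which is why two programs can both say “B3LYP” and still disagree in the total energy.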

Reading the source I also see that there is a more GAMESS-like “b3lyp5” option, but it’s commented out because it is broken. Using git blame I can see that the disabling-for-brokenness dates back to 2012, and if I had some GAMESS results I needed to match I would be tempted to try uncommenting the code and seeing if the unnamed issue that broke it before has been fixed as a side effect of 3 additional years of development.