Kenneth Geisshirt

Blogging about science and technology - mostly


Emacsforum 2011

Peter Toft and I are in the process of preparing Emacsforum 2011 with some help from Troels Henriksen (at DIKU) and Keld Simonsen (from KLID). The program is almost ready for publication, so I will not say too much - but there will be something for both scientists and developers. Even our Evil Twin will be represented.

The mini-conference takes place on 12th November 2011 at DIKU. There is no conference fee - and there will be no benefits.

If you are using Emacs (and even XEmacs) and live in the Copenhagen area, Emacsforum is a good place to meet fellow users.

Free chemistry software - molecular modeling

Molecular modeling is a very large and important field of chemistry. As computers have increased in raw computing power, using computers to calculate molecular properties is no longer a specialized field for the few. Today, every chemist can perform calculations even for large molecules.

Roughly speaking, you can divide the calculations into two separate groups: molecular mechanics and quantum chemistry. The first is based on classical mechanics, while the second uses quantum mechanics as its underlying model and equations. The software list at shows a wide range of offerings. Many of them are commercial and closed-source solutions.

Molecular mechanics calculations are used when no chemistry is going on. By definition, chemistry is the rearrangement of atoms - and that involves electrons. But molecular mechanics can be used to investigate how a molecule is solvated, that is, how its structure changes when it is surrounded by a solvent like water.

Gromacs is one of the oldest and most successful molecular mechanics software suites. It is covered by the GNU General Public License (version 2), and most distributions, like Debian GNU/Linux, have a package for it. It does not come with a fancy user interface, and the user primarily interacts with Gromacs at the command line. This is a big advantage, as some of the operations can take a long time. You seldom sit at your computer and work with Gromacs interactively. The typical usage is to write small shell scripts and run them as batch jobs.

Today, most supercomputers in the chemical industry and academia are Linux clusters which are built from commodity hardware. That means that a supercomputer is a distributed system where the individual processors are loosely coupled using Ethernet (or maybe InfiniBand). The MPI framework is used by Gromacs to utilize such a supercomputer (if you have an SMP system, MPI can still be used for parallelization). Queuing systems (Sun Grid Engine, OpenPBS/Torque, etc.) schedule which batch job to execute, and the command-line nature of Gromacs comes into its own on such systems.

Explaining all the details of Gromacs is beyond the scope of this post, but let us take a quick tour of how to use some of its many utilities and programs. The assignment is to take the experimentally determined structure of a small, biologically active molecule and create a solvated version of it. For the tour, the insulin-like growth factor 1 (IGF-1) is chosen. IGF-1 is a small protein (or peptide) which is involved in the growth and regeneration of your body. The structure found at the Protein Data Bank is for a crystal, whereas IGF-1 (like most other molecules in your body) is found in a solution where water is the solvent (remember, 60 % of your body is water). You can download a file with the experimental structure from the Protein Data Bank.

First, you must pre-process the downloaded file into files used by Gromacs. In that process you decide on the force field. The force field is the parametrization of the interactions between the atoms, and all calculations in Gromacs (and any other molecular mechanics program) are based on Newton's second law. In the command line below, two files are generated (2GF1.gro and

pdb2gmx -f 2GF1.pdb -o 2GF1.gro -p -ignh -ff G53a6
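To make the idea of "a force field plus Newton's second law" a bit more concrete, here is a minimal Python sketch. It is not how Gromacs is implemented: the "force field" is a single one-dimensional harmonic bond, the integrator is the velocity Verlet scheme (one common choice), and all names and parameter values are made up for illustration.

```python
# Minimal sketch: a one-dimensional "force field" consisting of a single
# harmonic bond, integrated with the velocity Verlet scheme.
# All parameter values are illustrative, not taken from any real force field.

def harmonic_force(x, k, x0):
    """Force from a harmonic bond: F = -k * (x - x0)."""
    return -k * (x - x0)

def total_energy(x, v, mass, k, x0):
    """Kinetic plus potential energy of the bound particle."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2

def simulate(x, v, mass, k, x0, dt, steps):
    """Propagate position and velocity for a number of time steps."""
    a = harmonic_force(x, k, x0) / mass          # Newton's second law: a = F / m
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt       # new position
        a_new = harmonic_force(x, k, x0) / mass  # force at the new position
        v = v + 0.5 * (a + a_new) * dt           # new velocity
        a = a_new
    return x, v

# Stretch the bond by one length unit, release it, and follow it for 10 time units.
x_end, v_end = simulate(x=1.0, v=0.0, mass=1.0, k=1.0, x0=0.0, dt=0.01, steps=1000)
```

A real force field adds many more terms (angles, torsions, electrostatics, van der Waals interactions) and thousands of atoms, but the loop has the same shape: compute forces, then move the atoms a small step forward in time.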

Now you have to edit the output file (2GF1.gro) in order to change the box size. You can do an energy minimization and generate a solvation box using the commands (some steps might take some time):

mdrun -v -deffnm 2GF1-EM-vacuum -c 2GF1-EM-vacuum.gro
editconf -f 2GF1-EM-vacuum.gro -o 2GF1-PBC.gro -bt dodecahedron -d 1.2
genbox -cp 2GF1-PBC.gro -cs spc216.gro -p -o 2GF1-water.gro

The final file is 2GF1-water.gro, which is the biological molecule solvated in water. It might not sound like a great deal, but the file can be used in further simulations involving the solvated molecule.
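The manual edit of the box size mentioned in the tour can also be scripted. In a .gro file the last line holds the box vectors in nanometres, so a small Python sketch only has to rewrite that line. The function name set_box is hypothetical, and the sketch assumes a plain rectangular box (three numbers); other box shapes use nine numbers and are not handled here.

```python
def set_box(gro_text, box):
    """Return the .gro text with its last line replaced by a rectangular box.

    box is a sequence of three edge lengths in nm. Assumes the usual layout:
    title line, atom count, atom lines, and finally the box line.
    """
    lines = gro_text.rstrip("\n").split("\n")
    if len(lines) < 3:
        raise ValueError("a .gro file needs a title, an atom count and a box line")
    if len(box) != 3:
        raise ValueError("this sketch only handles rectangular boxes")
    # Write each edge length in a 10-character column with five decimals.
    lines[-1] = "".join(f"{length:10.5f}" for length in box)
    return "\n".join(lines) + "\n"

# A tiny, made-up .gro file with a single atom, just to show the call.
sample = (
    "Tiny example\n"
    "    1\n"
    "    1SOL     OW    1   0.126   1.624   1.679\n"
    "   1.00000   1.00000   1.00000\n"
)
resized = set_box(sample, (3.0, 3.0, 3.0))
```

To use it on a real file you would read 2GF1.gro into a string, call set_box, and write the result back.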

Other molecular modeling packages exist. NAMD is a highly scalable molecular dynamics program. It is aimed at large molecules (proteins) and can utilize very large parallel computers. But NAMD is not free software as defined by the Free Software Foundation. You can download it and use it for any non-commercial purpose.

Free chemistry software - utilities

One of the major annoyances a chemist in front of a computer faces is the vast number of file formats. The good news is that most file formats are text files, so it is possible to reverse engineer them by looking at a number of examples. One open source project called OpenBabel tries to help chemists convert between the formats (currently OpenBabel supports 113 file formats related to chemistry). Most Linux distributions have packages for OpenBabel, including Debian GNU/Linux (it's a version from 2009 you find in Debian stable). Converting a molecular structure of caffeine from one file format (SDF) to another (PDB) is simply done by the following command:

babel -isdf caffeine.sdf -opdb caffeine.pdb
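If you have many files to convert, the command can be wrapped in a few lines of Python. This is only a sketch: the helper names babel_command and convert are hypothetical, and it assumes that the babel program is on your PATH and that each file extension matches the format code OpenBabel expects (as it does for sdf and pdb above).

```python
import os
import subprocess

def babel_command(source, target):
    """Build the babel argument list, taking both formats from the file extensions."""
    in_format = os.path.splitext(source)[1].lstrip(".").lower()
    out_format = os.path.splitext(target)[1].lstrip(".").lower()
    if not in_format or not out_format:
        raise ValueError("both file names need an extension naming the format")
    return ["babel", f"-i{in_format}", source, f"-o{out_format}", target]

def convert(source, target):
    """Run the conversion; raises CalledProcessError if babel reports a failure."""
    subprocess.run(babel_command(source, target), check=True)

# Same conversion as the command above, built but not executed here.
command = babel_command("caffeine.sdf", "caffeine.pdb")
```

Building the argument list separately from running it makes it easy to inspect what would be executed before starting a long batch of conversions.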

You can find many small molecules - with 3D structures, physical properties and toxicology data - at PubChem. For larger molecules (proteins mainly), you can go to the Protein Data Bank. The file for caffeine as used above can be found at PubChem.

The OpenBabel project also includes a number of other utilities, including a chemist's version of grep called obgrep (for finding molecules with a particular substructure within a database) and a simple program to (energy) minimize a molecule called obminimize.

GNOME Chemistry Utils is a set of utilities developed for GNOME users. The set includes a calculator (for calculating the molecular mass of a molecule), the periodic table of the elements, and a spectrum viewer. The periodic table of the elements can give you the physical and chemical properties of all elements. Most chemists keep a periodic table of the elements close at hand when working, and having one on your desktop seems like a good idea.
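What such a molecular mass calculator does can be sketched in a few lines of Python. This is not the code of the GNOME calculator: the function name molar_mass is hypothetical, the table holds rounded standard atomic masses for only four elements, and formulas with parentheses are not supported.

```python
import re

# Rounded standard atomic masses in g/mol for a handful of elements;
# a real tool would ship the whole periodic table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molar_mass(formula):
    """Molar mass in g/mol of a flat formula such as 'C8H10N4O2'."""
    # Each token is an element symbol followed by an optional count.
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if not tokens or "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return sum(ATOMIC_MASS[symbol] * int(count or 1) for symbol, count in tokens)

caffeine_mass = molar_mass("C8H10N4O2")   # caffeine
water_mass = molar_mass("H2O")
```

With these rounded masses, caffeine comes out at about 194.19 g/mol and water at about 18.02 g/mol.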

Chemists do a lot of drawing: they draw structures of molecules. A drawing can be regarded as a generic representation of a molecule's 3-dimensional structure on 2D paper. Understanding and drawing such chemical structures is an integral part of any chemist's education, and chemists have used these drawings for more than 150 years (the discovery of the electron and the development of quantum mechanics changed the view of molecular structures). The 3-dimensional geometry is an important factor in determining the properties (reactivity, toxicology, color, etc.) of a molecule.

A drawing program for chemists is not hard to imagine. When it comes to free software, we are lucky enough to have more than one. GNOME users can use the molecular drawing program from the GNOME Chemistry Utils project. It is called GChemPaint. As GChemPaint can only load a rather small number of file formats, you learn to use OpenBabel rather quickly. It is an easy program to work with, and it is possible to save your drawing in the most common image formats (both bitmap and vector formats). You can then easily insert your drawing into your favorite word processing software prior to publication (take publication rather broadly: everything from a high-school report to a paper in Nature).

As already said, drawing programs for chemists are not hard to imagine. Other projects in this area include titles like bkchem, chemtool, easychem, xdrawchem, jchempaint, and molsKetch (probably stalled).

Free chemistry software - Introduction

The year 2011 has been declared the International Year of Chemistry by UNESCO (United Nations Educational, Scientific and Cultural Organization) and IUPAC (International Union of Pure and Applied Chemistry). The purpose of devoting a full year to chemistry is to spread the notion that chemistry is important for our daily life.

In this series of blog posts I examine the state of free software in chemistry. This first post is an outline of the usage of computers in chemistry.

It is hard to imagine modern life without the discoveries and developments made by chemists and chemical engineers over the last two or three centuries. Plastic, gasoline, and pharmaceuticals are products of the chemical industry. And forensic scientists use many chemical analyses in order to provide evidence for police investigations all over the world. But chemistry is more than an applied science. It also gives us an insight into how our world works. In the last decade, modern cuisine has changed. For example, the chef Heston Blumenthal has been using chemistry to create new dishes (this branch of chemistry is called molecular gastronomy).

As you can see, chemistry is a broad science and engineering discipline. Modern chemistry is divided into a number of branches. Traditionally, an academic education in chemistry consists of courses in general, organic, inorganic, physical and even analytical chemistry. Chemistry is a wet science, and as a student you spend a lot of time in laboratories. Amongst chemists, it is still discussed whether chemistry is a descriptive science (classification of observations) or an exact science (explaining observations).

Chemistry interfaces with most other sciences, including physics, mathematics, statistics and biology. Quantum chemistry applies quantum mechanics to calculate properties of chemical substances. But as you might imagine, the three-body problem is a serious show-stopper for a chemist, as very few molecules consist of only three nuclei and electrons.

Computers are heavily used in chemistry. One example is quantum chemistry calculations, as finding an analytical solution to a many-body problem is impossible. A rough break-down of the usage of computers in chemistry consists of three major areas. Firstly, you have the end-user applications used by every chemist. These applications are domain-specific - and the domain is as broad as chemistry itself.

The second area is chemoinformatics. It is a fairly young area (a decade or two only). Chemoinformatics applies techniques from informatics to transform chemical data into knowledge and thereby improve the decision making process. The usage of specialized databases and search algorithms is an integral part of chemoinformatics. Any non-trivial chemical compound can be represented in a number of ways. Even a small molecule like styrene can be named in different ways depending on how you look at it. Chemoinformaticians have introduced a string representation for all chemical compounds called the simplified molecular input line entry specification (SMILES). The SMILES code for styrene is C=CC1=CC=CC=C1. Imagine having to find all compounds in your database with a certain substructure. You cannot use a regular expression or an SQL query. As molecules can be regarded as graphs (atoms are connected by chemical bonds), searching in chemical databases is a variant of finding subgraphs. This is the core of chemoinformatics.
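The subgraph idea can be sketched in plain Python. This is an illustration only, not how a real chemoinformatics toolkit works: the function names are hypothetical, the molecules are typed in by hand rather than parsed from SMILES, hydrogens are left out, bond orders are ignored, and the search is a naive backtracking that would be far too slow for a large database.

```python
def make_molecule(atoms, bonds):
    """atoms: list of element symbols; bonds: pairs of atom indices."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return atoms, neighbours

def has_substructure(molecule, pattern):
    """True if the pattern occurs as a subgraph of the molecule."""
    m_atoms, m_nb = molecule
    p_atoms, p_nb = pattern

    def extend(mapping):
        # mapping[q] is the molecule atom chosen for pattern atom q.
        p = len(mapping)                     # next pattern atom to place
        if p == len(p_atoms):
            return True                      # every pattern atom is placed
        for m in range(len(m_atoms)):
            if m in mapping or m_atoms[m] != p_atoms[p]:
                continue                     # atom already used or wrong element
            # Every bond to an already placed pattern atom must exist in the molecule.
            if all(mapping[q] in m_nb[m] for q in p_nb[p] if q < p):
                if extend(mapping + [m]):
                    return True
        return False

    return extend([])

# Styrene, C=CC1=CC=CC=C1: eight carbons, the ring closes between atoms 2 and 7.
styrene = make_molecule(
    ["C"] * 8,
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 2)],
)
# Query: a six-membered carbon ring.
carbon_ring = make_molecule(["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
# Ethanol (C-C-O) as a molecule without any ring.
ethanol = make_molecule(["C", "C", "O"], [(0, 1), (1, 2)])

ring_in_styrene = has_substructure(styrene, carbon_ring)
ring_in_ethanol = has_substructure(ethanol, carbon_ring)
```

The ring query matches styrene but not ethanol. Real toolkits also compare bond orders, aromaticity and charges, and they pre-filter the database so that the expensive graph matching runs on only a few candidates.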

The third area where computers are used in chemistry is to perform calculations, and it is often referred to as computational chemistry. It is an old area - calculation and simulation of the properties of chemical compounds and reactions have been carried out for as long as computers have been available to scientists. The calculations use either a classical-mechanical approach or a quantum mechanical approach. In the first approach, electrons are neglected and a force field between the atoms is applied. It is possible to simulate large molecules using this approach. But if you need to predict the energy levels, thermodynamic properties, and charge distribution of a molecule, you have to use a quantum mechanical approach. This involves solving the time-independent Schrödinger equation (or at least an approximation to the equation, called the Born-Oppenheimer approximation).

It is important to understand that most chemists are not educated as programmers. On the other hand, using computational techniques can save the chemical industry huge fortunes. Today, most pharmaceutical companies have specialized departments for performing chemical calculations and supporting an informatics infrastructure. These departments are small in terms of manpower compared to the company as a whole. As the market is small and the potential benefits huge (time-to-market and saving expensive laboratory time), vendors often ask for very high license fees. Vendors like Schrödinger, Wavefunction, Gaussian, and OpenEye offer software packages for chemists. Sadly, free software is a minor player in chemistry, but you can find free chemistry software for most needs.