In this post, we’ll take a SMILES string, convert that into a molecule file with optimised geometry, and finally perform a simple dihedral scan at the quantum chemical level.

We’ll start with a few of my basic notes on quantum chemistry, but feel free to skip that and jump right into the how-to by clicking here.

attempting to understand the basics of ab initio computational chemistry

In MD simulations, we’re using classical mechanics, and considering an atom as a single entity… which obviously, it isn’t.

When we do ab initio QM, we really start from nothing, and use theory alone to calculate molecular properties. This is done by solving the Schrödinger Equation1:

Where is the hamiltonian, and the uppercase psi is the wave function. It’s this wave function that we’re attempting to calculate, as it is what contains all of the information in the system.

There are a variety of methods to do this; the Hartree-Fock(HF) is widely known and has been around for decades, although there are plenty of others, including post-Hartree-Fock methods like Møller–Plesset perturbation theory (MP).

Once we have our method, we need to choose the appropriate mathematical functions to represent the electron orbitals. This set of functions are called the basis set.

With the two working in tandem, we can calculate the wave function, and extract the information we’re really after, namely potential energy. This combination of method and basis set will vary from system to system, a constant interplay between computational cost, accuracy, and applicability. New methods and basis sets are continually being developed, as what we’re really doing is calculating an approximation.

We call the combination of method and basis set the level of theory, which describes the degree of approximation used to solve the Schrödinger equation for our system, expressed as:

Method/Basis Set

i.e.

HF/6-31G*

MP2/CC-PVDZ

Getting Started

Step 1: Converting a SMILES string to a 3d structure

The dihedral I want to parameterise comes from cholesteryl oleate, an awfully large molecule (by QM standards, maybe). I decided instead to take a small portion which has the dihedral I want, a cyclohexyl acetate:

I can take the SMILES string and generate a 3d structure using openbabel2:

openbabel -:"CC(=O)OC1CCCCC1" -oxyz -O CHA.xyz --gen3d # generating an .xyz file

Generating an .xyz file 3, which looks like:

24

C          1.11274        0.33165       -0.17130
C          2.59623        0.29037       -0.37507
O          3.14111       -0.14618       -1.37891
O          3.22834        0.83932        0.69728
C          4.66678        0.88784        0.63289
C          5.25460       -0.47465        1.01489
C          6.77417       -0.42078        1.13132
C          7.21986        0.65899        2.11295
C          6.64390        2.02281        1.74361
C          5.12562        1.97323        1.61144
H          0.63303       -0.37064       -0.85930
H          0.74337        1.34120       -0.36552
H          0.86392        0.02453        0.84803
H          4.98756        1.17247       -0.37805
H          4.96967       -1.23172        0.27519
H          4.81832       -0.81073        1.96443
H          7.20956       -0.22018        0.14471
H          7.15454       -1.39532        1.45697
H          8.31417        0.71259        2.13226
H          6.89497        0.38761        3.12510
H          7.08235        2.35996        0.79633
H          6.92364        2.75926        2.50528
H          4.74681        2.94859        1.28212
H          4.67278        1.79802        2.59618

It’s an extremely simple format: the first line shows us we have 24 atoms, the second line is blank, but you can write a descriptor “i.e. Cyclohexyl acetate”4, and the rest of the lines are organised to show the atom type and it’s x, y, z cartesian coordinates.

Step 2: Simple Geometry Optimisation

The 3d structure we have is geometrically perfect. Yet molecules are not. We’ll need to perform some basic energy minimisaion to return something more realistic to save us ttime when we start running simulations at the quantum level.

We can do this by using the obminimize function of openbabel:

obminimize CHA.xyz > CHA.2.xyz

Step 3: Quantum Geometry Optimisation

We really should optimise our starting geometry as much as possible, and so we can use our QM engine to optimise the geometry further in accordance with the level of theory we’ve chosen. Since I want to generate atomistic parameters for the CHARMM36 MD forcefield, and previous similar parameterisation efforts have used the MP2/CC-PVDZ level of theory, I think it’s where I’ll start.

We’re going to be using the program psi4 to run all of our QM calculations. I found it pretty easy to use considering it was my first time. Other common alternatives are ORCA and Gaussian, the latter of which is paid.

Psi4 has a jupyter/python API, but it also takes an input file written in psithon; basically python with some extra, psi4-specific stuff.

My input file to optimise the geometry of our pre-optimised molecule (psi4_optimise.dat) is as follows:

import psi4

memory 16GB

molecule = psi4.geometry("""
24
CHA.2.xyz 
C          1.04300       -0.28100        0.19400
C          2.52100       -0.11700        0.37800
O          3.04000        0.79800        1.00200
O          3.18000       -1.15100       -0.20900
C          4.61600       -1.14900       -0.08900
C          5.22900       -0.20700       -1.13100
C          6.74900       -0.32100       -1.17500
C          7.19400       -1.76100       -1.40900
C          6.60100       -2.70400       -0.36700
C          5.08200       -2.59100       -0.31300
H          0.55300        0.69000        0.30900
H          0.65700       -0.97900        0.93900
H          0.82700       -0.64600       -0.81400
H          4.90800       -0.84100        0.92300
H          4.94500        0.83000       -0.91600
H          4.80900       -0.42900       -2.12100
H          7.17100        0.04300       -0.23000
H          7.14500        0.32100       -1.97000
H          8.28800       -1.82000       -1.38100
H          6.88200       -2.08100       -2.41100
H          7.02100       -2.47100        0.61900
H          6.88500       -3.73700       -0.60000
H          4.68700       -3.23100        0.48500
H          4.64800       -2.97900       -1.24400
""")
set {
    basis cc-pvdz
    energy mp2
    scf_type df
}
optimize('scf')

molecule.save_xyz_file("CHA.3.xyz",1)

scf_type df, by the way, refers to the Self-Consistent Field procedure, which is the iterative process of solving the Schrödinger equation repeatedly to update the coefficients describing our orbitals, which we repeat until they converge. Density Fitting (DF)-SCF is a way of speeding up the procedure by introducing more approximations, although apparently the loss of accuracy is quite minimal.

Running on 4 cores with psi4 psi4_optimise.dat -n 4, it finishes in about 3 minutes.

Step 4: Dihedral Scan

When doing a dihedral scan, what we’re doing is fixing this dihedral at a given value, optimising the geometry to calculate the potential energy on the system, rotating the dihedral by a small amount, and repeating. We then return the potential energy vs. dihedral angle at the end, to plot the potential energy vs angle of the dihedral.

I know from looking at the structure that the dihedral I’m after is D(2,4,5,10), where the numbers are the atom indices in our input structure. When I look in the output file, psi4_optimise.out, I can look for the value of this dihedral angle in the final optimisation step, and I see:

Coordinate      Previous         Force          Change          New
...
D(2,4,5,10)     153.95468       -0.00000       -0.00160     153.95308

Giving us an optimised dihedral value of 153.95308 once the program has converged.

This is our starting point. Since I want to do a 360 degree scan at 10 degree intervals, I write a simple function to start indexing at 150 and work it’s way all around, back to 140, where the scan will finish:

def define_angles(psi=0):
    angles = np.arange(0, 360, 10) - 170
    idx = np.where(angles == int(psi))[0][0]
    return(np.concatenate([angles[idx:], angles[:idx]]))

This is just generating a list from -170 to 180, and then deciding to start from our chosen psi value by cutting the original list at that point and rearranging it.

The actual implementation of the dihedral scan will look something like this:

dih_range = define_angles(150)

for psi in dih_range:
    geometric_keywords = {
      'coordsys' : 'tric',
      'constraints' : {
      'set' : [{'type'    : 'dihedral',
                'indices' : [1, 3, 4, 13],
                'value'   : psi }]
      }
    }
    E = optimize('scf', engine='geometric', optimizer_keywords=geometric_keywords)
    out = f"CHA.{psi}.xyz"
    print(psi, E)
    PES_C.append((psi, E))
    molecule.save_xyz_file(out,1)

Which will result in a 2D array containing the angle(psi) and its corresponding energy.

Footnotes

  1. this is the time-independent form of the equation; there’s also a time-dependent form, given as:

  2. conda install openbabel -c conda-forge

  3. If you want to write out the pdb with openbabel so that you can visualise it using something like VMD, run openbabel -:"CC(=O)OC1CCCCC1" -opdb -O CHA.pdb --gen3d

  4. In fact, maybe you should write a descriptor, as I sometimes got errors parsing the file in later steps with psi4 which would simply skip the blank line.