Can a computer do my experiments for me?

Rose McHardy

Faculty of Science

Department of Pure and Applied Chemistry

A scientist in a lab working on an experiment with test tubes and a microscope. The scientist's head has been replaced with a computer.
Modelling of experiments is on the rise

A famous quote from 2013 by Professor Dominic Tildesley, former President of the Royal Society of Chemistry, states:

“The speed and development of computers is now so rapid, and the advances in modelling and informatics are so dramatic that in 15 years’ time no chemist will be doing any experiments at the bench without trying to model them first.”

Once upon a time, computational chemistry was such a niche subject that very few would know of it. Nowadays, computational methods within the sciences are vital, and my research demonstrates just one of the many ways they can positively impact the pharmaceutical industry.

For a drug molecule to make it to market, it must go through a vast journey through the drug discovery and development process, which can be split into four stages:

  • Early drug discovery: this involves screening some of the billions upon billions of possible drug molecules to see which ones will be successful at binding to your target protein, which ones will successfully be absorbed into the blood stream, and so on. From this stage, a lead candidate will be chosen for a particular disease and carried forward to the next stage.

  • Preclinical studies: this involves in vivo and in vitro studies. The aim of this stage is to see how your molecule interacts with the body and whether there are any toxicity issues. This is the stage where most drug molecules fail, with only 1 in 5000 lead candidates making it to the next stage.

  • Clinical studies: this involves testing the molecule in the human body, first in healthy volunteers, then in patients with the target disease. This can be the most expensive part of drug discovery, which is why many drugs don’t make it to this stage. If you’re lucky, and your drug molecule passes these stages with flying colours, then it will make it to the final stage.

  • Review and regulation: this involves the Medicines and Healthcare products Regulatory Agency (MHRA) in the UK and the Food and Drug Administration (FDA) in the US, who review all the previous studies of the molecule and decide whether to approve it for market. Even if the molecule makes it to market, it is still continuously reviewed and could be pulled from the market at any point (e.g. if toxicity or stability issues are discovered).

This entire process, from discovery to market, can take on average 12 years and cost £2 billion for a successfully marketed drug. Because of this high price tag, pharmaceutical companies are looking for more ways to reduce the number of molecules that never make it to market. This is where computational methods come in.

When selecting a molecule as a lead candidate, the chemical space of potential molecules needs to be screened. This chemical space can be reduced by looking at something known as Lipinski’s rule of five, which is essentially a list of chemical rules that a drug has to comply with to be considered a successful oral drug. Applying these rules reduces the size of chemical space, but still leaves a very large number of molecules to consider (some estimates say 10^60 molecules!). Obviously, a human couldn’t screen all those molecules, and granted neither can a computer, but a computer can do a much better job, and it never needs the molecule to be synthesised to test for these properties.
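As a rough illustration of how such a filter could look in code, here is a minimal sketch of a rule-of-five check. The thresholds are the commonly quoted ones (molecular weight up to 500 Da, log P up to 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), and the rule is usually applied as "no more than one violation". The `Molecule` class and both function names are hypothetical, and the property values are assumed to have been calculated elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical container for the four properties the rule looks at."""
    name: str
    molecular_weight: float  # daltons
    log_p: float             # log10 of the octanol/water partition coefficient
    h_bond_donors: int       # typically counted as N-H and O-H groups
    h_bond_acceptors: int    # typically counted as N and O atoms


def lipinski_violations(mol: Molecule) -> list:
    """Return a description of every rule-of-five criterion the molecule breaks."""
    violations = []
    if mol.molecular_weight > 500:
        violations.append("molecular weight > 500 Da")
    if mol.log_p > 5:
        violations.append("log P > 5")
    if mol.h_bond_donors > 5:
        violations.append("more than 5 hydrogen-bond donors")
    if mol.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen-bond acceptors")
    return violations


def passes_rule_of_five(mol: Molecule) -> bool:
    """Assumption: use the common convention of tolerating one violation."""
    return len(lipinski_violations(mol)) <= 1


# Approximate textbook values for aspirin, plus a made-up "greasy" molecule.
aspirin = Molecule("aspirin", 180.16, 1.2, 1, 4)
greasy = Molecule("hypothetical molecule", 650.0, 6.3, 2, 8)

for mol in (aspirin, greasy):
    print(mol.name, passes_rule_of_five(mol), lipinski_violations(mol))
```

A filter like this is cheap enough to run over millions of candidate structures, which is exactly why it is applied before any more expensive calculation.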

One important property in the Lipinski rule of five is log P, which is a measurement of how a drug molecule partitions between two immiscible solvents, water and octanol. It is a useful indicator of whether a drug molecule would be able to be successfully absorbed in the stomach and then on into the blood stream, and hopefully to the target site.
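In terms of numbers, log P is the base-10 logarithm of the ratio of the molecule's concentration in the octanol layer to its concentration in the water layer at equilibrium. A minimal sketch, with a hypothetical function name and made-up concentrations:

```python
import math


def log_p_from_concentrations(conc_octanol: float, conc_water: float) -> float:
    """log P = log10([molecule] in octanol / [molecule] in water).

    Both concentrations must use the same units and be measured at equilibrium.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Made-up example: 100 times more molecule in octanol than in water.
print(log_p_from_concentrations(conc_octanol=100.0, conc_water=1.0))
```

Each unit of log P therefore corresponds to a tenfold change in how strongly the molecule prefers octanol over water.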

Figure 1: Demonstration of how a molecule would pass through a cell membrane dependent on the log P value.

If a molecule has a negative log P, then the molecule is too hydrophilic, meaning it loves to interact with water. This means that the molecule will never be able to pass through the cell membrane as it won’t want to interact with the fatty tails within the membrane, hence it won’t make it to the blood stream.

At the other extreme, if the log P value is more than 5, the molecule is too hydrophobic: it can successfully pass into the membrane, but it loves the fatty, hydrophobic tails within the membrane so much that it doesn’t want to leave, and so again it never makes it to the blood stream.

What we want from a drug molecule is a happy medium: a log P value somewhere between 0 and 5, so that the molecule is hydrophobic enough to pass into the membrane, but not so hydrophobic that it never leaves.
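The three cases above can be written as a tiny classifier. This is only a sketch of the qualitative picture described here; the function name and labels are hypothetical, and real permeability depends on more than log P alone.

```python
def classify_log_p(log_p: float) -> str:
    """Sort a log P value into the three regimes described in the text."""
    if log_p < 0:
        return "too hydrophilic: unlikely to enter the membrane"
    if log_p > 5:
        return "too hydrophobic: likely to get stuck in the membrane"
    return "happy medium: can enter and leave the membrane"


# Made-up log P values covering each regime.
for value in (-1.5, 2.3, 6.8):
    print(value, "->", classify_log_p(value))
```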

To work out the log P of a molecule, a property known as solvation free energy (SFE) is useful. This is the energy required for a molecule to go from a gaseous phase to an aqueous phase.
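One standard way to connect the two quantities is through the free energy of transferring the molecule from water to octanol, which gives log P = (SFE in water − SFE in octanol) / (RT ln 10). Note the assumption: this needs an SFE in octanol as well as the aqueous one discussed here. The sketch below uses a hypothetical function name, energies in kcal/mol and made-up input values.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)


def log_p_from_sfe(sfe_water: float, sfe_octanol: float,
                   temperature: float = 298.15) -> float:
    """Estimate log P from solvation free energies (kcal/mol).

    Assumes both values describe transfer of the same molecule from the
    gas phase into each pure solvent at the given temperature (kelvin).
    """
    rt_ln10 = GAS_CONSTANT * temperature * math.log(10)
    # A more negative SFE in octanol means the molecule prefers octanol,
    # which gives a positive log P.
    return (sfe_water - sfe_octanol) / rt_ln10


# Made-up SFEs: the molecule is 2.7 kcal/mol more stable in octanol.
print(round(log_p_from_sfe(sfe_water=-4.0, sfe_octanol=-6.7), 2))
```

At room temperature RT ln 10 is roughly 1.36 kcal/mol, so an error of about that size in either SFE shifts the predicted log P by one whole unit. This is why SFE accuracy matters so much.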

This is where my research comes in!

There are currently a few ways to work out the SFE of a molecule, but each has its own disadvantages. Experimental methods require the molecule to be synthesised, which makes them unsuitable for screening a large number of molecules, and they also need expensive equipment and specialised labour.

Solvent models can be used to run simulations where the molecule is surrounded by solvent. Explicit solvent models involve representing all solvent molecules atomistically, which is particularly computationally expensive. Implicit solvent models involve representing the solvent as one bulk medium, which isn’t as costly but comes at the cost of SFE accuracy.

Machine learning (ML) takes the structure of the molecule and uses it to predict SFE. These algorithms, however, require large training datasets and struggle to predict the SFE of molecules unlike those in these datasets.

My project combines an approximate solvation theory known as the Reference Interaction Site Model (RISM), which is a happy medium between explicit and implicit solvent models, with ML to predict SFEs of druglike molecules.
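To give a flavour of the general idea of learning on top of a physics-based estimate, here is a deliberately tiny toy example. It is not the pipeline used in this project: it stands in for the ML step with a straight-line fit by ordinary least squares, takes a single approximate SFE per molecule as its only input, and uses synthetic numbers invented purely for illustration. All function names are hypothetical.

```python
def fit_linear_correction(approx, reference):
    """Least-squares fit of reference ~ slope * approx + intercept."""
    n = len(approx)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(approx) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in approx)
    if sxx == 0:
        raise ValueError("approximate values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(approx, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_sfe(approx_value, slope, intercept):
    """Apply the fitted correction to a new approximate SFE."""
    return slope * approx_value + intercept


# Synthetic training data in kcal/mol, NOT real measurements.
# Imagine 'approx' comes from a fast approximate theory and 'reference'
# from experiment for molecules where experimental values already exist.
train_approx = [-2.0, -4.0, -6.0, -8.0]
train_reference = [-1.5, -2.5, -3.5, -4.5]

slope, intercept = fit_linear_correction(train_approx, train_reference)

# Predict for a new molecule that has never been synthesised or measured.
print(predict_sfe(-5.0, slope, intercept))
```

The point of the sketch is the division of labour: the approximate theory supplies physically meaningful input for any molecule, and the fitted model only has to learn how that input differs from reality.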

Figure 2: Overview of my project pipeline to predict SFEs.

With the work carried out so far, I have been able to predict SFEs of druglike molecules within an error similar to experimental methods, without ever having to synthesise the molecules.

My research has shown so far that I am able to match the performance of a scientist at a lab bench, yet I am able to sit behind a computer in the comfort of my own home. Can a computer do my experiments for me? In this case, I think it can.


  1. Ratkova E. L., Palmer D. S., Fedorov M. V., Solvation Thermodynamics of Organic Molecules by the Molecular Integral Equation Theory: Approaching Chemical Accuracy, Chemical Reviews, 2015, 115, 6312-6356

Author retains copyright to text and images.
