Jul 18 2019
Chemistry is not as simple as merely mixing compound A and compound B to create compound C. Catalysts influence the reaction rate, as do the physical conditions of the reaction and any intermediate steps that lead to the end product.
A chemist trying to develop a new chemical process for, say, materials research or pharmaceuticals needs to identify the best setting for each of these factors. It is a laborious trial-and-error process. Until now.
In a new publication in Nature, University of Utah chemists Jolene Reid and Matthew Sigman demonstrate how analyzing previously published chemical reaction data can predict how hypothetical reactions may proceed, narrowing the range of conditions chemists need to investigate. Their algorithmic prediction process, which incorporates aspects of machine learning, can save valuable time and resources in chemical research.
We try to find the best combination of parameters. Once we have that we can adjust features of any reaction and actually predict how that adjustment will affect it.
Jolene Reid, Chemist, University of Utah
Trial and error
Previously, chemists who wanted to perform a reaction that had not been tried before, such as a reaction to attach a specific small molecule to a specific spot on a larger molecule, approached the challenge by finding a similar reaction and replicating its conditions.
"Almost every time, at least in my experience, it doesn't work well," Sigman says. "So then you systematically change the conditions."
But with numerous factors in each reaction (Sigman estimates about seven to 10 in a typical pharmaceutical reaction), the number of possible combinations of conditions becomes unmanageable. "You cannot cover all of this variable space with any type of high throughput operation," Sigman says. "We're talking billions of possibilities."
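To see how quickly the search space grows, here is a rough back-of-the-envelope sketch in Python. The article gives only the number of factors (seven to 10); the eight candidate settings per factor used below are a hypothetical figure chosen for illustration, as is the function name `count_combinations`.

```python
from math import prod


def count_combinations(levels_per_factor):
    """Count the distinct condition sets in a full grid of settings.

    levels_per_factor: one integer per factor, giving how many
    candidate settings that factor has.
    """
    return prod(levels_per_factor)


# Hypothetical assumption: every factor has 8 candidate settings.
seven_factors = [8] * 7
ten_factors = [8] * 10

print(count_combinations(seven_factors))  # 2097152
print(count_combinations(ten_factors))    # 1073741824
```

Under that assumption, 10 factors already give more than a billion combinations, which is consistent in scale with Sigman's "billions of possibilities."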
Narrowing the field
So, Sigman and Reid searched for a way to narrow the focus to a more manageable range of conditions. For their test case, they studied reactions that involve molecules that are mirror images of each other (in the same way a person’s left and right hands are mirror images of each other) and that favor one configuration over the other. Such a reaction is referred to as "enantioselective," and Sigman's lab explores the types of catalysts involved in enantioselective reactions.
Reid gathered published scientific reports of 367 forms of reactions involving imines, compounds that contain a carbon-nitrogen double bond, and used machine learning algorithms to correlate features of the reactions with how selective they were for the two different forms of imines. The algorithms examined the reactions' catalysts, reactants, and solvents, and formulated mathematical relationships between those properties and the final selectivity of the reaction.
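The idea of a mathematical relationship between reaction properties and selectivity can be sketched with a toy example. The article does not say which algorithm or descriptors the researchers used, so the code below substitutes ordinary least squares as a stand-in. The three numeric descriptors (one each for catalyst, reactant, and solvent), the data, and all function names are hypothetical and synthetic, not taken from the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(features, targets):
    """Fit targets ~ w0 + w1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Apply fitted weights (intercept first) to one set of descriptors."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Synthetic "published reactions": (catalyst, reactant, solvent) descriptors.
features = [
    (1.0, 0.2, 3.0), (2.0, 0.5, 1.0), (3.0, 0.1, 2.0),
    (1.5, 0.9, 4.0), (2.5, 0.4, 0.5), (0.5, 0.7, 2.5),
]
# Made-up relationship used only to generate selectivity scores.
true_weights = [0.5, 1.2, -2.0, 0.3]
targets = [predict(true_weights, f) for f in features]

fitted = fit_least_squares(features, targets)

# Predict the selectivity score of a combination not in the training set.
new_reaction = (2.2, 0.3, 1.5)
print(predict(fitted, new_reaction))
```

Because the synthetic scores are generated without noise, the fit recovers the made-up weights almost exactly. Real reaction data would be noisy and would need held-out validation of the kind the article describes next.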
There's a pattern hidden beneath the surface of why it works and doesn't work with this condition, this catalyst, this substrate, and so on.
Matthew Sigman, Chemist, University of Utah
"The key to our success is that we use information from many reactions," Reid adds.
Easing the pain
Is their predictive model effective? It correctly predicted the results of 15 reactions involving one reactant that was not in the original set, and the results of 13 reactions where both a catalyst and reactant type were not in the original set.
Finally, Reid and Sigman examined a recent study that carried out 2,150 experiments to find the ideal conditions for 34 reactions. Without dirtying a single beaker, Reid and Sigman's model arrived at the same results and the same ideal catalyst.
Reid anticipates applying the model to predicting reactions that include large, complex molecules. "Often you find that new methodologies aren't fine-tuned to complex systems," she says. "Possibly we could do that now by predicting beforehand the best kind of catalyst."
Sigman states that predictive models can reduce the barriers to new drug development.
"The pharmaceutical industry doesn't want to invest money into something that they don't know if it's going to work," he says. "So, if you have an algorithm that suggests this has a high probability of working, you ease the pain."