A University of Michigan-led research team aims to find out whether the emergent connections in meaning that ChatGPT picks up by predicting the next word have a counterpart in the way atoms are arranged to form molecules.
The research is supported by a one-year grant from the Department of Energy, providing 200,000 node hours on Polaris, a 34-petaFLOP supercomputer at Argonne National Laboratory. The team is building a foundation model for molecules, similar to the GPT models that support applications like ChatGPT. The new model concentrates on small organic molecules relevant to energy storage and conversion applications. These molecules are composed of carbon, hydrogen, oxygen, and nitrogen.
What we’ve learned from language models is that size matters. The interesting behavior happens when you make it very big. If you train on a small amount of data and ask it to write Shakespeare, it’s not good. But a larger data set is better, and when it’s big enough, text that sounds like Shakespeare emerges.
Venkat Viswanathan, Associate Professor, Aerospace Engineering, University of Michigan
The research team intends to use the model to improve the prediction of battery electrolytes. Electrolytes are the medium through which ions move between the electrodes as a battery charges and discharges.
While many electrode pairs have the potential for higher energy densities compared to current lithium-based batteries, finding electrolytes that are compatible with both electrodes can be challenging. The researchers believe that artificial intelligence can play a key role in addressing this challenge and facilitating the development of more efficient battery systems.
There are several billions of molecules that are possible to make, and we have the text-based representation for them. Our slice of that will be synthesizable small molecules similar to those used in pharmaceuticals and electrolytes.
Venkat Viswanathan, Associate Professor, Aerospace Engineering, University of Michigan
After the model demonstrates the ability to predict missing atoms in small organic compounds, the research team plans to progress to fine-tuning. This involves providing the model with information about the properties of certain compounds and then tasking it with predicting the properties of other compounds.
Through a process of iterative feedback, the team aims to develop an AI system that can effectively understand and predict the chemistry of small organic molecules.
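The "predict the missing atoms" step described above can be illustrated with a minimal Python sketch. It assumes the text-based representation is a SMILES-style string, which the article does not specify, and the tokenizer, the `<mask>` token, and the `mask_atoms` helper are hypothetical names used only for illustration, not the team's actual pipeline. The sketch hides a fraction of the atom tokens in a molecule and records the hidden atoms as the targets a model would be trained to recover.

```python
import random
import re

# Assumption: molecules are written as SMILES-style strings. Hydrogens are
# implicit in this notation, so only C, N, and O appear as atom tokens.
TOKEN_RE = re.compile(r"\[[^\]]+\]|[CNOcno]|.")  # bracket atoms, atoms, other symbols
ATOM_RE = re.compile(r"\[[^\]]+\]|[CNOcno]")     # tokens that count as atoms
MASK = "<mask>"                                  # hypothetical mask token


def tokenize(smiles):
    """Split a SMILES-style string into atom, bond, ring, and branch tokens."""
    return TOKEN_RE.findall(smiles)


def mask_atoms(smiles, mask_fraction=0.15, rng=None):
    """Hide a random subset of atom tokens.

    Returns the masked token list and a dict mapping each masked position
    to the atom that was hidden there (the prediction target).
    """
    rng = rng or random.Random()
    tokens = tokenize(smiles)
    atom_positions = [i for i, tok in enumerate(tokens) if ATOM_RE.fullmatch(tok)]
    if not atom_positions:
        return tokens, {}
    # Always mask at least one atom so every molecule yields a training example.
    n_mask = max(1, round(mask_fraction * len(atom_positions)))
    chosen = rng.sample(atom_positions, n_mask)
    targets = {i: tokens[i] for i in chosen}
    masked = [MASK if i in targets else tok for i, tok in enumerate(tokens)]
    return masked, targets


if __name__ == "__main__":
    # Ethylene carbonate, a common electrolyte solvent, as a SMILES string.
    masked, targets = mask_atoms("C1COC(=O)O1", mask_fraction=0.5, rng=random.Random(0))
    print("model input:", " ".join(masked))
    print("targets    :", targets)
```

In a fine-tuning stage, the same tokenized strings would instead be paired with measured or computed properties, and the model would be asked to predict those properties for compounds it has not seen.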
Deep Forest Sciences has built considerable expertise in applying molecular foundation models to drug discovery, and we are excited to apply that knowledge to batteries.
Bharath Ramsundar, Founder and CEO, Deep Forest Sciences
Once the model starts to work, the team will ask it to predict electrolytes suitable for a particular pair of electrodes. The team will then test each prescription in the lab using Clio, a robotic setup developed by Viswanathan and Jay Whitacre, Professor of materials science and engineering and of engineering and public policy at Carnegie Mellon University.
The researchers are also looking for new design rules that may emerge from the model, rules that humans may not have been able to see. “When we learn chemistry, we learn each rule, and then we learn about a dozen exceptions,” says Viswanathan. “Can this now help us learn better rules or be able to design with more sophisticated combined rules?”
The DOE’s INCITE program, which awarded the computing time for this study, seeks out computationally intensive, large-scale research projects with the potential to significantly advance key areas in science and engineering.