May 14 2019
Big Data has recently become ubiquitous, particularly in domains with complex and heterogeneous data patterns. Chemistry is a prime example.
To a certain extent, chemical compounds can be compared to synonyms in linguistics, since one specific compound can be denoted in different ways. To complicate matters, some compounds do not even have a definite structure and exist only as a mixture of forms that convert into one another. Hence, it is essential to know whether one is dealing with different compounds or with different representations of the same compound.
Databases also sometimes contain errors that stem from unfamiliarity with software features or from simple inattention. Such errors are detected and corrected with dedicated software.
In organic chemistry, reactions are extremely hard to analyze. For this reason, the handling of reaction data in chemoinformatics is much less developed than the handling of data on individual molecules.
Since 2013, the Laboratory of Chemoinformatics and Molecular Modeling at Kazan Federal University (KFU) has been working on this problem. So far, the work has been funded by the Government of Russia and the Russian Science Foundation. The team includes researchers from the University of Strasbourg, the University of North Carolina, Moscow State University, Palacký University Olomouc, and the Helmholtz Center in Munich.
The Kazan researchers have learned to predict reaction characteristics, detect and correct data errors, and find optimal reaction conditions. As a result, a distinctive database of reaction characteristics has emerged; it currently comprises 3.5 million entries. KFU is the only Russian member of the Reaxys R&D Collaboration, a group working on chemical databases.
In the new project, called CGRtools, KFU scientists solved several problems that stand in the way of handling reaction information well. The software library offers considerably more functionality than the tools currently in use. It is the only tool that supports CGRs (condensed graphs of reaction), and it represents molecules and reactions as objects. CGRtools treats chemical objects in much the same way as standard Python data types such as strings and integers. Every chemical object is hashable thanks to the canonicalization of atom numbering. The objects support transparent class inheritance, so new methods and attributes can be added without breaking existing ones.
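To make the idea of hashable chemical objects more concrete, here is a minimal Python sketch. It is not the CGRtools API: the class name TinyMolecule, its constructor arguments, and the brute-force canonicalization over all atom renumberings are assumptions chosen for illustration, and the approach is only practical for molecules with a handful of atoms. It shows how a canonical atom numbering lets two differently numbered records of the same compound compare equal and serve as the same dictionary key.

```python
from itertools import permutations


class TinyMolecule:
    """Toy molecular graph (hypothetical, not the CGRtools API)."""

    def __init__(self, atoms, bonds):
        # atoms: {atom_number: element symbol}
        # bonds: iterable of (atom_number, atom_number, bond_order)
        self.atoms = dict(atoms)
        self.bonds = [tuple(b) for b in bonds]
        # Computed once; the object is treated as immutable afterwards.
        self._key = self._canonical_key()

    def _canonical_key(self):
        """Try every renumbering and keep the smallest description.

        Brute force (n! renumberings), so suitable for tiny examples only.
        """
        numbers = sorted(self.atoms)
        best = None
        for perm in permutations(range(len(numbers))):
            relabel = dict(zip(numbers, perm))
            elements = [None] * len(numbers)
            for old, new in relabel.items():
                elements[new] = self.atoms[old]
            bonds = tuple(sorted(
                (min(relabel[a], relabel[b]), max(relabel[a], relabel[b]), order)
                for a, b, order in self.bonds
            ))
            key = (tuple(elements), bonds)
            if best is None or key < best:
                best = key
        return best

    def __eq__(self, other):
        return isinstance(other, TinyMolecule) and self._key == other._key

    def __hash__(self):
        return hash(self._key)


# Ethanol (heavy atoms only) written with two different atom numberings.
ethanol_a = TinyMolecule({1: "C", 2: "C", 3: "O"}, [(1, 2, 1), (2, 3, 1)])
ethanol_b = TinyMolecule({1: "O", 2: "C", 3: "C"}, [(1, 2, 1), (2, 3, 1)])
# Dimethyl ether: same atoms, different connectivity.
ether = TinyMolecule({1: "C", 2: "O", 3: "C"}, [(1, 2, 1), (2, 3, 1)])

names = {ethanol_a: "ethanol"}
print(names[ethanol_b])                    # ethanol
print(len({ethanol_a, ethanol_b, ether}))  # 2
```

Because equality and hashing both rely on the canonical key, such objects can be placed in sets and used as dictionary keys, just like strings or integers. A production library would need a far more efficient canonicalization algorithm than the exhaustive search used here.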