Per- and polyfluoroalkyl substances (PFAS) are widely used in various products as water-repellent and stain-resistant coatings. PFAS are called "forever chemicals" because of their exceptional thermal and chemical stability, and they have been detected globally in the environment, humans, and wildlife. Long-chain perfluoroalkyl acids, including perfluorooctanoic acid (PFOA) and perfluorooctanesulfonic acid (PFOS), are persistent, bioaccumulative, and toxic. PFOA- and PFOS-related substances are regulated internationally under the Stockholm Convention on Persistent Organic Pollutants (POPs). A key toxicological aspect of PFAS, especially PFOA and PFOS, is their disruption of lipid metabolism through interaction with peroxisome proliferator-activated receptor alpha (PPARα), a nuclear receptor essential in lipid metabolism, energy balance, and cell differentiation. PFAS binding to PPARα disrupts signaling pathways, causing various biological effects. However, information on the potential hazards (e.g., bioactivity, bioaccumulation, and toxicity) of the thousands of PFAS types, including next-generation alternative PFAS, is limited. In this study, we developed an explainable machine learning approach to predict the binding affinity of PFAS for PPARα.
We obtained SMILES data for 6,798 PFAS from the U.S. EPA database and used Molecular Operating Environment (MOE) to calculate 206 molecular descriptors and the binding affinity (i.e., S-score) for PPARα for each PFAS. Results revealed that 4,089 PFAS exhibited S-scores lower than those of both PFOA (S-score = -5.03 kcal/mol) and PFOS (S-score = -5.09 kcal/mol), where a lower S-score indicates stronger predicted binding. Through the systematic and objective selection of important molecular descriptors, we developed a machine learning model with good predictive performance using only three descriptors (R² = 0.72). Molecular size (b_single) and electrostatic properties (BCUT_PEOE_3 and PEOE_VSA_PPOS) are important for PPARα-PFAS binding. Alternative PFAS are generally considered safer than their legacy predecessors. However, we found that alternative PFAS with many carbon atoms and ether groups exhibited a higher binding affinity for PPARα than legacy PFOA and PFOS. Our novel approach outperforms traditional QSAR and machine learning approaches in terms of interpretability, thereby providing deeper insight into the molecular mechanism of PFAS toxicity.
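The screening comparison and the R² metric reported above can be illustrated with a minimal Python sketch. It is not the MOE workflow or the model used in this study: the function names and the compound scores are hypothetical placeholders, and only the PFOA and PFOS reference S-scores are taken from the text.

```python
# Reference docking S-scores reported in the text (kcal/mol).
PFOA_S = -5.03
PFOS_S = -5.09


def stronger_than_references(scores, refs=(PFOA_S, PFOS_S)):
    """Return compound names whose S-score is lower than every reference.

    A lower (more negative) S-score is read here as stronger predicted binding.
    """
    cutoff = min(refs)
    return [name for name, s in scores.items() if s < cutoff]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical placeholder values for illustration only; not data from the study.
example_scores = {"compound_A": -6.20, "compound_B": -5.05, "compound_C": -4.10}
hits = stronger_than_references(example_scores)
```

In this sketch a compound counts as a hit only if its score is below both references, which mirrors the "lower than both PFOA and PFOS" criterion. `r_squared` would be applied to docking S-scores and model predictions on held-out compounds.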
In the present study, the machine learning model successfully predicted the binding affinity of PFAS to human PPARα and identified key molecular characteristics involved in binding. Although this study focused on PFAS-PPARα binding, our approach is also applicable to other ligand-receptor binding and structure-property relationship studies. However, limitations exist. This study was limited to ligand-receptor binding, and we focused on the interaction with PPARα, whereas PFAS could induce toxicity through other receptors. Notably, a high predicted binding affinity does not always reflect toxicity; thus, actual toxicity must be verified experimentally. Future research could improve the accuracy of toxicity predictions by incorporating more features, including not only the structural details of PFAS but also information about downstream signal transduction pathways. Despite these limitations, our method allows rapid, cost-effective PFAS screening, providing a preliminary understanding of their potential toxicity and guiding further in-depth experimental investigations.