Artificial Intelligence Prediction of Molecular Structure Language and Melting/Boiling Point Properties
Background:
Predicting molecular properties such as solubility, toxicity, melting point, and boiling point is crucial for fundamental scientific research. However, experimental measurements are often time-consuming and costly, so we apply machine learning (ML) to predict these properties and seek to improve prediction accuracy.
Methods:
A dataset of more than 10,000 compounds was used to train shallow and deep ML models. The shallow models were built with PyCaret, using Mordred for feature extraction (molecular descriptors). For the deep models, two graph neural networks (GNNs) were trained: CMPNN (Communicative Message Passing Neural Network) and GCN (Graph Convolutional Network). Both were tuned by adjusting the number of hidden layers and the size (number of neurons) of each layer.
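To make the GCN part of the method concrete, the following is a minimal sketch of the widely used graph-convolution propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), stacked according to a list of hidden-layer sizes. It is an illustration only and not the implementation used in this study: the function names (`gcn_layer`, `build_weights`, `gcn_embed`), the random weight initialisation, the mean-pooling readout, and the three-atom example graph are all assumptions, and the edge-node communication that distinguishes CMPNN is not shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so every atom also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


def build_weights(in_dim, hidden_sizes, seed=0):
    """Hypothetical helper: one random weight matrix per hidden layer."""
    rng = random.Random(seed)
    dims = [in_dim] + list(hidden_sizes)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(dims[k + 1])]
             for _ in range(dims[k])]
            for k in range(len(hidden_sizes))]


def gcn_embed(adjacency, features, layer_weights):
    """Apply the stacked layers, then mean-pool atoms into one molecule vector."""
    h = features
    for w in layer_weights:
        h = gcn_layer(adjacency, h, w)
    n = len(h)
    return [sum(row[j] for row in h) / n for j in range(len(h[0]))]


# Illustrative three-atom chain (heavy atoms only) with one feature per atom.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [1.0], [1.0]]

single = gcn_layer(adjacency, features, [[1.0]])
# Two hidden layers of sizes 8 and 4: the kind of setting tuned in the study.
embedding = gcn_embed(adjacency, features, build_weights(1, [8, 4]))
```

Changing the `hidden_sizes` list (for example `[8, 4]` versus `[16, 16, 8]`) corresponds to the two tuning knobs named above: the number of hidden layers and the number of neurons per layer. In a full model, the pooled vector would be passed to a regression head that outputs the predicted m.p. or b.p.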
Results:
The CMPNN model outperformed the GCN and the shallow ML models, giving the best results for both properties (b.p.: R² = 0.76, MAE = 23.89 K; m.p.: R² = 0.87, MAE = 23.73 K). The top molecular descriptor for b.p. prediction was piPC1, a path-count descriptor related to bond order, and the top descriptor for m.p. prediction was AATS0d, a Moreau-Broto autocorrelation descriptor weighted by σ electrons.
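For readers unfamiliar with the two reported metrics, the sketch below computes the coefficient of determination (R²) and the mean absolute error (MAE) from their standard definitions. The temperature values are made up for illustration and are not data from this study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the same unit as the target (here K)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted boiling points in kelvin.
measured = [300.0, 350.0, 400.0, 450.0]
predicted = [310.0, 340.0, 420.0, 440.0]

mae = mean_absolute_error(measured, predicted)
r2 = r2_score(measured, predicted)
```

MAE is expressed in kelvin and is therefore directly comparable between the m.p. and b.p. models, whereas R² is dimensionless and depends on the spread of the target values in each dataset.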
Conclusions:
A systematic comparison of shallow and deep learning approaches improved the prediction of molecular properties and showed that the CMPNN model achieves the highest performance for both m.p. and b.p. (R² = 0.87 for m.p.; R² = 0.76 for b.p.). The deep learning model performed significantly better than the shallow ML models in predicting m.p. (p < 0.05). SHAP analysis identified piPC1 and AATS0d as the key predictive factors for b.p. and m.p., respectively. Moreover, this approach can be applied to predict other molecular properties. In summary, this study not only provides an accurate model but also identifies the key factors for predicting m.p. and b.p.
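SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by brute force for a toy three-feature model, to show what such an attribution means. It is not the procedure used in this study: the toy linear model, the feature values, the all-zero baseline, and the choice to replace "absent" features with baseline values are all assumptions, and practical SHAP implementations rely on faster approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(v):
    """Hypothetical linear model over three descriptor values."""
    return 3.0 * v[0] + 1.0 * v[1] - 2.0 * v[2]


x = [2.0, 1.0, 0.5]          # descriptor values for one illustrative molecule
baseline = [0.0, 0.0, 0.0]   # reference values representing "feature absent"

phi = shapley_values(toy_model, x, baseline)
```

For a linear model, each Shapley value reduces to the coefficient times the feature's deviation from the baseline, and the values sum to the difference between the prediction and the baseline prediction. Ranking descriptors by the mean absolute value of such attributions across a dataset is a common way to arrive at a "top descriptor".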