Abstract: | In recent years, rapid global development has significantly increased the demand for alternative energy sources, particularly solar energy. New-generation photovoltaic technologies such as organic solar cells (OSCs) and dye-sensitized solar cells (DSCs) have attracted attention for their low cost and diverse material options. Reducing environmentally harmful heavy metals, such as the lead in perovskite solar cells (PSCs), is particularly important. In this study, we developed four machine-learning (ML) predictive models using tree-based XGBoost and artificial neural network (ANN) techniques. These models employ molecular descriptors (MDs) derived from experiments and density functional theory (DFT) calculations to perform high-throughput virtual screening (HTVS) of ternary OSC materials. The HTVS analysis used two distinct databases: the first comprises 429,413 unique ternary OSC systems reconstructed from an existing database; the second is drawn from the Harvard Clean Energy Project Database (CEPDB), which includes about 2.3 million unique donor molecules. 
All four ML models predicted power conversion efficiency (PCE) accurately on test sets of closely related molecules (interpolation). However, the XGBoost models showed limited ability to predict molecules that differ significantly from those in the training set. In contrast, the ANN models exhibited strong extrapolative ability in HTVS, identifying new candidate ternary OSC systems with predicted PCE above 20%. Through efficient HTVS, this study accelerates the development of OSC molecular materials and advanced ternary OSC technology. Separately, we propose a precise, predictive, and interpretable ML model designed specifically for Zn-porphyrin-sensitized solar cells. The model relies on MDs that are theoretically computable, efficient, and reusable. It performed well in a "blind test" on 17 newly designed cells, achieving a mean absolute error (MAE) of 1.02%. Notably, the prediction error was within 1% for ten of the dyes. These results validate the ML model and its value for exploring the uncharted chemical space of Zn-porphyrins. SHAP analysis identified key MDs that correspond closely with experimental observations, providing valuable chemical guidance for the rational design of dyes in DSCs. These models enable efficient prediction and significantly reduce the analysis time for photovoltaic cells. Promising Zn-porphyrin dyes with excellent PCE have been identified, facilitating high-throughput virtual screening. The predictive tool is publicly accessible at https://ai-meta.chem.ncu.edu.tw/dsc-meta. |
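The blind-test figures quoted in the abstract (an MAE and a count of dyes predicted within 1%) can be illustrated with a minimal sketch. It assumes that both numbers are computed on absolute PCE differences in percentage points, which the abstract does not state explicitly. The function names and the four sample values are hypothetical and are not data from the study.

```python
from typing import Sequence


def mean_absolute_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean of |measured - predicted| over all cells."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def count_within(measured: Sequence[float], predicted: Sequence[float],
                 tolerance: float = 1.0) -> int:
    """Count cells whose absolute prediction error is at most `tolerance`."""
    return sum(1 for m, p in zip(measured, predicted) if abs(m - p) <= tolerance)


# Hypothetical PCE values in percent; placeholders only, NOT results from the study.
measured_pce = [8.0, 6.5, 9.0, 7.0]
predicted_pce = [7.5, 8.0, 9.25, 7.0]

mae = mean_absolute_error(measured_pce, predicted_pce)
n_close = count_within(measured_pce, predicted_pce, tolerance=1.0)
print(f"MAE = {mae:.2f} PCE points; {n_close} of {len(measured_pce)} cells within 1.0")
```

With real blind-test data, `measured_pce` would hold the experimentally measured efficiencies of the newly fabricated cells and `predicted_pce` the model outputs recorded before measurement.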