Abstract
To understand hydrogen uptake in porous carbon materials, we developed machine learning models to predict excess uptake at 77 K based on the textural and chemical properties of carbon, using a dataset containing 68 different samples and 1745 data points. Random forest is selected due to its high performance (R2 > 0.9), and analysis is performed using Shapley Additive Explanations (SHAP). It is found that pressure and Brunauer-Emmett-Teller (BET) surface area are the two strongest predictors of excess hydrogen uptake. Surprisingly, this is followed by a positive correlation with oxygen content, contributing up to ∼0.6 wt% additional hydrogen uptake, contradicting the conclusions of previous studies. Finally, pore volume has the smallest effect. The pore size distribution is also found to be important, since ultramicropores (dp < 0.7 nm) are found to be more positively correlated with excess uptake than micropores (dp < 2 nm). However, this effect is quite small compared to the role of BET surface area and total pore volume. The novel approach taken here can provide important insights in the rational design of carbon materials for hydrogen storage applications.
Original language | English |
---|---|
Pages (from-to) | 190-201 |
Number of pages | 12 |
Journal | Carbon |
Volume | 179 |
Early online date | 20 Apr 2021 |
DOIs | |
Publication status | Published - 31 Jul 2021 |
Funding
This work was supported by JST COI Grant Number JPMJCE1318 (Japan), and a JSPS KAKENHI Grant-in-Aid for Scientific Research B, Grant Number 19H02558 (Japan).

Five different models were evaluated for their predictive performance: (i) least squares linear regression (LR); (ii) support vector regressor with linear kernel (SVR(L)); (iii) SVR with radial basis function kernel (SVR(RBF)); (iv) extreme gradient boosted trees (XGBT, implemented using the XGBoost library); and (v) random forest regressor (RF). To tune the hyperparameters of each model, we performed group 5-fold cross-validation using either GridSearchCV() or RandomizedSearchCV() in scikit-learn (a free Python library for machine learning), with parameters specified in Table S1. Group 5-fold cross-validation is used here instead of regular cross-validation to ensure that the models generalize well to unseen samples. The sample names are used as group labels so that, in each fold, the test set contains no data from carbon samples in the corresponding training set. If regular K-fold cross-validation were used instead, where the train-test split is completely randomized, the model might only need to interpolate or complete an isotherm for a known carbon sample, rather than generate an entirely new isotherm for an unknown sample. The difference between the two cross-validation methods is summarized in Fig. 1.

These five models were used to predict excess hydrogen uptake from the textural and chemical properties of the different carbon materials. The cross-validated performances of the different models are compared in Table 1.
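The group-wise cross-validation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the two features and the saturating uptake curve are assumptions, and the hyperparameter grid is hypothetical (the actual grid is given in Table S1). It uses scikit-learn's `GroupKFold` splitter together with `GridSearchCV`, passing the sample labels as `groups`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset): 10 hypothetical carbon
# samples, each contributing a 20-point isotherm.
n_samples, n_points = 10, 20
groups = np.repeat(np.arange(n_samples), n_points)  # sample label for each row
pressure = np.tile(np.linspace(0.1, 10.0, n_points), n_samples)
bet_area = np.repeat(rng.uniform(500.0, 3000.0, n_samples), n_points)
X = np.column_stack([pressure, bet_area])

# Toy saturating uptake curve plus noise; the functional form is an assumption.
y = 2e-3 * bet_area * pressure / (1.0 + pressure)
y = y + rng.normal(0.0, 0.05, X.shape[0])

cv = GroupKFold(n_splits=5)

# Each test fold holds out whole samples, so no sample is on both sides.
for train_idx, test_idx in cv.split(X, y, groups=groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

# Hypothetical grid for illustration only.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="r2",
)
search.fit(X, y, groups=groups)  # groups are forwarded to the splitter
best_params = search.best_params_
print(best_params)
```

With a plain `KFold` splitter and shuffling, rows from the same isotherm would typically land in both the training and test folds, which is the interpolation scenario the text warns against.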
In addition, a comparison between the predicted and actual hydrogen uptake values for different models is shown in Fig. 3. Clearly, linear approximations are not well suited for this prediction task, since LR and SVR(L) performed significantly worse than the non-linear models. This result is to be expected for two reasons: first, the strong relationship between pressure and uptake is non-linear; second, linear models do not perform as well with multicollinear predictor variables [62]. Based on the performance metrics, the random forest (RF) regression method was selected due to its high performance (R2 > 0.9), and refit with the entire dataset. Notably, even with nested cross-validation, the runtime of RF is not much longer than that of the other models, taking only several minutes to run. RF also has the advantage of being robust against multicollinearity [28], which we have demonstrated to exist in this dataset.
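As a rough illustration of why a linear model can struggle with a saturating pressure dependence, the sketch below scores a linear regression and a random forest with group-wise cross-validation. Everything here is assumed for illustration: the data are synthetic, the Langmuir-like curve and the two features are stand-ins, and the scores say nothing about the values reported in Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(1)

# Synthetic isotherms for 12 hypothetical samples (not the paper's data).
n_samples, n_points = 12, 25
groups = np.repeat(np.arange(n_samples), n_points)
pressure = np.tile(np.linspace(0.05, 20.0, n_points), n_samples)
bet_area = np.repeat(rng.uniform(500.0, 3000.0, n_samples), n_points)
X = np.column_stack([pressure, bet_area])

# Assumed saturating (Langmuir-like) dependence on pressure, plus noise.
y = 2e-3 * bet_area * pressure / (1.0 + pressure)
y = y + rng.normal(0.0, 0.05, X.shape[0])

models = {
    "LR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}
cv = GroupKFold(n_splits=4)

# Mean R2 over folds in which whole samples are held out.
scores = {
    name: cross_val_score(model, X, y, groups=groups, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean R2 = {score:.3f}")
```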
Keywords
- explainable AI
- hydrogen storage
- machine learning
- physisorption
- porous carbon
- shapley additive explanations