An application of subagging for the improvement of prediction accuracy of multivariate calibration models

R K H Galvao, M C U Araujo, M D Martins, Gledson Emidio José, M J C Pontes, E C Silva, T C B Saldanha

Research output: Contribution to journal › Article

45 Citations (Scopus)

Abstract

The term bagging refers to a class of techniques in which an ensemble model is obtained by combining different member models generated by resampling the available data set. It has been shown that bagging can lead to substantial gains in accuracy for both classification and regression models, especially when alterations in the training set cause significant changes in the outcome of the modelling procedure. However, in the context of chemometrics, the use of bagging for quantitative multicomponent analysis is still incipient. More recently, an alternative aggregation scheme termed subagging, which is based on subsampling without replacement, has been shown to provide performance improvements similar to bagging at a smaller computational cost. The present paper proposes a strategy for using subagging in conjunction with three multivariate calibration methods, namely Partial Least Squares (PLS) and Multiple Linear Regression with variable selection by using either the Successive Projections Algorithm (MLR-SPA) or a Genetic Algorithm (MLR-GA). The subagging member models are generated by subsampling the pool of samples available for modelling and then forming new calibration sets. Such a strategy is of value in analytical problems involving complex matrices, in which reproducing the composition variability of real samples by means of optimized experimental designs may be a difficult task. The efficiency of the proposed strategy is illustrated in a problem involving the NIR spectrometric determination of four diesel quality parameters (specific mass, sulphur content, and the distillation temperatures T10% and T90% at which 10% and 90% of the sample has evaporated, respectively). In this case study, the use of 30 subsampling iterations provides relative improvements of up to 16%, 33%, and 35% in the prediction accuracy of PLS, MLR-SPA, and MLR-GA models, respectively, with respect to the expected results of individual (non-ensemble) models. (c) 2005 Elsevier B.V. All rights reserved.
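
The subagging idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `subagging_predict`, the 50% subsample fraction, the use of ordinary least squares as the member model (a stand-in for PLS, MLR-SPA or MLR-GA), and the averaging of member predictions are all assumptions made for illustration. Only the subsampling without replacement and the default of 30 member models follow the abstract.

```python
import numpy as np


def subagging_predict(X_cal, y_cal, X_new, n_members=30,
                      subsample_fraction=0.5, seed=0):
    """Average predictions of least-squares models fitted on subsamples.

    Hypothetical helper: the member model (ordinary least squares) and the
    subsample fraction are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_samples = X_cal.shape[0]
    # Size of each calibration subset, clamped to a valid range.
    subset_size = min(n_samples, max(2, int(round(subsample_fraction * n_samples))))

    # Design matrix for the new samples, with a column of ones for the intercept.
    A_new = np.column_stack([np.ones(X_new.shape[0]), X_new])
    predictions = np.zeros((n_members, X_new.shape[0]))

    for m in range(n_members):
        # Subsampling WITHOUT replacement is what separates subagging from bagging.
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        A_sub = np.column_stack([np.ones(subset_size), X_cal[idx]])
        coef, *_ = np.linalg.lstsq(A_sub, y_cal[idx], rcond=None)
        predictions[m] = A_new @ coef

    # Aggregate the member models by averaging their predictions (assumption).
    return predictions.mean(axis=0)
```

Calling `subagging_predict(X_cal, y_cal, X_new)` with a calibration matrix, its reference values, and new spectra returns one aggregated prediction per new sample. Replacing the least-squares fit inside the loop with another calibration method leaves the subsampling and aggregation steps unchanged.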

Original language: English
Pages (from-to): 60-67
Number of pages: 8
Journal: Chemometrics and intelligent laboratory systems
Volume: 81
Issue number: 1
DOIs: https://doi.org/10.1016/j.chemolab.2005.09.005
Publication status: Published - 15 Mar 2006

Fingerprint

Calibration
Chemical analysis
Sulfur
Linear regression
Distillation
Design of experiments
Agglomeration
Genetic algorithms
Costs
Temperature

Keywords

  • bagging
  • subagging
  • MLR
  • PLS
  • SPA
  • genetic algorithms
  • NIR spectrometry
  • diesel analysis
  • successive projections algorithm
  • neural network ensembles
  • variable selection
  • spectrometry
  • QSAR

Cite this

Galvao, R. K. H., Araujo, M. C. U., Martins, M. D., José, G. E., Pontes, M. J. C., Silva, E. C., & Saldanha, T. C. B. (2006). An application of subagging for the improvement of prediction accuracy of multivariate calibration models. Chemometrics and intelligent laboratory systems, 81(1), 60-67. https://doi.org/10.1016/j.chemolab.2005.09.005
Galvao, R K H ; Araujo, M C U ; Martins, M D ; José, Gledson Emidio ; Pontes, M J C ; Silva, E C ; Saldanha, T C B . / An application of subagging for the improvement of prediction accuracy of multivariate calibration models. In: Chemometrics and intelligent laboratory systems. 2006 ; Vol. 81, No. 1. pp. 60-67.
@article{0bafe3ea97724f199c0081458f2cf96d,
title = "An application of subagging for the improvement of prediction accuracy of multivariate calibration models",
abstract = "The term bagging refers to a class of techniques in which an ensemble model is obtained by combining different member models generated by resampling the available data set. It has been shown that bagging can lead to substantial gains in accuracy for both classification and regression models, especially when alterations in the training set cause significant changes in the outcome of the modelling procedure. However, in the context of chemometrics, the use of bagging for quantitative multicomponent analysis is still incipient. More recently, an alternative aggregation scheme termed subagging, which is based on subsampling without replacement, has been shown to provide performance improvements similar to bagging at a smaller computational cost. The present paper proposes a strategy for using subagging in conjunction with three multivariate calibration methods, namely Partial Least Squares (PLS) and Multiple Linear Regression with variable selection by using either the Successive Projections Algorithm (MLR-SPA) or a Genetic Algorithm (MLR-GA). The subagging member models are generated by subsampling the pool of samples available for modelling and then forming new calibration sets. Such a strategy is of value in analytical problems involving complex matrices, in which reproducing the composition variability of real samples by means of optimized experimental designs may be a difficult task. The efficiency of the proposed strategy is illustrated in a problem involving the NIR spectrometric determination of four diesel quality parameters (specific mass, sulphur content, and the distillation temperatures T10{\%} and T90{\%} at which 10{\%} and 90{\%} of the sample has evaporated, respectively). In this case study, the use of 30 subsampling iterations provides relative improvements of up to 16{\%}, 33{\%}, and 35{\%} in the prediction accuracy of PLS, MLR-SPA, and MLR-GA models, respectively, with respect to the expected results of individual (non-ensemble) models. (c) 2005 Elsevier B.V. All rights reserved.",
keywords = "bagging, subagging, MLR, PLS, SPA, genetic algorithms, NIR spectrometry, diesel analysis, successive projections algorithm, neural network ensembles, variable selection, spectrometry, QSAR",
author = "Galvao, {R K H} and Araujo, {M C U} and Martins, {M D} and Jos{\'e}, {Gledson Emidio} and Pontes, {M J C} and Silva, {E C} and Saldanha, {T C B}",
year = "2006",
month = "3",
day = "15",
doi = "10.1016/j.chemolab.2005.09.005",
language = "English",
volume = "81",
pages = "60--67",
journal = "Chemometrics and intelligent laboratory systems",
issn = "0169-7439",
number = "1",

}

An application of subagging for the improvement of prediction accuracy of multivariate calibration models. / Galvao, R K H ; Araujo, M C U ; Martins, M D ; José, Gledson Emidio; Pontes, M J C ; Silva, E C ; Saldanha, T C B .

In: Chemometrics and intelligent laboratory systems, Vol. 81, No. 1, 15.03.2006, p. 60-67.

TY - JOUR

T1 - An application of subagging for the improvement of prediction accuracy of multivariate calibration models

AU - Galvao, R K H

AU - Araujo, M C U

AU - Martins, M D

AU - José, Gledson Emidio

AU - Pontes, M J C

AU - Silva, E C

AU - Saldanha, T C B

PY - 2006/3/15

Y1 - 2006/3/15

N2 - The term bagging refers to a class of techniques in which an ensemble model is obtained by combining different member models generated by resampling the available data set. It has been shown that bagging can lead to substantial gains in accuracy for both classification and regression models, especially when alterations in the training set cause significant changes in the outcome of the modelling procedure. However, in the context of chemometrics, the use of bagging for quantitative multicomponent analysis is still incipient. More recently, an alternative aggregation scheme termed subagging, which is based on subsampling without replacement, has been shown to provide performance improvements similar to bagging at a smaller computational cost. The present paper proposes a strategy for using subagging in conjunction with three multivariate calibration methods, namely Partial Least Squares (PLS) and Multiple Linear Regression with variable selection by using either the Successive Projections Algorithm (MLR-SPA) or a Genetic Algorithm (MLR-GA). The subagging member models are generated by subsampling the pool of samples available for modelling and then forming new calibration sets. Such a strategy is of value in analytical problems involving complex matrices, in which reproducing the composition variability of real samples by means of optimized experimental designs may be a difficult task. The efficiency of the proposed strategy is illustrated in a problem involving the NIR spectrometric determination of four diesel quality parameters (specific mass, sulphur content, and the distillation temperatures T10% and T90% at which 10% and 90% of the sample has evaporated, respectively). In this case study, the use of 30 subsampling iterations provides relative improvements of up to 16%, 33%, and 35% in the prediction accuracy of PLS, MLR-SPA, and MLR-GA models, respectively, with respect to the expected results of individual (non-ensemble) models. (c) 2005 Elsevier B.V. All rights reserved.

KW - bagging

KW - subagging

KW - MLR

KW - PLS

KW - SPA

KW - genetic algorithms

KW - NIR spectrometry

KW - diesel analysis

KW - successive projections algorithm

KW - neural network ensembles

KW - variable selection

KW - spectrometry

KW - QSAR

U2 - 10.1016/j.chemolab.2005.09.005

DO - 10.1016/j.chemolab.2005.09.005

M3 - Article

VL - 81

SP - 60

EP - 67

JO - Chemometrics and intelligent laboratory systems

JF - Chemometrics and intelligent laboratory systems

SN - 0169-7439

IS - 1

ER -