A strategy for selecting calibration samples for multivariate modelling

H A Dantas, R K H Galvao, M C U Araujo, E C da Silva, T C B Saldanha, G E Jose, C Pasquini, I M Raimundo, J J R Rohwedder

Research output: Contribution to journal, Article

61 Citations (Scopus)

Abstract

A sample selection strategy based on the Successive Projections Algorithm (SPA), a technique originally developed for variable selection, is proposed. The strategy selects a subset of samples that are minimally redundant but still representative of the data set. The selection takes into account both X and Y statistics, thus tailoring the choice of samples to the spectral profiles of the chemical species involved in the analysis. Such a procedure is of value in reducing the experimental and computational workload involved in multivariate calibration, as well as in the transfer of calibration between different instruments. The strategy was applied to UV-VIS spectrometric simultaneous multicomponent analysis of complexes of Co2+, Cu2+, Mn2+, Ni2+ and Zn2+ with 4-(2-pyridylazo)resorcinol and also to total sulphur determination in diesel by NIR spectrometry. The selection of samples was preceded by wavelength selection to avoid ill-conditioning problems in the multiple linear regression (MLR) modeling employed by SPA. In both applications, SPA reduced the number of variables and samples considerably, especially in the NIR data set, where it provided an impressive reduction in the number of wavelengths from 3071 to 10 and in the number of samples from 92 to 10. MLR models developed with the selected calibration samples displayed no significant loss of prediction ability when compared to MLR and PLS1 models built with the full set of calibration samples. This finding shows that the selected samples do convey the information needed for modeling. Moreover, in the NIR application, sample selection by SPA provided significantly better results than the classic Kennard-Stone (KS) algorithm. © 2004 Elsevier B.V. All rights reserved.
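The two selection ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `start` argument, and the use of X alone (the abstract does not say how the Y statistics enter the selection) are assumptions made for illustration. The first function applies the successive-projections step to the columns of a matrix, so passing the transposed spectra matrix selects samples rather than wavelengths. The second is the classic Kennard-Stone max-min distance rule.

```python
import numpy as np


def successive_projections(V, k, start=0):
    """Pick k minimally collinear columns of V by successive projections.

    Assumes k does not exceed the rank of V (hypothetical helper).
    """
    P = np.array(V, dtype=float)  # working copy that gets projected
    chosen = [start]
    for _ in range(k - 1):
        v = P[:, chosen[-1]]
        # Project every column onto the subspace orthogonal to v.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[chosen] = -1.0  # never pick an already selected column again
        chosen.append(int(np.argmax(norms)))
    return chosen


def kennard_stone(X, k):
    """Pick k rows of X (k >= 2) with the Kennard-Stone max-min rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples (rows).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    chosen = [int(i), int(j)]  # start from the two most distant samples
    while len(chosen) < k:
        dmin = D[:, chosen].min(axis=1)  # distance to nearest chosen sample
        dmin[chosen] = -1.0
        chosen.append(int(np.argmax(dmin)))
    return chosen


# Toy data: 5 samples (rows) measured at 2 wavelengths (columns).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 0.0], [0.0, 10.0]])
spa_samples = successive_projections(X.T, k=2, start=3)  # columns of X.T are samples
ks_samples = kennard_stone(X, k=3)
```

Because each projection removes one direction, the projection-based selection can return at most as many samples as there are variables, which is consistent with the abstract's report of 10 samples chosen after reduction to 10 wavelengths.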

Original language: English
Pages (from-to): 83-91
Number of pages: 9
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 72
Issue number: 1
DOI: https://doi.org/10.1016/j.chemolab.2004.02.008
Publication status: Published - 28 Jun 2004

Fingerprint

Calibration
Linear regression
Sulfur determination
Wavelength
Spectrometry
Statistics

Keywords

  • Successive Projections Algorithm
  • sample selection
  • UV-VIS and NIR spectrometry
  • diesel analysis
  • total sulphur determination
  • multivariate calibration
  • multicomponent analysis
  • genetic algorithms
  • spectra

Cite this

Dantas, H. A., Galvao, R. K. H., Araujo, M. C. U., da Silva, E. C., Saldanha, T. C. B., Jose, G. E., ... Rohwedder, J. J. R. (2004). A strategy for selecting calibration samples for multivariate modelling. Chemometrics and Intelligent Laboratory Systems, 72(1), 83-91. https://doi.org/10.1016/j.chemolab.2004.02.008
Dantas, H A ; Galvao, R K H ; Araujo, M C U ; da Silva, E C ; Saldanha, T C B ; Jose, G E ; Pasquini, C ; Raimundo, I M ; Rohwedder, J J R. / A strategy for selecting calibration samples for multivariate modelling. In: Chemometrics and Intelligent Laboratory Systems. 2004 ; Vol. 72, No. 1. pp. 83-91.
@article{d26ba006c495484490bc1c26fc0c81aa,
title = "A strategy for selecting calibration samples for multivariate modelling",
abstract = "A sample selection strategy based on the Successive Projections Algorithm (SPA), a technique originally developed for variable selection, is proposed. The strategy selects a subset of samples that are minimally redundant but still representative of the data set. The selection takes into account both X and Y statistics, thus tailoring the choice of samples to the spectral profiles of the chemical species involved in the analysis. Such a procedure is of value in reducing the experimental and computational workload involved in multivariate calibration, as well as in the transfer of calibration between different instruments. The strategy was applied to UV-VIS spectrometric simultaneous multicomponent analysis of complexes of Co2+, Cu2+, Mn2+, Ni2+ and Zn2+ with 4-(2-pyridylazo)resorcinol and also to total sulphur determination in diesel by NIR spectrometry. The selection of samples was preceded by wavelength selection to avoid ill-conditioning problems in the multiple linear regression (MLR) modeling employed by SPA. In both applications, SPA reduced the number of variables and samples considerably, especially in the NIR data set, where it provided an impressive reduction in the number of wavelengths from 3071 to 10 and in the number of samples from 92 to 10. MLR models developed with the selected calibration samples displayed no significant loss of prediction ability when compared to MLR and PLS1 models built with the full set of calibration samples. This finding shows that the selected samples do convey the information needed for modeling. Moreover, in the NIR application, sample selection by SPA provided significantly better results than the classic Kennard-Stone (KS) algorithm. (C) 2004 Elsevier B.V. All rights reserved.",
keywords = "successive Projections algorithm, sample selection, UV-VIS and NIR spectrometry, diesel analysis, total sulphur determination, multivariate calibration, multicomponent analysis, genetic algorithms, spectra",
author = "Dantas, {H A} and Galvao, {R K H} and Araujo, {M C U} and {da Silva}, {E C} and Saldanha, {T C B} and Jose, {G E} and Pasquini, C and Raimundo, {I M} and Rohwedder, {J J R}",
year = "2004",
month = "6",
day = "28",
doi = "10.1016/j.chemolab.2004.02.008",
language = "English",
volume = "72",
pages = "83--91",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
number = "1",

}

