A chemometric study of chromatograms of tea extracts by correlation optimization warping in conjunction with PCA, support vector machines and random forest data modeling

L. Zheng, D.G. Watson, B.F. Johnston, Rachael L. Clark, Ruangelie Edrada-Ebel, W. Elseheri

Research output: Contribution to journal (Article)

47 Citations (Scopus)

Abstract

A reverse phase high performance liquid chromatography (HPLC) separation was established for profiling water-soluble compounds in extracts from tea. Whole chromatograms were pre-processed by techniques including baseline correction, binning and normalisation. In addition, peak alignment by correction of retention time shifts was performed using correlation optimization warping (COW), producing a correlation score of 0.96. To extract the chemically relevant information from the data, a variety of chemometric approaches were employed. Principal component analysis (PCA) was used to group the tea samples according to their chromatographic differences. Three principal components (PCs) described 78% of the total variance after peak alignment (64% before), and analysis of the score and loading plots provided insight into the main chemical differences between the samples. Finally, PCA, support vector machines (SVMs) and random forest (RF) machine learning methods were evaluated comparatively on their ability to predict unknown tea samples using models constructed from a predetermined training set. The best predictions of identity were obtained by using RF.
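
The pre-processing steps named in the abstract (baseline correction, binning, normalisation) and the correlation score used to judge alignment can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the constant-minimum baseline, the bin width, the total-area normalisation, the function names and the toy intensity values are all assumptions made for the example, and the warping step of COW itself is not implemented here.

```python
from statistics import mean

def baseline_correct(signal):
    """Subtract the minimum intensity (assumes a flat, constant baseline)."""
    floor = min(signal)
    return [x - floor for x in signal]

def bin_signal(signal, width):
    """Average consecutive points in bins of `width` points."""
    return [mean(signal[i:i + width]) for i in range(0, len(signal), width)]

def normalise(signal):
    """Scale intensities so they sum to 1 (total-area normalisation)."""
    total = sum(signal)
    if total == 0:
        raise ValueError("cannot normalise an all-zero signal")
    return [x / total for x in signal]

def preprocess(signal, width=2):
    """Apply the three steps in order; `width` is a hypothetical bin size."""
    return normalise(bin_signal(baseline_correct(signal), width))

def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Toy chromatograms (made-up values): the same peak, shifted by two points.
reference = [5, 5, 6, 20, 40, 20, 6, 5, 5, 5, 5, 5]
shifted = [5, 5, 5, 5, 6, 20, 40, 20, 6, 5, 5, 5]

ref_profile = preprocess(reference)
shifted_profile = preprocess(shifted)
score = pearson(ref_profile, shifted_profile)
```

In this toy case the retention-time shift lowers the correlation between the two profiles well below 1; an alignment method such as COW stretches and compresses segments of the sample chromatogram to raise that correlation against the reference.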
Original language: English
Pages (from-to): 257-265
Number of pages: 9
Journal: Analytica Chimica Acta
Volume: 624
Issue number: 1-2
DOI: 10.1016/j.aca.2008.12.015
Publication status: Published - 29 May 2009

Keywords

  • tea
  • principal component analysis
  • warping
  • correlation optimization warping
  • support vector machines
  • random forest
  • prediction
  • pharmacology

Cite this

@article{f8fbfda52d1b40cdafecbce654af82c4,
title = "A chemometric study of chromatograms of tea extracts by correlation optimization warping in conjunction with PCA, support vector machines and random forest data modeling",
abstract = "A reverse phase high performance liquid chromatography (HPLC) separation was established for profiling water-soluble compounds in extracts from tea. Whole chromatograms were pre-processed by techniques including baseline correction, binning and normalisation. In addition, peak alignment by correction of retention time shifts was performed using correlation optimization warping (COW), producing a correlation score of 0.96. To extract the chemically relevant information from the data, a variety of chemometric approaches were employed. Principal component analysis (PCA) was used to group the tea samples according to their chromatographic differences. Three principal components (PCs) described 78{\%} of the total variance after peak alignment (64{\%} before), and analysis of the score and loading plots provided insight into the main chemical differences between the samples. Finally, PCA, support vector machines (SVMs) and random forest (RF) machine learning methods were evaluated comparatively on their ability to predict unknown tea samples using models constructed from a predetermined training set. The best predictions of identity were obtained by using RF.",
keywords = "tea, principal component analysis, warping, correlation optimization warping, support vector machines, random forest, prediction, pharmacology",
author = "L. Zheng and D.G. Watson and B.F. Johnston and Clark, {Rachael L.} and Ruangelie Edrada-Ebel and W. Elseheri",
year = "2009",
month = "5",
day = "29",
doi = "10.1016/j.aca.2008.12.015",
language = "English",
volume = "624",
pages = "257--265",
journal = "Analytica Chimica Acta",
issn = "0003-2670",
number = "1-2",

}

TY  - JOUR
T1  - A chemometric study of chromatograms of tea extracts by correlation optimization warping in conjunction with PCA, support vector machines and random forest data modeling
AU  - Zheng, L.
AU  - Watson, D.G.
AU  - Johnston, B.F.
AU  - Clark, Rachael L.
AU  - Edrada-Ebel, Ruangelie
AU  - Elseheri, W.
PY  - 2009/5/29
Y1  - 2009/5/29
N2  - A reverse phase high performance liquid chromatography (HPLC) separation was established for profiling water-soluble compounds in extracts from tea. Whole chromatograms were pre-processed by techniques including baseline correction, binning and normalisation. In addition, peak alignment by correction of retention time shifts was performed using correlation optimization warping (COW), producing a correlation score of 0.96. To extract the chemically relevant information from the data, a variety of chemometric approaches were employed. Principal component analysis (PCA) was used to group the tea samples according to their chromatographic differences. Three principal components (PCs) described 78% of the total variance after peak alignment (64% before), and analysis of the score and loading plots provided insight into the main chemical differences between the samples. Finally, PCA, support vector machines (SVMs) and random forest (RF) machine learning methods were evaluated comparatively on their ability to predict unknown tea samples using models constructed from a predetermined training set. The best predictions of identity were obtained by using RF.
AB  - A reverse phase high performance liquid chromatography (HPLC) separation was established for profiling water-soluble compounds in extracts from tea. Whole chromatograms were pre-processed by techniques including baseline correction, binning and normalisation. In addition, peak alignment by correction of retention time shifts was performed using correlation optimization warping (COW), producing a correlation score of 0.96. To extract the chemically relevant information from the data, a variety of chemometric approaches were employed. Principal component analysis (PCA) was used to group the tea samples according to their chromatographic differences. Three principal components (PCs) described 78% of the total variance after peak alignment (64% before), and analysis of the score and loading plots provided insight into the main chemical differences between the samples. Finally, PCA, support vector machines (SVMs) and random forest (RF) machine learning methods were evaluated comparatively on their ability to predict unknown tea samples using models constructed from a predetermined training set. The best predictions of identity were obtained by using RF.
KW  - tea
KW  - principal component analysis
KW  - warping
KW  - correlation optimization warping
KW  - support vector machines
KW  - random forest
KW  - prediction
KW  - pharmacology
U2  - 10.1016/j.aca.2008.12.015
DO  - 10.1016/j.aca.2008.12.015
M3  - Article
VL  - 624
SP  - 257
EP  - 265
JO  - Analytica Chimica Acta
JF  - Analytica Chimica Acta
SN  - 0003-2670
IS  - 1-2
ER  -