Combining random forest and 2D correlation analysis to identify serum spectral signatures for neuro-oncology

Benjamin Richard Smith, Katherine M. Ashton, Andrew Brodbelt, Timothy Dawson, Michael D. Jenkinson, Neil T. Hunt, David S. Palmer, Matthew J. Baker

Research output: Contribution to journal (Article)

18 Citations (Scopus)

Abstract

Fourier transform infrared (FTIR) spectroscopy has long been established as an analytical technique for the measurement of vibrational modes of molecular systems. More recently, FTIR has been used for the analysis of biofluids with the aim of becoming a tool to aid diagnosis. For the clinician, this represents a convenient, fast, non-subjective option for the study of biofluids and the diagnosis of disease states. The patient also benefits from this method, as the procedure for the collection of serum is much less invasive and stressful than traditional biopsy. This is especially true of patients in whom brain cancer is suspected. A brain biopsy carries a degree of morbidity and mortality and on occasion may even be inconclusive. We therefore present a method for the diagnosis of brain cancer from serum samples using FTIR and machine learning techniques. The study involved 433 patients, from each of whom 9 spectra were collected in the range 600–4000 cm⁻¹. To begin development of the novel method, various pre-processing steps were investigated and ranked in terms of the final accuracy of the diagnosis. Random Forest machine learning was utilised as a classifier to separate patients into cancer or non-cancer categories based upon the intensities at the wavenumbers present in their spectra. Generalised 2D correlation analysis was then employed to further augment the machine learning, and also to establish spectral features important for the distinction between cancer and non-cancer serum samples. Using these methods, sensitivities of up to 92.8% and specificities of up to 91.5% were possible. Furthermore, ratiometrics were also investigated in order to establish any correlations present in the dataset. We show a rapid, computationally light, accurate, statistically robust methodology for the identification of spectral features present in differing disease states.
With current advances in IR technology, such as the development of rapid discrete frequency collection, this approach is important for enabling future clinical translation and for allowing IR to achieve its potential.
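
The abstract names generalised 2D correlation analysis but does not give its formulation. As an illustration only, the sketch below follows the widely used Noda formulation, in which a set of spectra ordered along some perturbation variable is mean-centred and then turned into a synchronous map (features that change together) and an asynchronous map (features that change out of phase) using the Hilbert-Noda matrix. The function names, the toy data, and the ordering of the spectra are assumptions made for this example; this is not the authors' code, and how the study applied the maps to the Random Forest classifier is not described here.

```python
import math


def mean_center(spectra):
    """Subtract the mean spectrum (column-wise mean) from every spectrum."""
    n_rows = len(spectra)
    n_cols = len(spectra[0])
    means = [sum(row[j] for row in spectra) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in spectra]


def hilbert_noda(m):
    """Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (k - j)) elsewhere."""
    return [
        [0.0 if j == k else 1.0 / (math.pi * (k - j)) for k in range(m)]
        for j in range(m)
    ]


def correlation_2d(spectra):
    """Return (synchronous, asynchronous) maps for m >= 2 ordered spectra.

    `spectra` is a list of m spectra, each a list of n intensities
    sampled at the same wavenumbers.
    """
    dyn = mean_center(spectra)  # "dynamic" spectra
    m, n = len(dyn), len(dyn[0])
    noda = hilbert_noda(m)
    # Hilbert-transformed dynamic spectra: noda @ dyn
    transformed = [
        [sum(noda[j][k] * dyn[k][c] for k in range(m)) for c in range(n)]
        for j in range(m)
    ]
    scale = 1.0 / (m - 1)
    sync = [
        [scale * sum(dyn[j][a] * dyn[j][b] for j in range(m)) for b in range(n)]
        for a in range(n)
    ]
    asyn = [
        [scale * sum(dyn[j][a] * transformed[j][b] for j in range(m)) for b in range(n)]
        for a in range(n)
    ]
    return sync, asyn


# Toy data (hypothetical): 3 spectra, 3 wavenumber channels.
# Channels 0 and 1 rise together; channel 2 falls as they rise.
toy = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 2.0],
    [3.0, 6.0, 1.0],
]
sync, asyn = correlation_2d(toy)
```

In this toy case the synchronous entry for channels 0 and 1 is positive and the entry for channels 0 and 2 is negative, while the asynchronous map vanishes because all three channels change strictly in step. The standard library is used only to keep the sketch dependency-free; an array library would be the natural choice for real spectra with thousands of wavenumbers.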
Language: English
Pages: 3668-3678
Number of pages: 11
Journal: Analyst
Volume: 141
Early online date: 19 Jan 2016
DOI: 10.1039/C5AN02452H
Publication status: Published - 7 Jun 2016

Keywords

  • Fourier transform infrared (FTIR) spectroscopy
  • serum samples
  • FTIR
  • machine learning techniques
  • random forest machine learning
  • 2D correlational analysis
  • ratiometrics
  • rapid discrete frequency collection
  • IR technology
  • IR

Cite this

Smith, Benjamin Richard ; Ashton, Katherine M. ; Brodbelt, Andrew ; Dawson, Timothy ; Jenkinson, Michael D. ; Hunt, Neil T. ; Palmer, David S. ; Baker, Matthew J. / Combining random forest and 2D correlation analysis to identify serum spectral signatures for neuro-oncology. In: Analyst. 2016 ; Vol. 141. pp. 3668-3678.
@article{3bfa4e6dbf6c4cdabf2860bf6088ff5d,
title = "Combining random forest and 2D correlation analysis to identify serum spectral signatures for neuro-oncology",
keywords = "Fourier transform infrared (FTIR) spectroscopy, serum samples, FTIR, machine learning techniques, random forest machine learning, 2D correlational analysis, ratiometrics, rapid discrete frequency collection, IR technology, IR",
author = "Smith, {Benjamin Richard} and Ashton, {Katherine M.} and Andrew Brodbelt and Timothy Dawson and Jenkinson, {Michael D.} and Hunt, {Neil T.} and Palmer, {David S.} and Baker, {Matthew J.}",
year = "2016",
month = "6",
day = "7",
doi = "10.1039/C5AN02452H",
language = "English",
volume = "141",
pages = "3668--3678",
journal = "Analyst",
issn = "0003-2654",

}

TY - JOUR

T1 - Combining random forest and 2D correlation analysis to identify serum spectral signatures for neuro-oncology

AU - Smith, Benjamin Richard

AU - Ashton, Katherine M.

AU - Brodbelt, Andrew

AU - Dawson, Timothy

AU - Jenkinson, Michael D.

AU - Hunt, Neil T.

AU - Palmer, David S.

AU - Baker, Matthew J.

PY - 2016/6/7

Y1 - 2016/6/7

KW - Fourier transform infrared (FTIR) spectroscopy

KW - serum samples

KW - FTIR

KW - machine learning techniques

KW - random forest machine learning

KW - 2D correlational analysis

KW - ratiometrics

KW - rapid discrete frequency collection

KW - IR technology

KW - IR

UR - http://pubs.rsc.org/en/content/articlelanding/2016/an/c5an02452h#!divAbstract

U2 - 10.1039/C5AN02452H

DO - 10.1039/C5AN02452H

M3 - Article

VL - 141

SP - 3668

EP - 3678

JO - Analyst

T2 - Analyst

JF - Analyst

SN - 0003-2654

ER -