Bayesian modelling and quantification of Raman spectroscopy

Matthew Moores, Kirsten Gracie, Jake Carson, Karen Faulds, Duncan Graham, Mark Girolami

Research output: Contribution to journal (Article)

Abstract

Raman spectroscopy can be used to identify molecules such as DNA by the characteristic scattering of light from a laser. It is sensitive at very low concentrations and can accurately quantify the amount of a given molecule in a sample. The presence of a large, nonuniform background presents a major challenge to analysis of these spectra. To overcome this challenge, we introduce a sequential Monte Carlo (SMC) algorithm to separate each observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. The peaks are modelled as Lorentzian, Gaussian, or pseudo-Voigt functions, while the baseline is estimated using a penalised cubic spline. This latent continuous representation accounts for differences in resolution between measurements. The posterior distribution can be incrementally updated as more data becomes available, resulting in a scalable algorithm that is robust to local maxima. By incorporating this representation in a Bayesian hierarchical regression model, we can quantify the relationship between molecular concentration and peak intensity, thereby providing an improved estimate of the limit of detection, which is of major importance to analytical chemistry.
Language: English
Journal: Annals of Applied Statistics
Publication status: Accepted/In press - 24 Jan 2018
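
The signal model described in the abstract (a sum of peaks plus a smooth baseline, corrupted by additive white noise) can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the authors' implementation: the peak parameters, the wavenumber grid, and the noise level are hypothetical, and the baseline here is a fixed cubic polynomial standing in for the penalised cubic spline that the paper estimates. The pseudo-Voigt function is written in its common form as a weighted mix of a Lorentzian and a Gaussian sharing one full width at half maximum (FWHM).

```python
import math
import random


def lorentzian(x, loc, fwhm):
    """Lorentzian peak with unit height at x == loc."""
    half = fwhm / 2.0
    return half**2 / ((x - loc) ** 2 + half**2)


def gaussian(x, loc, fwhm):
    """Gaussian peak with unit height at x == loc."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-((x - loc) ** 2) / (2.0 * sigma**2))


def pseudo_voigt(x, loc, fwhm, eta):
    """Mix of a Lorentzian (weight eta) and a Gaussian (weight 1 - eta)."""
    return eta * lorentzian(x, loc, fwhm) + (1.0 - eta) * gaussian(x, loc, fwhm)


def baseline(x):
    """Illustrative smooth background: a fixed cubic, not a fitted spline."""
    u = (x - 1000.0) / 1000.0
    return 200.0 + 80.0 * u - 60.0 * u**2 + 20.0 * u**3


def simulate_spectrum(wavenumbers, peaks, noise_sd, seed=0):
    """Return (signal, observed): the noise-free curve and a noisy copy."""
    rng = random.Random(seed)
    signal = []
    for x in wavenumbers:
        total = baseline(x)
        for amplitude, loc, fwhm, eta in peaks:
            total += amplitude * pseudo_voigt(x, loc, fwhm, eta)
        signal.append(total)
    # Additive white noise: independent Gaussian draws at each wavenumber.
    observed = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return signal, observed


# Hypothetical peaks: (amplitude, location, FWHM, mixing weight eta).
PEAKS = [(500.0, 800.0, 20.0, 0.5), (300.0, 1350.0, 30.0, 0.2)]
wavenumbers = [400.0 + 2.0 * i for i in range(701)]  # 400 to 1800 cm^-1
signal, observed = simulate_spectrum(wavenumbers, PEAKS, noise_sd=5.0)
```

Fitting runs in the opposite direction: given only `observed`, the inference task is to recover the peak parameters and the baseline, which is where the background makes the problem hard.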

Keywords

  • chemometrics
  • functional data analysis
  • multivariate calibration
  • nanotechnology
  • sequential Monte Carlo

Cite this

Moores, M., Gracie, K., Carson, J., Faulds, K., Graham, D., & Girolami, M. (Accepted/In press). Bayesian modelling and quantification of Raman spectroscopy. Annals of Applied Statistics.
Moores, Matthew ; Gracie, Kirsten ; Carson, Jake ; Faulds, Karen ; Graham, Duncan ; Girolami, Mark. / Bayesian modelling and quantification of Raman spectroscopy. In: Annals of Applied Statistics. 2018.
@article{2c592d64459b4b3786b3253bfde0a89c,
title = "Bayesian modelling and quantification of Raman spectroscopy",
keywords = "chemometrics, functional data analysis, multivariate calibration, nanotechnology, sequential Monte Carlo",
author = "Matthew Moores and Kirsten Gracie and Jake Carson and Karen Faulds and Duncan Graham and Mark Girolami",
year = "2018",
month = "1",
day = "24",
language = "English",
journal = "Annals of Applied Statistics",
issn = "1932-6157",

}

Bayesian modelling and quantification of Raman spectroscopy. / Moores, Matthew; Gracie, Kirsten; Carson, Jake; Faulds, Karen; Graham, Duncan; Girolami, Mark.

In: Annals of Applied Statistics, 24.01.2018.


TY - JOUR
T1 - Bayesian modelling and quantification of Raman spectroscopy
AU - Moores, Matthew
AU - Gracie, Kirsten
AU - Carson, Jake
AU - Faulds, Karen
AU - Graham, Duncan
AU - Girolami, Mark
PY - 2018/1/24
Y1 - 2018/1/24
KW - chemometrics
KW - functional data analysis
KW - multivariate calibration
KW - nanotechnology
KW - sequential Monte Carlo
UR - https://arxiv.org/abs/1604.07299
M3 - Article
JO - Annals of Applied Statistics
T2 - Annals of Applied Statistics
JF - Annals of Applied Statistics
SN - 1932-6157
ER -