### Abstract

Raman spectroscopy can be used to identify molecules such as DNA by the characteristic scattering of light from a laser. It is sensitive at very low concentrations and can accurately quantify the amount of a given molecule in a sample. The presence of a large, nonuniform background presents a major challenge to analysis of these spectra. To overcome this challenge, we introduce a sequential Monte Carlo (SMC) algorithm to separate each observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. The peaks are modelled as Lorentzian, Gaussian, or pseudo-Voigt functions, while the baseline is estimated using a penalised cubic spline. This latent continuous representation accounts for differences in resolution between measurements. The posterior distribution can be incrementally updated as more data becomes available, resulting in a scalable algorithm that is robust to local maxima. By incorporating this representation in a Bayesian hierarchical regression model, we can quantify the relationship between molecular concentration and peak intensity, thereby providing an improved estimate of the limit of detection, which is of major importance to analytical chemistry.

Language | English |
---|---|
Journal | Annals of Applied Statistics |
Publication status | Accepted/In press - 24 Jan 2018 |
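
The abstract describes each observed spectrum as a series of peaks plus a smoothly varying baseline, corrupted by additive white noise. The Python sketch below simulates that forward model only. It assumes one common pseudo-Voigt parameterisation (a Lorentzian and a Gaussian sharing location, height and full width at half maximum, mixed with weight `eta`), a cubic B-spline baseline with fixed, hypothetical coefficients (the penalisation used to estimate the baseline is not shown), and i.i.d. Gaussian noise. All function names and parameters are illustrative and are not taken from the paper or its software; the SMC inference step is not implemented.

```python
import math
import random


def lorentzian(x, loc, fwhm, amp):
    """Lorentzian peak of height `amp` at `loc` with full width at half maximum `fwhm`."""
    hwhm = fwhm / 2.0
    return amp * hwhm**2 / ((x - loc) ** 2 + hwhm**2)


def gaussian(x, loc, fwhm, amp):
    """Gaussian peak of height `amp` at `loc` with full width at half maximum `fwhm`."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return amp * math.exp(-((x - loc) ** 2) / (2.0 * sigma**2))


def pseudo_voigt(x, loc, fwhm, amp, eta):
    """Mixture: eta * Lorentzian + (1 - eta) * Gaussian, both sharing loc, fwhm and amp."""
    return eta * lorentzian(x, loc, fwhm, amp) + (1.0 - eta) * gaussian(x, loc, fwhm, amp)


def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: value at x of the i-th B-spline of degree k on knots t."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right


def cubic_baseline(x, coefs, lo, hi):
    """Cubic B-spline baseline on [lo, hi] with uniform knots and len(coefs) >= 4 basis functions.

    Knots extend three spacings beyond each end, so the len(coefs) cubic
    basis functions are all defined and sum to one everywhere on [lo, hi].
    """
    m = len(coefs)
    h = (hi - lo) / (m - 3)
    knots = [lo + (j - 3) * h for j in range(m + 4)]
    return sum(c * bspline_basis(i, 3, knots, x) for i, c in enumerate(coefs))


def simulate_spectrum(wavenumbers, peaks, baseline_coefs, noise_sd, seed=0):
    """y(x) = baseline(x) + sum of pseudo-Voigt peaks + N(0, noise_sd^2) white noise.

    `peaks` is a list of (loc, fwhm, amp, eta) tuples (hypothetical parameterisation).
    """
    lo, hi = min(wavenumbers), max(wavenumbers)
    rng = random.Random(seed)
    spectrum = []
    for x in wavenumbers:
        signal = sum(pseudo_voigt(x, *p) for p in peaks)
        baseline = cubic_baseline(x, baseline_coefs, lo, hi)
        spectrum.append(baseline + signal + rng.gauss(0.0, noise_sd))
    return spectrum
```

For example, `simulate_spectrum([float(v) for v in range(400, 1801, 10)], [(1000.0, 12.0, 50.0, 0.5)], [2.0] * 8, noise_sd=1.0)` produces one noisy synthetic spectrum with a single peak at a hypothetical position of 1000. Because the peaks and baseline are continuous functions of wavenumber, the same parameters can be evaluated on grids of any resolution, which is the property the abstract points to. Inference runs the other way: given observed data, a sampler such as the SMC algorithm described in the abstract would target the posterior over peak parameters, baseline and noise level.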

### Keywords

- chemometrics
- functional data analysis
- multivariate calibration
- nanotechnology
- sequential Monte Carlo

### Cite this

**Bayesian modelling and quantification of Raman spectroscopy.** / Moores, Matthew; Gracie, Kirsten; Carson, Jake; Faulds, Karen; Graham, Duncan; Girolami, Mark. In: *Annals of Applied Statistics*, 2018.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Bayesian modelling and quantification of Raman spectroscopy

AU - Moores, Matthew

AU - Gracie, Kirsten

AU - Carson, Jake

AU - Faulds, Karen

AU - Graham, Duncan

AU - Girolami, Mark

PY - 2018/1/24

Y1 - 2018/1/24

AB - Raman spectroscopy can be used to identify molecules such as DNA by the characteristic scattering of light from a laser. It is sensitive at very low concentrations and can accurately quantify the amount of a given molecule in a sample. The presence of a large, nonuniform background presents a major challenge to analysis of these spectra. To overcome this challenge, we introduce a sequential Monte Carlo (SMC) algorithm to separate each observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. The peaks are modelled as Lorentzian, Gaussian, or pseudo-Voigt functions, while the baseline is estimated using a penalised cubic spline. This latent continuous representation accounts for differences in resolution between measurements. The posterior distribution can be incrementally updated as more data becomes available, resulting in a scalable algorithm that is robust to local maxima. By incorporating this representation in a Bayesian hierarchical regression model, we can quantify the relationship between molecular concentration and peak intensity, thereby providing an improved estimate of the limit of detection, which is of major importance to analytical chemistry.

KW - chemometrics

KW - functional data analysis

KW - multivariate calibration

KW - nanotechnology

KW - sequential Monte Carlo

UR - https://arxiv.org/abs/1604.07299

M3 - Article

JO - Annals of Applied Statistics

T2 - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

ER -