Pre-processing is an essential step in the analysis of spectral data. Mid-IR spectroscopy of biological samples is often subject to instrumental and sample specific variances which may often conceal valuable biological information. Whilst pre-processing can effectively reduce this unwanted variance, the plethora of possible processing steps has resulted in a lack of consensus in the field, often meaning that analysis outputs are not comparable. As pre-processing is specific to the sample under investigation, here we present a systematic approach for defining the optimum pre-processing protocol for biofluid ATR-FTIR spectroscopy. Using a trial-and-error based approach and a clinically relevant dataset describing control and brain cancer patients (3,897 spectra), the effects of pre-processing permutations on subsequent classification algorithms were observed, by assessing key diagnostic performance parameters, including sensitivity and specificity. It was found that optimum diagnostic performance correlated with the use of minimal binning and baseline correction, with derivative functions improving diagnostic performance most significantly. If smoothing is required, a Sovitzky-Golay approach was the preferred option in this investigation. Heavy binning appeared to reduce classification most significantly, alongside wavelet noise reduction (filter length ≥ 6), resulting in the lowest diagnostic performances of all pre-processing permutations tested.
- Mid-IR spectroscopy
- pre-processing protocol
- biofluid ATR-FTIR spectroscopy