Abstract
Metabolomics data usually undergo pre-processing of the raw data, followed by further pre-treatment, before any statistical analysis is carried out. Different pre-treatment methods emphasise different aspects of the data, and each method has advantages and disadvantages. The choice of pre-treatment method depends on the biological question of interest, the characteristics of the data and the chosen data analysis. In this paper, we investigate the effects of different pre-treatment methods on four metabolomics data sets arising from chemical analysis of propolis samples collected from honey bee colonies in three different locations in Scotland, and also samples from Libya. Propolis has a variety of biological properties, including anti-protozoal and anti-inflammatory effects. As it is a complex mixture, its biological activity depends on its exact composition, which can be investigated via metabolomic analysis. Two types of pre-treatment were applied, namely transformation and scaling. The choice of method was found to greatly affect the results of the principal component analysis (PCA) used to explain the variation in the data. The results indicated that the transformation techniques offered little, if any, improvement. For all four data sets, Pareto scaling, incorporating mean centring, performed better than the other scaling approaches considered here in terms of PCA, the analysis of interest, in that the resulting components explained more of the variation in the data.
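To make the comparison of scaling approaches concrete, the following is a minimal sketch rather than the authors' code. It assumes the standard column-wise definitions of the scaling methods listed in the keywords and applies them to a synthetic log-normal intensity matrix standing in for the propolis data, which are not reproduced here. The function names `pretreat` and `explained_variance_ratio` are hypothetical, and PCA is computed through a singular value decomposition of the pre-treated matrix.

```python
import numpy as np


def pretreat(X, method="pareto"):
    """Column-wise pre-treatment of a samples-by-features intensity matrix.

    Every method here includes mean centring (assumed standard definitions).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)          # sample standard deviation per feature
    centred = X - mean
    if method == "centre":
        return centred
    if method == "auto":                # standardisation (unit variance)
        return centred / sd
    if method == "pareto":              # divide by the square root of the sd
        return centred / np.sqrt(sd)
    if method == "range":               # divide by the observed range
        return centred / (X.max(axis=0) - X.min(axis=0))
    if method == "vast":                # autoscaling weighted by mean / sd
        return (centred / sd) * (mean / sd)
    if method == "level":               # divide by the feature mean
        return centred / mean
    raise ValueError(f"unknown method: {method}")


def explained_variance_ratio(Z):
    """Share of total variance carried by each principal component of Z.

    Z must already be column-centred, which all methods above guarantee.
    """
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


# Synthetic stand-in data (an assumption): 20 samples, 50 positive features.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=3.0, sigma=1.0, size=(20, 50))

for m in ["centre", "auto", "pareto", "range", "vast", "level"]:
    ratio = explained_variance_ratio(pretreat(X, m))
    print(f"{m:>7}: PC1 + PC2 explain {ratio[:2].sum():.1%} of the variance")
```

The percentages printed for random data carry no biological meaning; the sketch only shows how the variance explained by the leading components can be compared across pre-treatment choices.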
Original language | English |
---|---|
Pages (from-to) | 13-34 |
Number of pages | 22 |
Journal | Advances and Applications in Statistics |
Volume | 58 |
Issue number | 1 |
Publication status | Published - 30 Sept 2019 |
Keywords
- metabolomics data
- propolis
- pre-treatment
- principal component analysis (PCA)
- transformation
- centring
- standardisation
- VAST scaling
- Pareto scaling
- range scaling
- level scaling