Abstract
Background: Feature selection techniques are important for improving machine learning models because they increase prediction accuracy and decrease the time needed to create a model. Recently, feature selection techniques have been applied to software quality prediction problems with mixed results and no clear indication of which techniques are most frequently used.

Objective: This study aims to conduct a systematic review of the application of feature selection techniques in software quality prediction and to answer eight research questions.

Method: The review evaluates 15 papers, 9 from journals and 6 from conference proceedings, published from 2007 to 2017, using the standard systematic literature review method.

Results: The filter feature selection method was the most commonly used in the studies (60%), with RELIEF the most frequently employed filter technique; only a limited number of studies employed an ensemble feature selection method. Most studies (60%) used public datasets available in the PROMISE software project repository. Most studies focused on software defect prediction (a classification problem) using area under the curve (AUC) as the primary evaluation measure, whereas only two studies focused on software maintainability prediction (a regression problem) using mean magnitude of relative error (MMRE) as the primary evaluation measure. All selected studies performed k-fold cross-validation to evaluate model accuracy. Individual prediction models were mostly employed, and ensemble models appeared in only three studies. Naive Bayes was the most investigated individual model, whereas random forest was the most investigated ensemble model.

Conclusion: The feature selection techniques used by the selected primary studies have a positive impact on the performance of the prediction models. Further, both ensemble feature selection methods and ensemble models can increase prediction accuracy over single methods or individual models, and the studies that used them reported improved prediction accuracy; however, the application of these techniques in software quality prediction is still limited.
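
The two evaluation measures named in the abstract, AUC and MMRE, can be illustrated with a minimal sketch. The function names `mmre` and `auc` and the toy values are illustrative assumptions and are not taken from the reviewed studies. AUC is computed here through its pairwise-ranking interpretation (the fraction of positive/negative pairs ranked correctly, with ties counted as half), which is one standard way to obtain it.

```python
from typing import Sequence


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error: mean of |actual - predicted| / |actual|."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumption: every actual value is non-zero, as relative error requires.
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy regression example (e.g. maintainability effort): hypothetical values.
    print(mmre([10.0, 20.0, 40.0], [8.0, 25.0, 40.0]))
    # Toy classification example (1 = defective module): hypothetical values.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

Lower MMRE indicates better regression predictions, while higher AUC indicates better ranking of defective over non-defective modules. In practice, either measure would typically be averaged over the folds of a k-fold cross-validation.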
Original language | English |
---|---|
Title of host publication | 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA) |
Publisher | IEEE |
Pages | 1-5 |
Number of pages | 5 |
ISBN (Electronic) | 9781728155326 |
ISBN (Print) | 9781728155333 |
Publication status | Published - 19 Nov 2019 |
Event | 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), Ras Al Khaimah, United Arab Emirates. Duration: 19 Nov 2019 → 21 Nov 2019 |
Conference
Conference | 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA) |
---|---|
Country/Territory | United Arab Emirates |
City | Ras Al Khaimah |
Period | 19/11/19 → 21/11/19 |
Keywords
- systematic literature review
- feature selection
- software defect
- software maintainability
- prediction
- Bayes methods
- random forests
- software quality
- software quality prediction problems