A novel oversampling and feature selection hybrid algorithm for imbalanced data classification

Fang Feng, Kuan-Ching Li, Erfu Yang, Qingguo Zhou, Lihong Han, Amir Hussain, Mingjiang Cai

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)
41 Downloads (Pure)

Abstract

Traditional approaches tend to cause classifier bias on imbalanced data sets, resulting in poor classification performance for minority classes. In particular, imbalanced data are common in financial fraud, network intrusion, and fault detection, where the recognition rate of minority classes is more pertinent than the classification performance of majority classes. Therefore, there is a pressing need to develop efficient algorithms to solve the class imbalance problem. To this end, this article presents a novel hybrid algorithm, Negative Binary General (NBG), that improves the performance of imbalanced classification by combining oversampling and a feature selection algorithm. A novel oversampling algorithm, Negative-positive Synthetic Minority Oversampling Technique (NPSMOTE), improves the practicability of sample generation, while the Binary Ant Lion Optimizer (BALO) algorithm extracts the most significant features to improve classification performance. Simulation experiments carried out on seven benchmark imbalanced data sets demonstrate that the proposed NBG algorithm significantly outperforms nine other existing and six recently published algorithms in classifying imbalanced small-sample data sets.
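
For readers unfamiliar with the oversampling idea the abstract builds on, the following is a minimal sketch of classic SMOTE-style interpolation, not the paper's NPSMOTE variant, whose details are not given here. The function name `smote_sketch`, its parameters, and the toy data are illustrative assumptions only.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Classic SMOTE-style oversampling (an illustration, not NPSMOTE).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In a hybrid pipeline of the kind the abstract describes, a feature selection step would then be applied before training the classifier.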
Original language: English
Pages (from-to): 3231–3267
Number of pages: 37
Journal: Multimedia Tools and Applications
Volume: 82
Issue number: 3
Early online date: 24 Jun 2022
Publication status: Published - 31 Jan 2023

Keywords

  • Imbalanced data
  • Oversampling
  • Feature selection
  • General vector machine
