Class imbalance ensemble learning based on the margin theory

Wei Feng, Wenjiang Huang, Jinchang Ren

Research output: Contribution to journal › Article

Abstract

The proportion of instances belonging to each class in a data set plays an important role in machine learning. However, real-world data often suffer from class imbalance. Multi-class tasks in which classes carry different misclassification costs are harder to handle than two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques for imbalanced data sets. Ensemble classifiers have been shown to be more effective than data sampling techniques at improving classification performance on imbalanced data. Moreover, combining ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble-margin-based algorithm that handles imbalanced classification by employing more low-margin examples, which are more informative than high-margin examples. The algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly, as UnderBagging does, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method on class-imbalanced data, we compare it with UnderBagging and SMOTEBagging. In addition, we compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.
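
The abstract does not state the margin formulas or the sampling rule, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the common max-margin definitions from the ensemble-margin literature: a supervised margin (votes for the true class minus the largest vote count for any other class, divided by the number of base classifiers) and an unsupervised margin (most-voted minus second-most-voted class, same normalisation). The function names, the `1 - |margin|` sampling weight, and the choice to shrink every class to the size of the smallest one are all hypothetical illustrations of "favouring low-margin examples when building a balanced set".

```python
import numpy as np

def vote_counts(votes, n_classes):
    """votes: (n_estimators, n_samples) integer labels predicted by each base
    classifier. Returns an (n_samples, n_classes) array of vote counts."""
    n_samples = votes.shape[1]
    counts = np.zeros((n_samples, n_classes), dtype=int)
    for row in votes:  # one row per base classifier
        counts[np.arange(n_samples), row] += 1
    return counts

def supervised_margin(votes, y, n_classes):
    """Assumed definition: (votes for true class - max votes for another class) / T.
    Ranges over [-1, 1]; negative means the ensemble misclassifies the example."""
    counts = vote_counts(votes, n_classes).astype(float)
    idx = np.arange(len(y))
    true_votes = counts[idx, y]
    counts[idx, y] = -np.inf  # mask the true class before taking the max
    return (true_votes - counts.max(axis=1)) / votes.shape[0]

def unsupervised_margin(votes, n_classes):
    """Assumed definition: (top vote count - second vote count) / T.
    Ranges over [0, 1] and needs no labels. Requires n_classes >= 2."""
    counts = np.sort(vote_counts(votes, n_classes), axis=1)
    return (counts[:, -1] - counts[:, -2]) / votes.shape[0]

def low_margin_undersample(margins, y, rng):
    """Hypothetical balanced sampler: every class is reduced to the size of the
    smallest class, drawing examples with probability proportional to
    1 - |margin| so that low-margin examples are favoured."""
    classes, sizes = np.unique(y, return_counts=True)
    n_keep = sizes.min()
    chosen = []
    for c in classes:
        members = np.flatnonzero(y == c)
        weights = 1.0 - np.abs(margins[members]) + 1e-9  # keep weights positive
        weights /= weights.sum()
        chosen.append(rng.choice(members, size=n_keep, replace=False, p=weights))
    return np.concatenate(chosen)
```

In a bagging-style loop, one would presumably call `low_margin_undersample` once per base classifier with a fresh random state, so that each classifier sees a different balanced subset; how the paper actually orders or weights examples may differ from this sketch.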

Language: English
Article number: 815
Number of pages: 28
Journal: Applied Sciences
Volume: 8
Issue number: 5
DOI: 10.3390/app8050815
State: Published - 18 May 2018

Fingerprint

learning
margins
Classifiers
Sampling
classifiers
Learning systems
data sampling
Costs
machine learning
preprocessing
proposals
proportion
education
sampling
costs

Keywords

  • classification
  • ensemble learning
  • ensemble margin
  • imbalance learning
  • multi-class

Cite this

@article{2768a0d8e06d4f72ba9d4fae5fc93693,
title = "Class imbalance ensemble learning based on the margin theory",
abstract = "The proportion of instances belonging to each class in a data-set plays an important role in machine learning. However, the real world data often suffer from class imbalance. Dealing with multi-class tasks with different misclassification costs of classes is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques dealing with imbalanced data-sets. Ensemble classifiers have been shown to be more effective than data sampling techniques to enhance the classification performance of imbalanced data. Moreover, the combination of ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble margin based algorithm, which handles imbalanced classification by employing more low margin examples which are more informative than high margin samples. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly such as UnderBagging, our method pays attention to constructing higher quality balanced sets for each base classifier. In order to demonstrate the effectiveness of the proposed method in handling class imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we also compare the performances of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.",
keywords = "classification, ensemble learning, ensemble margin, imbalance learning, multi-class",
author = "Wei Feng and Wenjiang Huang and Jinchang Ren",
year = "2018",
month = "5",
day = "18",
doi = "10.3390/app8050815",
language = "English",
volume = "8",
journal = "Applied Sciences",
issn = "2076-3417",
number = "5",
}

Class imbalance ensemble learning based on the margin theory. / Feng, Wei; Huang, Wenjiang; Ren, Jinchang.

In: Applied Sciences, Vol. 8, No. 5, 815, 18.05.2018.

TY - JOUR

T1 - Class imbalance ensemble learning based on the margin theory

AU - Feng, Wei

AU - Huang, Wenjiang

AU - Ren, Jinchang

PY - 2018/5/18

Y1 - 2018/5/18

N2 - The proportion of instances belonging to each class in a data-set plays an important role in machine learning. However, the real world data often suffer from class imbalance. Dealing with multi-class tasks with different misclassification costs of classes is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques dealing with imbalanced data-sets. Ensemble classifiers have been shown to be more effective than data sampling techniques to enhance the classification performance of imbalanced data. Moreover, the combination of ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble margin based algorithm, which handles imbalanced classification by employing more low margin examples which are more informative than high margin samples. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly such as UnderBagging, our method pays attention to constructing higher quality balanced sets for each base classifier. In order to demonstrate the effectiveness of the proposed method in handling class imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we also compare the performances of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.

KW - classification

KW - ensemble learning

KW - ensemble margin

KW - imbalance learning

KW - multi-class

UR - http://www.scopus.com/inward/record.url?scp=85047080680&partnerID=8YFLogxK

UR - http://www.mdpi.com/journal/applsci

U2 - 10.3390/app8050815

DO - 10.3390/app8050815

M3 - Article

VL - 8

JO - Applied Sciences

T2 - Applied Sciences

JF - Applied Sciences

SN - 2076-3417

IS - 5

M1 - 815

ER -