A mixed solution-based high agreement filtering method for class noise detection in binary classification
Authors:
- Maryam Samami,
- Ebrahim Akbari,
- Moloud Abdar,
- Pawel Plawiak,
- Hossein Nematzadeh,
- Mohammad Ehsan Basiri,
- Vladimir Makarenkov
Abstract
Classification of noisy data has been a longstanding topic in data mining and machine learning. Many researchers have proposed effective methods to detect and eliminate such data in diverse real-world datasets. In this paper, we deal with mislabeled instances in supervised learning and with the filtering methods used to detect them, including majority voting filtering and consensus voting filtering. The majority voting procedure usually misidentifies many correct instances as noisy, whereas the consensus voting procedure fails to detect many noisy instances. Our new method minimizes the weaknesses of majority and consensus filtering by providing a novel class noise detection strategy, namely high agreement voting filtering with a mixed strategy, which removes strong and semi-strong noisy records from the dataset and relabels weak noisy data. The proposed method, designed for binary classification problems, outperforms the high agreement voting filtering procedure. Extensive experiments conducted with 16 real datasets, using four noise filtering methods with two levels of class noise (10% and 15%), demonstrate the superiority of the proposed methodology.
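The voting filters named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define strong, semi-strong, or weak noise, so the vote-count thresholds `remove_at` and `relabel_at` are hypothetical parameters introduced here, as are all function names. The sketch also assumes the per-classifier predictions are already available (for example, from cross-validation), and that labels are encoded as 0 and 1.

```python
from typing import List, Sequence, Tuple


def count_disagreements(predictions: Sequence[Sequence[int]],
                        labels: Sequence[int]) -> List[int]:
    """Count, per instance, the classifiers whose prediction differs from the label.

    predictions[k][i] is the label that classifier k predicts for instance i.
    """
    return [
        sum(1 for clf_preds in predictions if clf_preds[i] != label)
        for i, label in enumerate(labels)
    ]


def majority_filter(disagreements: Sequence[int], n_classifiers: int) -> List[bool]:
    # Flag an instance when more than half of the classifiers disagree with its label.
    return [2 * d > n_classifiers for d in disagreements]


def consensus_filter(disagreements: Sequence[int], n_classifiers: int) -> List[bool]:
    # Flag an instance only when every classifier disagrees with its label.
    return [d == n_classifiers for d in disagreements]


def mixed_strategy(labels: Sequence[int], disagreements: Sequence[int],
                   remove_at: int, relabel_at: int) -> List[Tuple[int, int]]:
    """Remove high-agreement noise and relabel lower-agreement noise.

    remove_at and relabel_at are hypothetical vote-count thresholds (assumptions,
    not taken from the paper). Returns (original_index, label) pairs for the
    instances that are kept.
    """
    if relabel_at > remove_at:
        raise ValueError("relabel_at must not exceed remove_at")
    cleaned = []
    for index, (label, d) in enumerate(zip(labels, disagreements)):
        if d >= remove_at:
            continue                 # treated as strong or semi-strong noise: drop
        if d >= relabel_at:
            label = 1 - label        # treated as weak noise: flip the binary label
        cleaned.append((index, label))
    return cleaned


# Toy data: 5 classifiers, 6 instances (made up for illustration).
labels = [0, 1, 0, 1, 0, 1]
predictions = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 1],
]
votes = count_disagreements(predictions, labels)
print(votes)                                    # [0, 5, 4, 3, 2, 1]
print(majority_filter(votes, 5))                # [False, True, True, True, False, False]
print(consensus_filter(votes, 5))               # [False, True, False, False, False, False]
print(mixed_strategy(labels, votes, remove_at=4, relabel_at=3))
# [(0, 0), (3, 0), (4, 0), (5, 1)]
```

On this toy data the majority filter flags three instances and the consensus filter only one, which mirrors the trade-off the abstract describes. The mixed strategy drops the two instances with the highest disagreement and flips the label of the instance with three dissenting votes.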
- Record ID
- CUT66e117e2a73843b2a6e3befdbadd061f
- Journal series
- Physica A-Statistical Mechanics and Its Applications, ISSN 0378-4371, e-ISSN 1873-2119
- Issue year
- 2020
- Vol
- 553
- Pages
- [1-28]
- Article number
- 124219
- Other elements of collation
- ill. (incl. color); Bibliography (on pp.) - 27-28; Bibliography (number of entries) - 56; Abstract designation - Abstr.; Numbering in journal - Vol. 553
- Keywords in English
- data mining, high agreement voting filtering, classification, removing, relabeling, class noise detection
- DOI
- DOI:10.1016/j.physa.2020.124219
- URL
- https://www.sciencedirect.com/science/article/abs/pii/S0378437120300492
- Language
- eng (en) English
- Score (nominal)
- 70
- Publication indicators
- Citation count
- 22
- Additional fields
- Indexed in: Web of Science, Scopus
- Uniform Resource Identifier
- https://cris.pk.edu.pl/info/article/CUT66e117e2a73843b2a6e3befdbadd061f/
- URN
- urn:pkr-prod:CUT66e117e2a73843b2a6e3befdbadd061f
* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.