A novel model to optimize multiple imputation algorithm for missing data using evolution methods
Authors:
- Yasser Salaheldin Mohammed
- Hatem Abdelkader
- Paweł Pławiak
- Mohamed Hammad
Abstract
Missing data is a significant concern when applying statistical methods to a dataset, and the quality of the analysis results depends on how completely and correctly the data have been filled in. Improving the process of filling in missing data is therefore vital for providing more reliable data to the analysis phase. Here, we present a novel method for optimizing multiple regression imputation and obtaining the best fitness values for missing patient data by combining multiple imputation with a genetic algorithm. To train and assess the proposed method, we used 583 patient records from a publicly available database, comprising 416 records of liver patients and 167 records of non-liver patients. According to the results, the proposed approach offers the largest improvement in filling in missing data. Instead of employing the normal equation in multiple imputations, which yielded 92.72 as the utmost fitness value with Mean Absolute Error (MAE) 0.5877 from 1.1840 after our second optimization, we were able to achieve a fitness value of 233. The proposed approach could be tested on a large database and used in hepatocellular carcinoma (HCC) labs to help clinicians make accurate diagnoses.
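The general idea in the abstract, tuning a regression-based imputation with a genetic algorithm and scoring it with MAE, can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define their fitness function, operators, or parameters, so everything below (the function names, the use of MAE on observed values as the cost to minimize, the selection, crossover, and mutation settings, and the toy data) is an assumption made for illustration. It also performs a single regression imputation rather than full multiple imputation.

```python
import random

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def predict(coeffs, row):
    """Linear model: intercept plus weighted sum of the predictors."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))

def model_mae(coeffs, rows, targets):
    """Cost of a candidate: MAE of its predictions on the observed targets."""
    return mae(targets, [predict(coeffs, r) for r in rows])

def evolve_coefficients(rows, targets, generations=200, pop_size=30, seed=0):
    """Toy genetic algorithm searching for low-MAE regression coefficients."""
    rng = random.Random(seed)
    n = len(rows[0]) + 1  # one weight per predictor, plus the intercept
    population = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: model_mae(c, rows, targets))
        parents = population[: pop_size // 2]  # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n)] += rng.gauss(0, 0.1)      # mutate one gene
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: model_mae(c, rows, targets))

def impute(rows, targets, coeffs):
    """Replace missing targets (None) with the model's predictions."""
    return [predict(coeffs, r) if t is None else t for r, t in zip(rows, targets)]

# Hypothetical toy data following y = 2x + 1, with one missing value.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0]]
targets = [3.0, 5.0, None, 9.0, 11.0]

# Fit only on the records whose target is observed.
obs_rows = [r for r, t in zip(rows, targets) if t is not None]
obs_targets = [t for t in targets if t is not None]

best = evolve_coefficients(obs_rows, obs_targets)
filled = impute(rows, targets, best)
print("MAE on observed values:", model_mae(best, obs_rows, obs_targets))
print("Imputed value for the missing record:", filled[2])
```

Because the better half of each generation is carried over unchanged, the best candidate's MAE never gets worse from one generation to the next. In this sketch a lower MAE is better, whereas the abstract reports fitness values where higher is better.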
- Record ID
- CUT4f30e5f8268f4d268d8da4184e0e7641
- Journal series
- Biomedical Signal Processing and Control, ISSN 1746-8094, e-ISSN 1746-8108
- Issue year
- 2022
- Vol
- 76
- Pages
- [1-10]
- Article number
- 103661
- Other elements of collation
- figures; tables; charts; Bibliography (on p.) - 10; Bibliography (number of items) - 28; Abstract designation - Abstr.; Journal numbering - Vol. 76
- Keywords in English
- multiple imputation, fitness value, multiple regression, genetic algorithm, missing data, optimization
- DOI
- DOI:10.1016/j.bspc.2022.103661
- URL
- https://www.sciencedirect.com/science/article/pii/S1746809422001835
- Language
- eng (en) English
- Score (nominal)
- 140
- Score source
- journalList
- Publication indicators
- Citation count
- 11
- Uniform Resource Identifier
- https://cris.pk.edu.pl/info/article/CUT4f30e5f8268f4d268d8da4184e0e7641/
- URN
- urn:pkr-prod:CUT4f30e5f8268f4d268d8da4184e0e7641
* The presented citation count is obtained through analysis of Internet information and is close to the number calculated by the Publish or Perish system.