Abstract: |
Heart disease remains a significant global health challenge, and early detection and diagnosis depend on accurate, reliable classification techniques. Missing data is pervasive in medical datasets and can severely degrade the performance of classification models, which makes choosing a suitable classifier for incomplete data difficult. In this work, we present a comparative analysis of three ensemble techniques (Random Forest (RF), Extreme Gradient Boosting (XGB), and Bagging) and three single techniques (K-nearest neighbor (KNN), Multilayer Perceptron (MLP), and Support Vector Machine (SVM)) applied to four heart disease datasets (Hungarian, Cleveland, Statlog, and HeartDisease). The main objective of this study is to compare the performance of ensemble and single classifiers on incomplete heart disease datasets in which missing values are filled by KNN imputation, and to identify an effective approach for heart disease classification. Among the single classifiers, MLP outperformed SVM and KNN overall across datasets. The ensemble techniques consistently outperformed the single techniques across datasets, achieving higher accuracy, precision, recall, F1 score, and AUC. Therefore, for heart disease classification with KNN imputation, the ensemble techniques (RF, Bagging, and XGB) proved to be the most effective models. |