Analisis Perbandingan Metode Random Forest, XGBoost, dan Logistic Regression Untuk Klasifikasi Deteksi Dini Penyakit Diabetes
Novriansyah Afqi Nur Akmal Fauzi, Fikri Budiman

Introduction

This study compares the Random Forest, XGBoost, and Logistic Regression algorithms for early detection of diabetes. XGBoost performed best, with an accuracy of 96.88% in classifying diabetes risk.

Abstract

Diabetes Mellitus is a chronic disease whose prevalence continues to rise, posing serious challenges to public health and contributing significantly to the global economic burden. The often non-specific nature of early symptoms increases the risk of delayed diagnosis, highlighting the need for accurate early detection approaches to support clinical decision-making. This study analyzes and compares the performance of three machine learning algorithms (Logistic Regression, Random Forest, and XGBoost) in classifying diabetes risk from several clinical parameters, including age, body mass index (BMI), blood pressure, glucose level, and HbA1c. The data come from the Diabetes Prediction Dataset, which consists of 100,000 records. The research process involved handling missing data, applying One-Hot Encoding to categorical variables, normalizing numerical features, and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Model performance was evaluated with Accuracy, Precision, Recall, F1-Score, and ROC-AUC to provide a comprehensive assessment. The experimental results indicate that XGBoost achieved the best performance, with an accuracy of 96.88% and a ROC-AUC of 98.00%. Random Forest attained an accuracy of 95.68% with an F1-Score of 74.76%, while Logistic Regression recorded an accuracy of 88.96% and the highest recall, 89.12%. These findings suggest that ensemble learning methods, particularly boosting approaches, are more effective at improving the accuracy of diabetes versus non-diabetes classification. The primary contribution of this study is a multi-metric comparative analysis that can serve as a reference for selecting a machine learning model when developing medical decision support systems for early diabetes detection.
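The normalization and SMOTE steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which is not shown on this page. It uses only the Python standard library, and the function names `min_max_scale` and `smote_like_oversample`, the parameter `k`, and the toy rows are all assumptions made for illustration. The sketch shows the core idea of SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single feature dominates distances."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly chosen
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy rows (age, BMI, HbA1c); NOT taken from the study's dataset.
majority = [[25, 22.0, 5.0], [31, 24.5, 5.2], [40, 27.0, 5.4],
            [36, 23.1, 5.1], [52, 26.3, 5.6], [29, 21.7, 4.9]]
minority = [[58, 33.0, 7.1], [63, 35.5, 8.0], [49, 31.2, 6.9]]

scaled = min_max_scale(majority + minority)
scaled_majority = scaled[:len(majority)]
scaled_minority = scaled[len(majority):]

# Generate enough synthetic minority rows to match the majority class size.
new_points = smote_like_oversample(
    scaled_minority, n_new=len(majority) - len(minority), k=2
)
balanced_minority = scaled_minority + new_points
print(len(scaled_majority), len(balanced_minority))  # prints: 6 6
```

Scaling comes first in this sketch because the nearest-neighbour search depends on distances between features. Oversampling is commonly applied to the training split only, so that the evaluation data contains no synthetic records; the abstract does not say how the study handled this.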


Review

This study addresses an important public health challenge: the rising prevalence of Diabetes Mellitus and the need for accurate early detection to reduce the risks of delayed diagnosis. The authors compare three widely used machine learning algorithms (Logistic Regression, Random Forest, and XGBoost) for classifying diabetes risk, using a dataset of 100,000 records that covers key clinical parameters. The methodology includes the essential preprocessing steps: handling missing values, one-hot encoding of categorical variables, feature normalization, and, crucially, correcting class imbalance with SMOTE, which supports a fair and reliable evaluation of the models.
The experimental results give a clear picture of how the models compare. XGBoost performed best, with an accuracy of 96.88% and a ROC-AUC of 98.00%, underscoring its predictive power for early diabetes detection. Random Forest also performed strongly, with an accuracy of 95.68%, while Logistic Regression had the highest recall but a lower overall accuracy of 88.96%. These findings support the authors' conclusion that ensemble learning methods, particularly boosting techniques such as XGBoost, classify diabetes more accurately than traditional methods.
The primary contribution of this research is its multi-metric comparative analysis, which provides a useful reference for selecting machine learning models in clinical settings. By detailing the strengths of each algorithm across several evaluation metrics, the study offers practical guidance for developers of medical decision support systems aimed at early diabetes detection.
This work is a significant step towards leveraging advanced analytics to improve clinical decision-making, potentially leading to earlier interventions and better patient outcomes in the global fight against diabetes.
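The observation that one model can have the highest recall yet a lower accuracy follows directly from how these metrics are defined on a confusion matrix. The sketch below is a minimal illustration using only the Python standard library. The function name `classification_metrics` and both sets of counts are hypothetical, chosen to show the effect on an imbalanced test set, and are not the study's confusion matrices.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 1,000 patients, 85 of them diabetic (imbalanced).
# Model A is conservative: few false alarms, but it misses more true cases.
model_a = classification_metrics(tp=60, fp=10, fn=25, tn=905)
# Model B is sensitive: it catches more true cases at the cost of false alarms.
model_b = classification_metrics(tp=76, fp=100, fn=9, tn=815)

for name, metrics in (("A", model_a), ("B", model_b)):
    summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"Model {name}: {summary}")
```

With these made-up counts, Model B finds more of the diabetic patients (higher recall) but scores lower on accuracy and F1 because of its false positives. Which trade-off is preferable depends on the clinical cost of a missed case compared with that of a false alarm, which is why a multi-metric comparison is more informative than accuracy alone.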


This article appears in JURNAL RISET KOMPUTER (JURIKOM).
