Application of C4.5 Algorithm Using Synthetic Minority Oversampling Technique (SMOTE) and Particle Swarm Optimization (PSO) for Diabetes Prediction
Dela Rista Damayanti, Aji Purwinarko


Introduction

This study aims to improve diabetes prediction accuracy with a hybrid approach that combines the C4.5 algorithm with SMOTE, to handle class imbalance, and PSO, for optimization. The combined model reaches 82.5% accuracy, which the authors present as a contribution to earlier detection of diabetes.


Abstract

Diabetes is the fourth or fifth leading cause of death in most developed countries and an epidemic in many developing countries. Early detection can serve as a preventive measure, and it can be supported by applying data mining classification to existing patient data.

Purpose: Investigate the efficacy of integrating the C4.5 algorithm with the Synthetic Minority Oversampling Technique (SMOTE) and Particle Swarm Optimization (PSO) for improving the accuracy of diabetes prediction models. By employing SMOTE, the study aims to address the class imbalance inherent in diabetes datasets, which often contain significantly fewer positive cases (diabetes) than negative cases (non-diabetes). By incorporating PSO, the research seeks to optimize the decision tree construction process within the C4.5 algorithm, enhancing its ability to discern complex patterns and relationships within the data.

Methods/Study design/approach: This study proposes the C4.5 classification algorithm combined with SMOTE and PSO to overcome problems in the diabetes dataset used, the Pima Indian Diabetes Database (PIDD).

Result/Findings: The C4.5 algorithm without preprocessing achieves an accuracy of 75.97%, and C4.5 with SMOTE achieves 80%. C4.5 with both SMOTE and PSO produces the highest accuracy, 82.5%. This is an increase of 6.53 percentage points over the classification results of the C4.5 algorithm alone.

Novelty/Originality/Value: This research contributes novelty by proposing a hybrid approach that combines the C4.5 decision tree algorithm with two techniques, SMOTE and PSO, for the prediction of diabetes. While previous studies have explored the application of machine learning algorithms to diabetes prediction, few have examined the combined effect of integrating SMOTE and PSO with the C4.5 algorithm specifically.
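
To make the oversampling step concrete, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote_sketch`, its parameters, and the toy data are assumptions chosen for illustration, and a real study would more likely rely on an established library. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean distance),
        # excluding `base` itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment between `base` and `neighbour`.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values, not PIDD records).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2)
for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point is an interpolation between two existing minority samples, it stays inside the region those samples already occupy. Oversampling should normally be applied to the training split only, so that synthetic records do not leak into the evaluation data.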


Review

This paper addresses the critical issue of early diabetes prediction, a significant global health concern, by proposing a hybrid machine learning approach. The authors aim to enhance the accuracy of diabetes prediction models by integrating the C4.5 classification algorithm with the Synthetic Minority Oversampling Technique (SMOTE) and Particle Swarm Optimization (PSO). This combination targets two common challenges in medical datasets: class imbalance, particularly the scarcity of positive diabetes cases, and the need for optimized decision tree construction. The study reports a noteworthy improvement in prediction accuracy, positioning this method as a promising avenue for more effective early detection.

A key strength of this research lies in its combined application of established techniques to mitigate known challenges. SMOTE addresses the class imbalance in datasets like the Pima Indian Diabetes Database (PIDD), reducing the model's bias toward the majority class and helping it learn from minority (diabetic) instances. The incorporation of PSO to optimize the C4.5 algorithm is a sensible strategy for refining the decision-making process, allowing the model to pick up more intricate patterns in the data. The reported increase in accuracy to 82.5%, a gain of 6.53 percentage points over the baseline C4.5, supports the efficacy of this hybrid approach and its potential to contribute to early diagnosis efforts. The novelty, as highlighted by the authors, stems from this specific three-part integration and its application to diabetes prediction.

While the proposed methodology shows promising results, several points could strengthen the impact and generalizability of the findings. The exclusive reliance on the PIDD, though common, calls for further validation on larger or more diverse datasets to confirm the model's robustness across different populations. Accuracy is a vital metric, but for imbalanced medical datasets the abstract would benefit from reporting other performance indicators such as precision, recall, F1-score, or AUC-ROC, which give a more complete picture of the model's predictive power for both classes. Future work could also include a comparative analysis against other state-of-the-art machine learning algorithms or hybrid approaches to benchmark the proposed method.
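
The point about complementary metrics can be illustrated with a small sketch in plain Python. The labels below are made up for the example and are not results from the paper; the helper names `confusion_counts` and `classification_metrics` are hypothetical. The sketch computes accuracy, precision, recall, and F1 from a confusion matrix for an imbalanced toy sample.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy imbalanced sample: 3 positive (diabetic) and 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the classifier is correct on 7 of 10 records, yet it finds only 1 of the 3 positive cases. That gap between accuracy and recall is why the additional metrics matter for screening tasks, where a missed positive case is usually the costlier error.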


Full Text

The full text of this article is available from Recursive Journal of Informatics (login required).
