Implementation of Synthetic Minority Oversampling Technique and Two-phase Mutation Grey Wolf Optimization on Early Diagnosis of Diabetes using K-Nearest Neighbors
A KNN classifier reaches 98.85% accuracy on early diabetes diagnosis, using SMOTE to balance the training data and TMGWO for feature selection.
Abstract. Diabetes is a disease of the endocrine system characterized by high blood sugar levels. The International Diabetes Federation (IDF) estimates that 451 million people worldwide had diabetes in 2017. Without treatment, this number is expected to rise to 693 million by 2045. One way to limit the growth in the number of diabetics is early diagnosis. Now that technology has developed rapidly, early diagnosis can be performed with machine learning classification. In this study, we propose a diabetes classification method using K-Nearest Neighbors (KNN). Before classifying the data, we select the best feature subset from the dataset using Two-phase Mutation Grey Wolf Optimization (TMGWO) and balance the training data using Synthetic Minority Oversampling Technique (SMOTE). With the dataset divided into training and testing sets by 10-fold cross-validation, the proposed method reached an accuracy of 98.85%.
Purpose: This study aims to understand how to apply TMGWO and SMOTE when classifying the early stage diabetes risk prediction dataset with KNN, and how they affect the results.
Methods/Study design/approach: We use TMGWO to perform feature selection on the dataset, K-fold cross-validation to split the dataset into training and testing sets, SMOTE to balance the training data, and KNN to perform the classification. The reported evaluation metrics are accuracy, precision, recall, and F1-score.
Result/Findings: Classification using KNN with only the features selected by TMGWO, and with the training data balanced by SMOTE, gives an accuracy of 98.85%. From these results, the authors conclude that the proposed method achieves higher accuracy than previous studies.
Novelty/Originality/Value: TMGWO is implemented for feature selection so that the model can classify with fewer features, and SMOTE is implemented to balance the training data so that the model classifies the minority class better. Because it uses fewer features, the model needs less computation time than one using all features in the dataset.
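The evaluation loop described in the abstract (split into folds, oversample only the training part, classify with KNN) can be illustrated with a minimal sketch in standard-library Python. It is not the authors' implementation: the function names (`smote`, `knn_predict`, `cross_validate`, `toy_data`), the synthetic two-cluster data, and the parameter values (5 neighbours for SMOTE, k = 3 for KNN) are assumptions made for illustration. TMGWO feature selection is left out, so the sketch assumes the feature subset has already been chosen and that labels are binary.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: euclidean(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def cross_validate(X, y, n_folds=10, k=3, seed=0):
    """K-fold cross-validation; SMOTE is applied to the training part only,
    so no synthetic point is ever derived from a held-out sample."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        counts = {c: train_y.count(c) for c in set(train_y)}
        minority_label = min(counts, key=counts.get)
        deficit = max(counts.values()) - counts[minority_label]
        minority = [x for x, c in zip(train_X, train_y) if c == minority_label]
        if deficit > 0 and len(minority) > 1:
            new_points = smote(minority, deficit, rng=rng)
            train_X += new_points
            train_y += [minority_label] * len(new_points)
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                       for i in test_idx)
    return correct / len(X)

def toy_data(seed=1):
    """Imbalanced synthetic data: 60 samples of class 0, 20 of class 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
    X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(20)]
    return X, [0] * 60 + [1] * 20

if __name__ == "__main__":
    X, y = toy_data()
    print("10-fold accuracy:", cross_validate(X, y))
```

Oversampling inside the loop, after the split, is the design choice that matters here: balancing the whole dataset before splitting would let synthetic points built from test samples leak into the training folds and inflate the measured accuracy.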
The paper addresses early diabetes diagnosis, an important and timely problem given the disease's rapidly increasing global prevalence. Given the substantial health and economic burden of diabetes, timely identification through computational methods, particularly machine learning, offers a promising avenue for intervention and management. The study uses K-Nearest Neighbors (KNN) for classification, a widely used and interpretable algorithm, and aims to improve its performance through preprocessing steps suited to medical diagnostic tasks. This focus on early diagnosis aligns well with current public health strategies, making the research relevant and potentially impactful.
Methodologically, the study introduces a hybrid approach that applies Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance and Two-phase Mutation Grey Wolf Optimization (TMGWO) to select features before KNN classification. SMOTE is a standard and appropriate technique for the skewed datasets common in medical diagnosis, while TMGWO is a more novel choice for feature subset selection, aiming to reduce dimensionality and computational load. The model is evaluated with 10-fold cross-validation, a good practice for estimating how well it generalizes. The abstract clearly outlines the sequence of operations, from feature selection and data balancing to classification and evaluation using standard metrics: accuracy, precision, recall, and F1-score.
The reported accuracy of 98.85% is exceptionally high and suggests a potentially very effective model for early diabetes diagnosis. The authors highlight the novelty of using TMGWO for feature selection to reduce the number of features and the computation time, coupled with SMOTE to improve classification of the minority class. This combination aims to deliver a more efficient and accurate diagnostic tool. While the abstract claims higher accuracy than previous studies, substantiating that claim would require a comprehensive comparative analysis in the full paper. The value proposition of fewer features and shorter computation time is significant, particularly for real-time diagnostic systems, making this research a valuable contribution to the field of intelligent healthcare.
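The four metrics named above can all be derived from the counts of a binary confusion matrix. The sketch below shows one way to compute them in Python; the function name `binary_metrics`, the choice of label 1 as the positive class, and the example labels are assumptions for illustration, not values from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

if __name__ == "__main__":
    # Hypothetical labels: 3 positives and 5 negatives, with one miss each way.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

On imbalanced data, accuracy alone can look high even when the minority class is poorly recognized, which is why precision, recall, and F1 are reported alongside it.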
Source: Recursive Journal of Informatics.
By Sciaria