Optimizing Random Forest for Predicting Thoracic Surgery Success in Lung Cancer Using Recursive Feature Elimination and GridSearchCV

Summary: A Random Forest model optimized with Recursive Feature Elimination (RFE) and GridSearchCV predicts thoracic surgery success in lung cancer patients, reaching 91.41% accuracy with the top 8 features.
Abstract. Lung cancer is one of the deadliest forms of cancer, claiming numerous lives annually. Thoracic surgery is one strategy for managing lung cancer patients; however, it carries high risks, including potential nerve damage and fatal complications. The success rate of thoracic surgery for lung cancer patients can be predicted using data mining techniques based on classification. Medical data mining employs mathematical, statistical, and computational methods. In this study, thoracic surgery success is predicted with the random forest algorithm, using recursive feature elimination for feature selection. The feature selection process yields the top 8 features: 'DGN', 'PRE4', 'PRE5', 'PRE6', 'PRE10', 'PRE14', 'PRE30', and 'AGE'. Hyperparameter tuning with GridSearchCV is then applied to improve classification accuracy. The resulting model achieves a predictive accuracy of 91.41%.

Purpose: The study aims to develop and evaluate a Random Forest model that uses Recursive Feature Elimination for feature selection and GridSearchCV for hyperparameter tuning to predict the thoracic surgery success rate.

Methods: This study uses the thoracic surgery dataset and applies various data preprocessing techniques. The dataset is then classified with the Random Forest algorithm, with Recursive Feature Elimination used to select the best features and GridSearchCV used for hyperparameter tuning.

Result: The Random Forest algorithm with Recursive Feature Elimination and GridSearchCV hyperparameter tuning achieved an accuracy of 91.41%. This accuracy was obtained with the following parameter values: bootstrap set to False, criterion set to gini, n_estimators equal to 100, max_depth set to None, min_samples_split equal to 4, min_samples_leaf equal to 1, max_features set to auto, n_jobs set to -1, and verbose set to 2, with 10-fold cross-validation.

Novelty: This study compares and analyzes various dataset preprocessing methods and model configurations to find the best model for predicting the success rate of thoracic surgery. It also employs feature selection to choose the best features for classification, and hyperparameter tuning to find the hyperparameter values that give the highest accuracy.
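The workflow described in the abstract (RFE down to 8 features, then a grid search over Random Forest hyperparameters with 10-fold cross-validation) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the data is a synthetic stand-in generated with `make_classification`, the grid is a small hypothetical subset of values, and wrapping RFE inside a `Pipeline` is a choice made here so that feature selection is refit within each fold. The reported `max_features="auto"` is replaced by `"sqrt"`, its classifier equivalent, because recent scikit-learn releases no longer accept `"auto"`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Assumption: synthetic stand-in for the thoracic surgery dataset
# (16 candidate features, imbalanced binary outcome).
X, y = make_classification(
    n_samples=200, n_features=16, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)

pipeline = Pipeline([
    # RFE repeatedly fits the forest and drops the least important
    # features (two per round here) until 8 remain.
    ("rfe", RFE(RandomForestClassifier(n_estimators=25, random_state=0),
                n_features_to_select=8, step=2)),
    # Final classifier; fixed values follow those listed in the abstract.
    ("rf", RandomForestClassifier(
        n_estimators=100, criterion="gini", max_depth=None,
        min_samples_leaf=1, bootstrap=False,
        max_features="sqrt",  # stands in for the reported "auto"
        random_state=0,
    )),
])

# Hypothetical, deliberately tiny grid; the paper's full grid is not given.
param_grid = {"rf__min_samples_split": [2, 4]}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

# Boolean mask over the 16 input columns marking the 8 kept by RFE.
selected_mask = search.best_estimator_.named_steps["rfe"].support_
print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 4))
print("selected feature indices:", [i for i, keep in enumerate(selected_mask) if keep])
```

Running selection inside the cross-validation loop keeps the held-out fold from influencing which features are kept; whether the study did this or selected features once on the full dataset is not stated in the abstract.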
This paper addresses the critical and clinically relevant problem of predicting thoracic surgery success in lung cancer patients, a procedure fraught with significant risks. The authors propose an optimized Random Forest model as a solution, integrating Recursive Feature Elimination (RFE) for intelligent feature selection and GridSearchCV for robust hyperparameter tuning. Their methodology aims to leverage data mining techniques to improve patient outcomes by identifying those most likely to benefit from surgery. The study reports a promising predictive accuracy of 91.41%, obtained from an optimized model utilizing eight key features and specific hyperparameter settings.

The methodological approach taken in this study demonstrates several strengths. The combination of Random Forest, a powerful ensemble learning algorithm known for its high predictive power and ability to handle complex, non-linear relationships, with RFE for feature selection, is a sound strategy. RFE helps in reducing dimensionality, improving model interpretability, and potentially mitigating overfitting by focusing on the most relevant predictors, such as 'AGE' and various 'PRE' codes. The subsequent application of GridSearchCV with 10-fold cross-validation is a rigorous approach to optimize model performance and ensure that the reported accuracy is not merely coincidental but reflects a robust and generalizable model. The detailed listing of the optimal hyperparameters further enhances the reproducibility of the study's findings.

While the study presents a solid methodology and a commendable accuracy score, there are areas that could strengthen its impact and contribute to the broader scientific discourse. The claim of "novelty" regarding the comparison of various preprocessing methods and model configurations, while a good practice, is not fully substantiated by details within the abstract regarding *which* methods and configurations were compared beyond the final chosen one. A crucial omission is the lack of a comparative analysis with other established machine learning models (e.g., Logistic Regression, Support Vector Machines, XGBoost) or baseline methods, which would provide essential context for the reported 91.41% accuracy and affirm the superiority of the optimized Random Forest. Furthermore, for a medical application, beyond accuracy, metrics such as sensitivity, specificity, AUC, and F1-score are vital for a comprehensive understanding of the model's performance and clinical utility, particularly given the varying costs of false positives and false negatives in healthcare. Future work could also focus on providing more clinical interpretation of the 'PRE' features for better physician adoption and trust.
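To illustrate why accuracy alone can be hard to interpret, the sketch below computes sensitivity, specificity, precision, and F1-score from binary labels using only the standard library. The labels are made-up numbers for illustration (100 hypothetical patients, 15 in the positive class) and are not taken from the study; the function name `binary_metrics` is likewise hypothetical. AUC is left out because it needs predicted scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute common binary classification metrics (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Made-up labels: 15 positives followed by 85 negatives.
y_true = [1] * 15 + [0] * 85

# Baseline that always predicts the majority (negative) class.
baseline = binary_metrics(y_true, [0] * 100)

# Hypothetical model: finds 9 of 15 positives and raises 6 false alarms.
model = binary_metrics(y_true, [1] * 9 + [0] * 6 + [1] * 6 + [0] * 79)

print("baseline:", baseline)
print("model:   ", model)
```

In this toy setup the always-negative baseline reaches 85% accuracy while detecting no positive cases, and the hypothetical model is only three points more accurate despite 60% sensitivity. This is the kind of context that a baseline comparison and per-class metrics would give the reported figure.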
Source: Optimizing Random Forest for Predicting Thoracic Surgery Success in Lung Cancer Using Recursive Feature Elimination and GridSearchCV, Recursive Journal of Informatics.
By Sciaria