Classifying Vehicle Categories Based on Technical Specifications Using Random Forest and SMOTE for Data Augmentation
Classify vehicle categories such as SUV, Sedan, and Hybrid from technical specifications using Random Forest and SMOTE. The study explores the machine learning challenges that imbalanced data poses for market segmentation.
This study investigates the application of machine learning for classifying vehicles based on their technical specifications using the Random Forest algorithm. The objective was to create a robust classification model capable of categorizing vehicles into six distinct classes: Hybrid, SUV, Sedan, Sports, Truck, and Wagon. The analysis was conducted using a comprehensive dataset that included features such as engine size, horsepower, weight, and fuel efficiency, along with the target variable, vehicle class. To address the issue of class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance the training data. The results showed that the model performed particularly well in classifying Sedans, achieving a perfect recall and high F1-score, while struggling with underrepresented classes like Hybrid and Wagon. Despite applying SMOTE, the model’s performance for minority classes remained suboptimal, highlighting the challenges associated with highly imbalanced datasets. The study contributes to the field of vehicle classification by demonstrating the use of Random Forest for such tasks and providing insights into the challenges posed by imbalanced class distributions. The findings underscore the importance of feature selection, especially regarding numerical attributes such as horsepower and engine size, in improving classification accuracy. However, the study also identified limitations, including potential biases in the dataset and the difficulty in improving performance for minority vehicle classes. Future research should explore alternative algorithms like XGBoost or deep learning models, and consider expanding the dataset to include more diverse vehicle types. The practical implications of this work are significant for vehicle market segmentation, offering valuable insights for manufacturers, dealerships, and analysts seeking to optimize vehicle classification and improve market targeting strategies.
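To make the oversampling step in the abstract concrete, the following is a minimal, standard-library sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours. It is an illustration under stated assumptions, not the authors' implementation. The function name `smote_oversample`, the choice of `k`, and the feature rows are hypothetical and made up for the example. In practice a library implementation (for example `SMOTE` from imbalanced-learn) would normally be used, features would be scaled before computing distances, and oversampling would be applied to the training split only, as the abstract states.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        # Note: unscaled features let large-valued columns dominate distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical, made-up rows for an underrepresented class:
# (engine size in L, horsepower, weight in lb, fuel efficiency in MPG).
hybrid_rows = [
    (1.5, 110.0, 2900.0, 50.0),
    (1.4, 93.0, 2750.0, 48.0),
    (2.0, 140.0, 3100.0, 45.0),
]
new_rows = smote_oversample(hybrid_rows, n_new=5, k=2)
```

Because every synthetic row lies between two existing rows, SMOTE cannot add information outside the region the minority class already occupies. With only a handful of real Hybrid or Wagon examples, that may be one reason the balanced training set did not translate into better minority-class performance.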
This study presents an investigation into classifying vehicle categories using technical specifications, employing the Random Forest algorithm. The objective of categorizing vehicles into six distinct classes is clear and relevant, particularly for vehicle market segmentation. The use of a comprehensive dataset featuring attributes like engine size, horsepower, weight, and fuel efficiency provides a solid foundation for the analysis. While the research successfully demonstrates the applicability of machine learning for this task, achieving excellent performance for prominent classes like Sedans, it also transparently identifies significant challenges in classifying underrepresented vehicle types, even after applying SMOTE for data augmentation.
A notable strength of this work lies in its straightforward and reproducible methodology, which clearly outlines the algorithm and the features used. The acknowledgment of class imbalance and the proactive attempt to mitigate it using SMOTE are commendable. The explicit discussion of the model's varying performance across classes, particularly the perfect recall for Sedans versus the struggles with the Hybrid and Wagon categories, provides valuable insights into the complexities of real-world datasets. Furthermore, the paper effectively articulates the practical implications for manufacturers and market analysts, underscoring the potential utility of such classification models in optimizing market targeting strategies. The identification of critical features like horsepower and engine size also offers practical guidance for future research.
However, the primary limitation highlighted, the suboptimal performance for minority classes despite SMOTE, warrants further critical consideration. It suggests either that the chosen data augmentation technique was not sufficiently effective for the specific nature or severity of the imbalance, or that the feature space for these classes is inherently less discriminative. While the authors suggest exploring alternative algorithms such as XGBoost or deep learning and expanding the dataset, future work could also delve deeper into more advanced imbalance handling techniques, such as cost-sensitive learning, different sampling strategies, or more sophisticated ensemble methods specifically designed for imbalanced classification. A more thorough analysis of the specific reasons why SMOTE failed to significantly boost performance for minority classes would also enhance the study's contribution.
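As one illustration of the cost-sensitive learning mentioned above, the sketch below computes inverse-frequency class weights, so that errors on rare classes count for more during training. The function name `balanced_class_weights` and the label counts are hypothetical and made up for the example; they are not taken from the study's dataset. The formula, n_samples / (n_classes * class_count), is the same heuristic that scikit-learn applies when a classifier such as `RandomForestClassifier` is given `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Made-up toy labels with a deliberately skewed distribution.
toy_labels = ["Sedan"] * 6 + ["SUV"] * 3 + ["Hybrid"] * 1
weights = balanced_class_weights(toy_labels)
```

Unlike oversampling, this approach leaves the training data unchanged and shifts the penalty structure instead, which makes it a natural baseline to compare against SMOTE on the same train/test split.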
Source: International Journal for Applied Information Management.
By Sciaria