Analysis Of The Use Of Nazief-Adriani Stemming And Porter Stemming In Covid-19 Twitter Sentiment Analysis With Term Frequency-Inverse Document Frequency Weighting Based On K-Nearest Neighbor Algorithm

Muhammad Fikri, Zaenal Abidin

Recursive Journal of Informatics

Introduction

This study compares the Nazief-Adriani and Porter stemmers for COVID-19 Twitter sentiment analysis, using TF-IDF weighting and K-Nearest Neighbor (KNN) classification. It evaluates accuracy on Indonesian-language data and finds Nazief-Adriani slightly more effective.

Abstract

This system was developed to determine the accuracy of sentiment analysis on Twitter regarding the COVID-19 issue using the Nazief-Adriani and Porter stemmers with TF-IDF weighting, along with a classification process using K-Nearest Neighbor (KNN) that resulted in a comparison of 48.24% for Nazief-Adriani and 48.24% for Porter.

Purpose: This research aims to determine the accuracy of the Nazief-Adriani and Porter stemmer algorithms in text preprocessing, using a dataset of Indonesian-language tweets. The research involves word weighting using TF-IDF and classification using the K-Nearest Neighbor (KNN) algorithm.

Methods/Study design/approach: The experiment applied the Nazief-Adriani and Porter stemmer algorithms to COVID-19-related data sourced from Twitter. The data underwent text preprocessing, stemming, and TF-IDF weighting, followed by accuracy testing on training and testing data using the K-Nearest Neighbor (KNN) algorithm. The accuracy of both stemmers was calculated from a confusion matrix.

Result/Findings: This study obtained reasonably accurate results in testing the Nazief-Adriani stemmer, with an accuracy of 50.98% when applied to sentiment analysis of COVID-19-related Twitter data in the Indonesian language. The Porter stemmer achieved an accuracy of 48.24%.

Novelty/Originality/Value: Feature selection is crucial in stemmer accuracy testing. Therefore, in this study, feature selection is carried out using the Nazief-Adriani and Porter stemmers for testing purposes, and classification for the accuracy data is conducted using the K-Nearest Neighbor (KNN) algorithm.
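
The pipeline described in the abstract (stemming, TF-IDF weighting, KNN classification, accuracy) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: `toy_stem` is a hypothetical suffix stripper standing in for a real Nazief-Adriani or Porter stemmer, the smoothed IDF formula and cosine similarity are assumed choices that the abstract does not specify, and the tweets are invented.

```python
import math
from collections import Counter


def toy_stem(token):
    # Hypothetical placeholder, NOT Nazief-Adriani or Porter:
    # strips a few common Indonesian suffixes if at least 3 letters remain.
    for suffix in ("nya", "kan", "lah", "an", "i"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def identity(token):
    # No stemming, used as a point of comparison.
    return token


def tokenize(text, stem):
    return [stem(tok) for tok in text.lower().split()]


def fit_idf(docs):
    # Smoothed inverse document frequency (an assumed variant).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # L2-normalised TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}


def cosine(a, b):
    # Vectors are already normalised, so the dot product is the cosine.
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def knn_predict(query, train_vecs, train_labels, k):
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(stem, train, test, k=3):
    train_docs = [tokenize(text, stem) for text, _ in train]
    idf = fit_idf(train_docs)
    train_vecs = [tfidf(doc, idf) for doc in train_docs]
    labels = [label for _, label in train]
    hits = sum(
        knn_predict(tfidf(tokenize(text, stem), idf), train_vecs, labels, k) == label
        for text, label in test
    )
    return hits / len(test)


# Invented toy tweets, for illustration only.
TRAIN = [
    ("vaksin covid membantu kesehatan", "positive"),
    ("terima kasih tenaga kesehatan", "positive"),
    ("kasus covid naik lagi", "negative"),
    ("pandemi membuat ekonomi sulit", "negative"),
]
TEST = [("vaksinnya membantu", "positive"), ("ekonomi sulit", "negative")]

for name, stem in (("toy stemmer", toy_stem), ("no stemming", identity)):
    print(name, accuracy(stem, TRAIN, TEST, k=1))
```

Swapping a real stemmer in for `toy_stem` only requires a function that maps one token to its stem, so the rest of the pipeline stays identical across the two conditions being compared.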

Review

This paper analyzes the impact of the Nazief-Adriani and Porter stemming algorithms on COVID-19 Twitter sentiment analysis for Indonesian-language text. The study employs a standard Natural Language Processing pipeline, using TF-IDF for word weighting and the K-Nearest Neighbor (KNN) algorithm for classification. The topic is relevant and timely, addressing a real-world application of NLP to public discourse during a significant global event. The core contribution is the direct comparison of two prominent stemming techniques on a specific, challenging language dataset for sentiment classification.

The methodology is straightforward: data collection from Twitter, text preprocessing, application of the two stemmers, TF-IDF vectorization, and classification using KNN, with accuracy measured via a confusion matrix. The stated results indicate that the Nazief-Adriani stemmer achieved an accuracy of 50.98%, while the Porter stemmer yielded 48.24%. However, the abstract contains a critical inconsistency: its opening sentence states that *both* stemmers resulted in 48.24%, which directly contradicts the more detailed findings in the "Result/Findings" section.

Regarding the "Novelty" claim, the abstract suggests that feature selection is crucial and is "carried out using the Nazief-Adriani and Porter stemmers." Stemming does reduce the feature space, but its conventional role is text normalization. The authors should explain in what sense it acts as "feature selection" here and what is innovative about that framing.

While the study addresses a pertinent problem, the reported accuracy rates (around 50%) are notably low for sentiment analysis. If the task is binary with balanced classes, this is close to what random guessing would achieve; the abstract does not state the number of sentiment classes or their distribution, so the comparison to chance cannot be made precisely. This raises questions about the suitability of the chosen preprocessing steps, the efficacy of the stemmers for this particular dataset and language context, or the discriminative power of the KNN algorithm under these conditions.

Future work should critically examine the reasons for these low accuracies, perhaps by exploring alternative classification algorithms (e.g., SVM, Naive Bayes, or deep learning models), more advanced text representation methods (e.g., word embeddings), or additional preprocessing steps suited to noisy social media data and the Indonesian language. Furthermore, the inconsistency in the accuracy figures reported in the abstract must be rectified for clarity and credibility. Expanding the evaluation metrics beyond accuracy to include precision, recall, and F1-score would also provide a more comprehensive understanding of the model's performance, especially if class imbalance is a factor.
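
To make the point about metrics concrete, the sketch below derives accuracy, precision, recall, and F1 from confusion-matrix counts and compares accuracy against a majority-class baseline. The labels and predictions are invented for illustration and are not the paper's data; the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    # Counts for one class treated as "positive" versus everything else.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def majority_baseline(y_true):
    # Accuracy of always predicting the most frequent class.
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Invented example: 6 positive and 4 negative tweets.
y_true = ["pos"] * 6 + ["neg"] * 4
y_pred = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "pos", "neg", "neg"]

scores = evaluate(y_true, y_pred, positive="pos")
baseline = majority_baseline(y_true)
print(scores)
print("majority-class baseline:", baseline)
```

In this invented case the classifier's accuracy (0.5) falls below the majority-class baseline (0.6), which is the kind of comparison that accuracy alone hides when classes are imbalanced.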
