A Comparative Study of Generalized Linear Mixed Model and Mixed Effects Random Forest for Analyzing Data with Outliers
Reza Arianti, Khairil Anwar Notodiputro, Yenni Angraini


Introduction

This study compares GLMM-NB and MERF, each with and without winsorization, for analyzing hierarchical data with outliers. It finds MERF more accurate on Indonesian tobacco consumption data, a result that can help inform public health policy.


Abstract

This study compares the Mixed Effects Random Forest (MERF) and the Generalized Linear Mixed Model with a Negative Binomial distribution (GLMM-NB) for analyzing hierarchical data, focusing on the role of residual outliers and the application of winsorization. A two-stage analytical pipeline was implemented: (1) winsorization to reduce extreme residual values, and (2) model training using MERF and GLMM-NB. The dataset comes from the 2021 National Socio-Economic Survey (Susenas) in West Java Province and measures tobacco consumption intensity. Models were trained under two conditions: without winsorization (WIN0) and with two-sided 5% winsorization (WIN5). Winsorization was applied to the training data, and the test data were adjusted using thresholds from the training set. Model performance was assessed using Root Mean Squared Error (RMSE) and the train-test ratio. Under WIN0, GLMM-NB recorded an RMSE of 49.65 for training and 42.27 for testing, while MERF achieved 35.96 and 39.94, respectively. After WIN5, GLMM-NB showed the larger error reduction, with RMSE values of 34.90 (train) and 30.20 (test), while MERF dropped to 26.63 (train) and 28.64 (test). These results indicate that MERF provides higher predictive accuracy under both conditions, whereas GLMM-NB benefits more from winsorization. Household expenditure, employment status, age, and gender consistently emerged as key variables linked to tobacco consumption intensity. This study is the first to compare MERF and GLMM-NB with winsorization on hierarchical data from Indonesia. The analytical framework helps inform public health policies aligned with SDG 3: Good Health and Well-being, particularly in reducing tobacco-related health risks.
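The winsorization and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. Several points are assumptions: "two-sided 5%" is read here as clipping at the 5th and 95th percentiles, the percentiles use simple linear interpolation, the train-test ratio is taken as train RMSE divided by test RMSE, and the helper names (`percentile`, `fit_winsor_limits`, `winsorize`, `rmse`) and the toy data are hypothetical. Only the RMSE figures in `reported_rmse` come from the abstract.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_winsor_limits(train, lower_pct=5.0, upper_pct=95.0):
    """Learn clipping thresholds from the training data only (assumed 5th/95th percentiles)."""
    return percentile(train, lower_pct), percentile(train, upper_pct)


def winsorize(values, limits):
    """Clip every value into [low, high]; values inside the range are unchanged."""
    low, high = limits
    return [min(max(v, low), high) for v in values]


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (hypothetical): thresholds come from the training set and are
# then reused unchanged on the test set, as the abstract describes.
train = [float(x) for x in range(1, 101)]
test = [-50.0, 10.0, 500.0]
limits = fit_winsor_limits(train)
train_w = winsorize(train, limits)
test_w = winsorize(test, limits)

# RMSE values reported in the abstract: (train, test) per model and condition.
reported_rmse = {
    ("GLMM-NB", "WIN0"): (49.65, 42.27),
    ("MERF", "WIN0"): (35.96, 39.94),
    ("GLMM-NB", "WIN5"): (34.90, 30.20),
    ("MERF", "WIN5"): (26.63, 28.64),
}

# Assumed definition of the train-test ratio: train RMSE / test RMSE.
ratios = {key: tr / te for key, (tr, te) in reported_rmse.items()}

for key in reported_rmse:
    print(key, round(ratios[key], 3))
```

Fitting the limits on the training set and reusing them on the test set keeps information from the test data out of the preprocessing step. Under the assumed definition, a ratio above 1 means the training error exceeds the test error, and a ratio below 1 means the reverse.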



Source: Jurnal Teknik Informatika (Jutif)
