Classification of Text Datasets of Public Complaints Against the Government on Social Media Using Logistic Regression

Authors

  • Mariana Purba
  • Sri Dianing Asri
  • Vina Ayumi
  • Umniy Salamah
  • Lemi Iryani

DOI:

https://doi.org/10.36085/jsai.v7i1.6447

Abstract

Twitter is one of the most widely used social media platforms for interacting and for sharing opinions, complaints, and suggestions. In government, tweets that contain opinions or complaints about an organization's services or programs can serve as feedback for improving service quality. This study classifies tweets as complaints or non-complaints using a machine learning algorithm, logistic regression (LR). The research stages are crawling and labeling the dataset, preprocessing, modeling with a logistic regression classifier, and evaluating classifier performance. Preprocessing, classification, and evaluation are carried out in Python using the scikit-learn library. In the experiments, the model using CountVectorizer for feature extraction outperformed the model using TfidfVectorizer. With TfidfVectorizer the model achieved an accuracy of 92% (F1 score: 0.9181, precision: 0.9191, recall: 0.9181, kappa: 0.8363), while with CountVectorizer accuracy reached 94% (F1 score: 0.9355, precision: 0.9406, recall: 0.9356, kappa: 0.8715).
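The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the toy tweets, the labels, the `evaluate` helper, the 75/25 split, and the weighted averaging of precision, recall, and F1 are all assumptions made for the example. Only the use of scikit-learn, logistic regression, CountVectorizer, TfidfVectorizer, and the five reported metrics comes from the abstract.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (not from the study): 1 = complaint, 0 = non-complaint.
texts = [
    "the road in my district is still broken after months",
    "why is the id card service so slow",
    "water supply has been off for three days please fix it",
    "the online tax portal keeps crashing",
    "garbage has not been collected this week",
    "street lights are dead and nobody responds",
    "the queue at the public hospital is terrible",
    "my report about flooding was ignored again",
    "thank you for the new park it looks great",
    "the city festival starts tomorrow at the square",
    "congratulations to the new mayor",
    "the vaccination schedule has been published",
    "great job on the faster passport service",
    "the library opens at nine on weekdays",
    "happy independence day to everyone",
    "the new bus route is very convenient",
]
labels = [1] * 8 + [0] * 8


def evaluate(vectorizer, texts, labels, seed=0):
    """Train LR on one feature extractor and return the five metrics."""
    # Assumed 75/25 stratified split; the abstract does not state the split.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed)
    model = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
    model.fit(x_train, y_train)
    pred = model.predict(x_test)
    # Weighted averaging is an assumption; the abstract does not specify it.
    avg = {"average": "weighted", "zero_division": 0}
    return {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred, **avg),
        "precision": precision_score(y_test, pred, **avg),
        "recall": recall_score(y_test, pred, **avg),
        "kappa": cohen_kappa_score(y_test, pred),
    }


results = {
    "CountVectorizer": evaluate(CountVectorizer(), texts, labels),
    "TfidfVectorizer": evaluate(TfidfVectorizer(), texts, labels),
}
for name, scores in results.items():
    print(name, {key: round(value, 4) for key, value in scores.items()})
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is fitted on the training split only, so both feature extractors are compared on the same held-out tweets. Scores on a toy set this small carry no meaning and should not be compared with the figures reported above.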

Published

2024-01-31

Issue

Section

Articles