Topic Modeling Based on Computer Science Research Documents Using Text Mining

Authors

  • Bakhtiar Bakhtiar, Universitas Sjakhyakirti, Palembang, Indonesia
  • Azhar Andika Putra, Universitas Sjakhyakirti, Palembang, Indonesia
  • Muhammad Al Hapiz, Universitas Sjakhyakirti, Palembang, Indonesia
  • Firga Abel Astiawan, Universitas Sjakhyakirti, Palembang, Indonesia

DOI:

https://doi.org/10.36085/jsai.v8i3.9387

Abstract

This study aimed to develop a document clustering model that combines the IndoBERT model with the K-Means algorithm to group research abstracts in computer science and technology. The dataset consisted of 1,000 research abstracts, split into 80% training data (800 abstracts) and 20% testing data (200 abstracts). IndoBERT was used to represent each abstract as an embedding vector, and K-Means then grouped these vectors into 10 topic clusters, including artificial intelligence, computer systems and networks, programming, and cybersecurity. The clusters and centroids were fitted on the training data and then used to map new documents to the appropriate cluster. Evaluation used four metrics: accuracy, cluster homogeneity, the Davies-Bouldin Index, and the Silhouette Score. On the test data, the model achieved an accuracy of 85%. The cluster homogeneity of 0.90 indicated that documents that should belong to the same cluster were largely grouped together. The Davies-Bouldin Index was 0.34 (lower values indicate better-separated clusters), and the Silhouette Score was 0.76 (values closer to 1 indicate more compact, better-separated clusters).
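
The fit-then-assign workflow described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the 2-D vectors below are made-up stand-ins for IndoBERT embeddings, k is set to 2 rather than 10, the farthest-first initialization and nearest-centroid assignment rule are assumptions chosen for simplicity, and all function names are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)), key=lambda i: distance(vector, centroids[i]))

def k_means(vectors, k, iterations=50):
    """Plain Lloyd's algorithm with a deterministic farthest-first start (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = [nearest_centroid(v, centroids) for v in vectors]
        updated = []
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest_centroid(v, centroids) for v in vectors]
    return centroids, labels

def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all points (0.0 for singleton clusters)."""
    scores = []
    for i, v in enumerate(vectors):
        by_cluster = {}
        for j, w in enumerate(vectors):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(distance(v, w))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())    # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy "training embeddings": two loose groups standing in for two topics.
train_vectors = [[0.0, 0.1], [0.2, 0.9], [1.0, 0.0], [0.9, 1.1],
                 [10.0, 10.2], [10.1, 11.0], [11.0, 9.9], [10.8, 11.1]]
centroids, train_labels = k_means(train_vectors, k=2)

# Toy "test embeddings" are mapped to the nearest fitted centroid.
new_vectors = [[0.4, 0.6], [10.2, 10.9]]
new_labels = [nearest_centroid(v, centroids) for v in new_vectors]
score = silhouette_score(train_vectors, train_labels)
print(new_labels, round(score, 2))
```

In practice the toy vectors would be replaced by real sentence embeddings and the hand-written routines by a tested library implementation; the sketch only shows how centroids fitted on training data can be reused to place unseen documents.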

Published

2025-11-11

Section

Articles