Topic Modeling Based on Computer Science Research Documents Using Text Mining
DOI:
https://doi.org/10.36085/jsai.v8i3.9387
Abstract
This study aimed to develop a document clustering model that combines the IndoBERT model with the K-Means algorithm to group research abstracts in computer science and technology. The dataset consisted of 1,000 research abstracts, split into 80% training data (800 abstracts) and 20% testing data (200 abstracts). IndoBERT represented each abstract as an embedding vector, and K-Means grouped these vectors into 10 topic clusters, including artificial intelligence, computer systems and networks, programming, and cybersecurity. The training experiment used the training data to generate the clusters and their centroids, which were then used to map new documents to the appropriate cluster. The model was evaluated with several metrics: accuracy, cluster homogeneity, the Davies-Bouldin Index, and the Silhouette Score. On the test data, the model achieved an accuracy of 85%, indicating good clustering performance. The cluster homogeneity of 0.90 indicated that documents that should belong to the same cluster were grouped together effectively. The Davies-Bouldin Index was 0.34 (lower values indicate better-separated clusters) and the Silhouette Score was 0.76 (values closer to 1 indicate better-defined clusters).
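The workflow described in the abstract (cluster the training vectors, keep the centroids, map unseen documents to the nearest centroid, then score the result) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes small synthetic 2-D points as stand-ins for IndoBERT embedding vectors, 3 clusters instead of 10, a farthest-first centroid initialisation chosen only to keep the run deterministic, and a majority-vote mapping from cluster to topic for computing accuracy. All function names are hypothetical, and cluster homogeneity is omitted for brevity.

```python
import math
import random
from collections import Counter

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def mean_vector(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, max_iter=100):
    # Assumption: farthest-first initialisation, used here for determinism.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_vector(members) if members else centroids[j])
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin Index; assumes every cluster has at least one member."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members))
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

def make_blobs(centers, n_per_center, spread, rng):
    """Synthetic stand-in for embedding vectors, with a known topic per point."""
    points, topics = [], []
    for topic, center in enumerate(centers):
        for _ in range(n_per_center):
            points.append([c + rng.gauss(0, spread) for c in center])
            topics.append(topic)
    return points, topics

rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
train_x, train_y = make_blobs(centers, 40, 0.5, rng)  # 80% of the toy data
test_x, test_y = make_blobs(centers, 10, 0.5, rng)    # 20% of the toy data

centroids, train_labels = kmeans(train_x, k=3)
test_labels = [nearest(p, centroids) for p in test_x]  # map new documents

# Assumption: each cluster is named after its most frequent training topic.
cluster_topic = {}
for j in range(len(centroids)):
    votes = Counter(t for t, lab in zip(train_y, train_labels) if lab == j)
    cluster_topic[j] = votes.most_common(1)[0][0]
accuracy = sum(cluster_topic[lab] == t for lab, t in zip(test_labels, test_y)) / len(test_y)

print("accuracy:", accuracy)
print("silhouette:", round(silhouette(train_x, train_labels), 3))
print("davies-bouldin:", round(davies_bouldin(train_x, train_labels, centroids), 3))
```

With real data, each abstract would first be encoded into an embedding vector and the resulting vectors passed in place of `train_x` and `test_x`. A library such as scikit-learn provides K-Means and these metrics ready-made; the plain-Python versions above are meant only to make the definitions visible.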
License
Copyright (c) 2025 Bakhtiar Bakhtiar, Azhar Andika Putra, Muhammad Al Hapiz, Firga Abel Astiawan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.




