Topic Modeling Based on Computer Science Research Documents Using Text Mining

Bakhtiar Bakhtiar; Azhar Andika Putra; Muhammad Al Hapiz; Firga Abel Astiawan

doi:10.36085/jsai.v8i3.9387

Authors

Bakhtiar Bakhtiar, Universitas Sjakhyakirti, Palembang, Indonesia
Azhar Andika Putra, Universitas Sjakhyakirti, Palembang, Indonesia
Muhammad Al Hapiz, Universitas Sjakhyakirti, Palembang, Indonesia
Firga Abel Astiawan, Universitas Sjakhyakirti, Palembang, Indonesia

Abstract

This study aimed to develop a document clustering model that combines the IndoBERT model with the K-Means algorithm to group research abstracts in computer science and technology. The dataset consisted of 1,000 research abstracts, split 80/20 into training data (800 abstracts) and testing data (200 abstracts). IndoBERT represented each abstract as an embedding vector, and K-Means grouped those vectors into 10 topic clusters, including artificial intelligence, computer systems and networks, programming, and cybersecurity. The training data were used to form the clusters and their centroids, which then served to map new documents to the appropriate cluster. The model was evaluated with accuracy, cluster homogeneity, the Davies-Bouldin Index, and the Silhouette Score. On the test data it achieved an accuracy of 85%, indicating good clustering performance. A cluster homogeneity of 0.90 indicated that documents that should belong to the same cluster were grouped together effectively. The Davies-Bouldin Index was 0.34 (lower values indicate better-separated clusters), and the Silhouette Score was 0.76 (values closer to 1 indicate more cohesive, well-separated clusters).
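The train-then-map procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors stand in for IndoBERT embeddings, k is 2 rather than 10, the function names (`nearest`, `init_centroids`, `fit_kmeans`) are hypothetical, and the deterministic farthest-first initialisation is an assumption chosen only to keep the example reproducible. In practice a library K-Means would normally be used instead of this dependency-free version.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def init_centroids(vectors, k):
    """Farthest-first seeding (an assumption, not taken from the paper)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector whose nearest existing seed is farthest away.
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def fit_kmeans(vectors, k, iterations=50):
    """Plain Lloyd's K-Means: returns k centroids fitted on `vectors`."""
    centroids = init_centroids(vectors, k)
    for _ in range(iterations):
        # Assignment step: group each vector with its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


# Toy "embeddings": two well-separated groups standing in for two topics.
train = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
test_docs = [(0.1, 0.0), (5.1, 5.0)]

# Fit centroids on the training split, then map unseen documents to them.
centroids = fit_kmeans(train, k=2)
labels = [nearest(d, centroids) for d in test_docs]
print(labels)
```

The centroids are fitted only on the training split, and each test document is assigned to its nearest centroid without refitting, which mirrors the mapping of new documents described above.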
