Development of an Indonesian NLP-Based ESG Media Intelligence System Using TF-IDF and IndoBERT

Authors

  • Lukman Hakim Moeslich, Universitas Pembangunan Jaya
  • Cahyono Budy Santoso, Universitas Pembangunan Jaya

DOI:

https://doi.org/10.36085/jsai.v9i2.10586

Abstract

Monitoring Environmental, Social, and Governance (ESG) issues in Indonesia’s nickel mining industry has become increasingly important as demands for transparency and sustainability grow. However, automated ESG media analysis for Indonesian-language news remains limited. This study develops an ESG media intelligence system based on Natural Language Processing (NLP) to analyze media perception of PT Indonesia Weda Bay Industrial Park (IWIP) and PT Weda Bay Nickel (WBN). The proposed system uses an eight-stage pipeline that includes automated news collection, Indonesian text preprocessing, ontology-based ESG labeling, text classification with TF-IDF + LinearSVC and with IndoBERT, and sentiment and ESG risk trend analysis. A total of 1,693 news articles published between January 2020 and May 2026 were collected, of which 1,320 were labeled using an ontology-based weak supervision approach. In the experiments, the best TF-IDF configuration achieved a Macro-F1 score of 0.7693, while IndoBERT achieved 0.7698, a difference of only 0.0005. These results indicate that TF-IDF remains competitive with transformer-based models on limited Indonesian ESG datasets. The media analysis shows that coverage of IWIP was predominantly negative on environmental and social issues, while coverage of WBN was relatively more positive on governance-related topics. This research contributes to the development of Indonesian-language ESG media intelligence for the mining industry.
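As an illustration of the ontology-based weak supervision step mentioned in the abstract, the following is a minimal sketch that assigns an ESG category by counting keyword matches. The keyword sets, the `ESG_ONTOLOGY` name, and the `weak_label` function are hypothetical assumptions for illustration only; the abstract does not describe the study's actual ontology or labeling rules.

```python
import re
from collections import Counter

# Hypothetical mini-ontology: these Indonesian keywords are illustrative
# assumptions, not the ontology used in the study.
ESG_ONTOLOGY = {
    "environmental": {"limbah", "pencemaran", "deforestasi", "emisi"},
    "social": {"buruh", "pekerja", "masyarakat", "upah"},
    "governance": {"izin", "korupsi", "regulasi", "audit"},
}


def weak_label(text):
    """Return the ESG category with the most keyword hits, or None."""
    # Lowercase and keep alphabetic tokens only (a very rough tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for category, keywords in ESG_ONTOLOGY.items():
        counts[category] = sum(1 for token in tokens if token in keywords)
    category, hits = counts.most_common(1)[0]
    # An article with no keyword match stays unlabeled.
    return category if hits > 0 else None


headlines = [
    "Pencemaran limbah tambang nikel di Halmahera",  # environmental
    "Pekerja menuntut kenaikan upah",                # social
    "Harga nikel naik",                              # None (no match)
]
for headline in headlines:
    print(headline, "->", weak_label(headline))
```

Under a scheme like this, articles that match no ontology term receive no label, which is one possible reason a weakly supervised corpus ends up only partially labeled. The labeled subset could then serve as training data for a classifier such as TF-IDF + LinearSVC.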

Published

2026-06-03

Section

Articles