Predicting Flood Potential Using Machine Learning with the XGBoost and Logistic Regression Approaches

Authors

  • Nurita Evitarina Universitas Pertiba
  • Fitriyanti Fitriyanti Universitas Pertiba
  • Tri Dewi Yuni Utami Universitas Pertiba

DOI:

https://doi.org/10.36085/jsai.v9i1.9867

Abstract

Flooding is one of the most frequent natural disasters in Indonesia, causing significant material losses and casualties. This study aims to develop a flood potential prediction model based on weather data using two machine learning approaches, XGBoost and Logistic Regression. The dataset consists of 1,513,505 weather records containing 1,165 flood events (0.077%). The features include temperature, humidity, wind speed and direction, weather codes, and lagged temporal features for H-1, H-2, and H-3 generated with a sliding window. Class imbalance was addressed with a combination of stratified undersampling and SMOTE, which changed the class ratio from 1:1,298 to 1:3.3. Experimental results show that XGBoost outperforms Logistic Regression, achieving an accuracy of 98.40%, a precision of 97.93%, a recall of 95.07%, and an ROC-AUC of 99.38%, whereas Logistic Regression reached an accuracy of 62.77%. Feature importance analysis indicates that the weather codes at H-3 and H-1 are the most influential predictors. With a false negative rate of 4.9%, the proposed XGBoost model is considered suitable for deployment in a flood early warning system.
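The abstract does not spell out how the H-1, H-2, and H-3 features are built. A minimal sketch, assuming the records are already sorted in time for a single location and that one row corresponds to one lag step, is to copy each weather variable from the previous one, two, and three rows into new columns. The column names and the `add_lag_features` helper below are hypothetical.

```python
from typing import Dict, List

# Hypothetical column names; the paper's exact schema is not given in the abstract.
FEATURES = ["temperature", "humidity", "wind_speed", "wind_direction", "weather_code"]


def add_lag_features(records: List[Dict[str, float]], lags=(1, 2, 3)) -> List[Dict[str, float]]:
    """Append lagged copies (H-1, H-2, H-3) of every feature via a sliding window.

    Assumes `records` is sorted in time for one location and that one row equals
    one lag step. Rows at the start without a complete window are dropped.
    """
    max_lag = max(lags)
    out = []
    for i in range(max_lag, len(records)):
        row = dict(records[i])  # copy the current row so the input is not mutated
        for lag in lags:
            past = records[i - lag]
            for name in FEATURES:
                row[f"{name}_h{lag}"] = past[name]
        out.append(row)
    return out


# Toy series of five consecutive rows (synthetic values for illustration only).
days = [
    {"temperature": 26.0 + d, "humidity": 80.0, "wind_speed": 2.0,
     "wind_direction": 90.0, "weather_code": d}
    for d in range(5)
]
rows = add_lag_features(days)
print(len(rows))  # 2
print(rows[0]["weather_code"], rows[0]["weather_code_h1"], rows[0]["weather_code_h3"])  # 3 2 0
```

Dropping the first rows, rather than padding them, avoids feeding the model lag values that never existed.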
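The resampling step can be sketched in plain Python as a stratified undersampling of the majority (non-flood) class followed by SMOTE-style oversampling of the minority (flood) class. This is an illustration under stated assumptions, not the authors' code: the stratum key (for example a station ID), the keep fraction, and the neighbour count `k` are hypothetical. A real pipeline would more likely use a library implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def stratified_undersample(majority: Sequence[Tuple[str, Vector]], keep_fraction: float,
                           rng: random.Random) -> List[Vector]:
    """Keep roughly `keep_fraction` of the majority rows within every stratum.

    Each item is (stratum_key, features). The stratum key is a hypothetical
    choice, since the abstract does not say what the sampling was stratified on.
    """
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for key, x in majority:
        groups[key].append(x)
    kept: List[Vector] = []
    for rows in groups.values():
        n = max(1, round(len(rows) * keep_fraction))
        kept.extend(rng.sample(rows, n))  # sample without replacement per stratum
    return kept


def smote(minority: List[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """Create `n_new` synthetic minority rows, SMOTE style: pick a minority row,
    pick one of its k nearest minority neighbours, and interpolate between them.
    Needs at least two minority rows."""
    synthetic: List[Vector] = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority rows (Euclidean distance).
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from x to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


rng = random.Random(0)
majority = ([("station_a", [float(i), 0.0]) for i in range(600)]
            + [("station_b", [float(i), 1.0]) for i in range(400)])
minority = [[0.0, 5.0], [1.0, 6.0], [2.0, 5.5], [3.0, 6.5]]
kept = stratified_undersample(majority, keep_fraction=0.05, rng=rng)
new = smote(minority, n_new=6, k=2, rng=rng)
print(len(kept), len(minority) + len(new))  # 50 10
```

Undersampling first cheaply shrinks a very large majority class. SMOTE then adds synthetic flood rows instead of duplicating the original ones, so the minority class is not represented by exact copies only.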
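The headline figures can be cross-checked with a few lines of arithmetic. The flood share and the 1:1,298 imbalance follow from the record counts, and the false negative rate is simply 1 - recall (1 - 0.9507 ≈ 4.9%). The `binary_metrics` helper and the toy confusion-matrix counts below are hypothetical and only show how the metrics relate to one another.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Confusion-matrix metrics with 'flood' as the positive class."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": recall,
        "false_negative_rate": fn / (tp + fn),  # identical to 1 - recall
    }


# Counts taken from the abstract.
records, floods = 1_513_505, 1_165
print(f"flood share: {floods / records:.3%}")             # 0.077%
print(f"imbalance 1:{(records - floods) / floods:,.0f}")  # 1:1,298
print(f"FNR implied by recall: {1 - 0.9507:.2%}")         # 4.93%

# Toy confusion matrix (hypothetical counts, not results from the paper).
m = binary_metrics(tp=95, fp=2, fn=5, tn=298)
print(m["recall"], m["false_negative_rate"])  # 0.95 0.05
```

For an early warning system the false negative rate matters most, because each false negative is a flood that produced no warning.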


Published

2026-01-20


Section

Articles