Application of Optical Character Recognition (OCR) in Retrieving Book Title Text for Library Archives Digitization
DOI: https://doi.org/10.36085/jsai.v8i2.8457
Abstract
Advances in information technology require libraries to move toward digital services. One important step in this process is the digitization of book titles, which has so far been done manually and is prone to error. This study develops a system that extracts book title text from cover images using Optical Character Recognition (OCR), implemented in MATLAB 2017b. The OCR method combines template matching with feature extraction: after an image preprocessing phase, characters are recognized by matching them against the system's built-in character templates. The preprocessing stages are Region of Interest (ROI) selection, grayscale conversion, contrast enhancement, noise removal, and image resizing. After preprocessing, the text is extracted with OCR and stored in digital format. The system was tested on 60 book cover images from the Regional Library of Bengkulu Province, covering a variety of font types, colors, and lighting conditions. Evaluated with a confusion matrix, the system achieved an accuracy of 81.67%, a precision of 83.05%, and a recall of 98.00%. The high recall indicates that the system recognizes nearly all of the book title text it is expected to detect. The system can therefore serve as an initial solution for automatic, fast, and efficient digitization of library archives.
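The relationship between the three reported metrics can be illustrated with a short Python sketch. The abstract does not list the confusion matrix counts, so the values below (49 true positives, 10 false positives, 1 false negative, 0 true negatives) are an assumption: they are a set of counts over 60 images that reproduces the reported percentages, not figures taken from the paper. The function name `confusion_metrics` is likewise hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) as percentages
    computed from the four confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = 100.0 * (tp + tn) / total   # share of all images classified correctly
    precision = 100.0 * tp / (tp + fp)     # share of positive predictions that are correct
    recall = 100.0 * tp / (tp + fn)        # share of actual positives that were found
    return accuracy, precision, recall


# Assumed counts (not stated in the abstract); they sum to the 60 test images.
acc, prec, rec = confusion_metrics(tp=49, fp=10, fn=1, tn=0)
print(f"accuracy={acc:.2f}% precision={prec:.2f}% recall={rec:.2f}%")
# prints: accuracy=81.67% precision=83.05% recall=98.00%
```

Under these assumed counts, the gap between recall and precision would come from false positives rather than missed titles, which is one way to read the reported figures.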
License
Copyright (c) 2025 Julian Mulyadi, Nuri David Maria Veronika

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.