Content-based Image Retrieval using Tesseract OCR Engine and Levenshtein Algorithm

Adjetey, C.; Adu-Manu, K.S.

Files

Contentbased-Image-Retrieval-using-Tesseract-OCR-Engine-and-Levenshtein-AlgorithmInternational-Journal-of-Advanced-Computer-Science-and-Applications.pdf (3.25 MB)

Date

2021

Authors

Adjetey, C.

Adu-Manu, K.S.

Publisher

IJACSA

Abstract

Image Retrieval Systems (IRSs) are applications that allow one to retrieve images saved at any location on a network. Most IRSs make use of reverse lookup to find images stored on the network based on image properties such as size, filename, title, color, texture, shape, and description. This paper presents a technique for retrieving a full image document when the user has only some portions of the document being searched for. To demonstrate the reliability of the proposed technique, we designed a system to implement the algorithm. A combination of an Optical Character Recognition (OCR) engine and an improved text-matching algorithm was used in the system implementation. The Tesseract OCR engine and the Levenshtein Algorithm were integrated to perform the image search. The text extracted from the query image is compared to the text stored in the database. For example, a query result is returned when a match ratio of 0.15 or above is obtained. The results showed 100% successful retrieval of the appropriate file based on the match, even when partial query images were submitted.
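
The matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the OCR step has already turned each image into a plain string, and it assumes the ratio is computed as 1 minus the Levenshtein distance divided by the longer string's length (the abstract does not define the ratio). The names `levenshtein`, `similarity_ratio`, and `search` are hypothetical; only the 0.15 threshold comes from the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical text, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def search(query_text: str, stored_texts: dict, threshold: float = 0.15):
    """Return (name, ratio) pairs at or above the threshold, best first."""
    scored = [(name, similarity_ratio(query_text, text))
              for name, text in stored_texts.items()]
    hits = [(name, ratio) for name, ratio in scored if ratio >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for text that an OCR engine would extract.
stored = {
    "doc1": "the quick brown fox jumps over the lazy dog",
    "doc2": "zzzz",
}
hits = search("the quick brown fox", stored)
```

Under this assumed normalisation, a query covering only a fraction of a document yields a low ratio even when every character matches, which is one plausible reason a threshold as low as 0.15 could still select the right file for partial query images.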

Description

Research Article

Keywords

Image Retrieval Systems, image processing, Optical Character Recognition (OCR), text matching algorithm, Tesseract OCR engine, Levenshtein Algorithm

URI

http://ugspace.ug.edu.gh/handle/123456789/37516

Collections

Department of Computer Science
