Title: Historical Ethiopic handwritten document recognition using deep learning

Authors: Fitehalew Ashagrie Demilew; Yaregal Tadesse Tessema; Gezahegn Mulusew Delele; Habtamu Asmare Sendeku

Addresses: Debre Tabor University, Debre Tabor, Debre Tabor, South Gondar, Ethiopia; Addis Ababa Science and Technology University, Addis Ababa, Ethiopia; Debre Tabor University, Debre Tabor, Debre Tabor, South Gondar, Ethiopia; Debre Tabor University, Debre Tabor, Debre Tabor, South Gondar, Ethiopia

Abstract: Document analysis proceeds through a sequence of steps: image acquisition, preprocessing, segmentation, feature extraction, and classification. Recognising historical documents is much harder than recognising ordinary handwritten documents, since historical documents are highly degraded. In Ethiopia, a large number of historical documents written in Amharic can be found in monasteries, libraries, and museums. Documents as old as 1,000 years, written in Amharic and Ge'ez, can be found in the country. This paper develops a recognition system for historical handwritten Amharic documents, focusing mainly on the preprocessing phases. A dataset is prepared that comprises the 230 characters of the Amharic alphabet, with 190 to 320 images per class. A total of 44,000 isolated character images are collected and split into training, testing, and validation sets in the ratio 7:2:1, respectively.
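
The 7:2:1 split described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names `split_sizes` and `stratified_split`, the fixed random seed, and the choice to split each class separately (so that every character class appears in all three subsets) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_sizes(total, ratios=(7, 2, 1)):
    """Return integer subset sizes for `total` items in the given ratio."""
    denom = sum(ratios)
    sizes = [total * r // denom for r in ratios]
    # Any remainder lost to integer division goes to the first (training) subset.
    sizes[0] += total - sum(sizes)
    return sizes


def stratified_split(labels, ratios=(7, 2, 1), seed=0):
    """Split sample indices class by class into train, test, and validation lists.

    Hypothetical helper: the abstract states only the overall ratio, not
    whether the split was stratified.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test, val = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train, n_test, _ = split_sizes(len(indices), ratios)
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        val += indices[n_train + n_test:]
    return train, test, val


# Overall subset sizes for the 44,000 images reported in the abstract.
train_n, test_n, val_n = split_sizes(44_000)
print(train_n, test_n, val_n)  # 30800 8800 4400
```

Under these assumptions, 44,000 images divide into 30,800 training, 8,800 testing, and 4,400 validation images.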

Keywords: historical document recognition; Ethiopic document recognition; degraded document recognition; segmentation; pre-processing; deep learning.

DOI: 10.1504/IJITCC.2023.132844

International Journal of Information Technology, Communications and Convergence, 2023 Vol.4 No.2, pp.124 - 140

Accepted: 06 Feb 2023
Published online: 11 Aug 2023
