
Optical Character Recognition (OCR) for Old Torah Manuscripts

Project ID: 5347-1-20
Year: 2020
Student/s: Tal Stolovich, Ohad Kimelfeld
Supervisor/s: Ori Bryt
Award: Wilk award

In this work, two Optical Character Recognition (OCR) architectures were demonstrated for the Solitreo font of the Hebrew language. The first architecture is based on text detection and classification. The second is based on cropping a document into separate text rows and translating each whole row using an LSTM network. In addition, algorithms for processing handwritten text documents were demonstrated, including binarization, connected components analysis, and text row detection, among others. Some of these algorithms reduce input dimensionality and thereby help achieve improved results; others are crucial for intermediate handwritten text processing tasks. The full report also presents the reasons for using these algorithms, their pros and cons, their performance on the Solitreo font of the Hebrew language, and performance comparisons.
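As a rough illustration of two of the preprocessing steps named above, the following sketch binarizes a grayscale page and then splits it into text rows. It is not the project's implementation: the choice of Otsu's method for binarization, the horizontal projection profile for row detection, and the function names `otsu_threshold`, `binarize`, and `detect_text_rows` are all assumptions made for this example. Connected components analysis and the LSTM stage are left out.

```python
def otsu_threshold(image):
    """Return an Otsu threshold for a grayscale image (rows of ints in 0..255)."""
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the dark class (<= t)
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map dark pixels (<= threshold) to 1 (ink) and the rest to 0 (background)."""
    return [[1 if px <= threshold else 0 for px in row] for row in image]


def detect_text_rows(binary, min_ink=1):
    """Return (start, end) index pairs, end exclusive, of consecutive image rows
    whose ink count is at least min_ink (horizontal projection profile)."""
    rows, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            rows.append((start, y))
            start = None
    if start is not None:  # a text row that runs to the bottom edge
        rows.append((start, len(binary)))
    return rows


# Hypothetical 8x6 "page": light background (220) with two bands of dark ink (30).
BG, INK = 220, 30
blank = [BG] * 6
inked = [BG, INK, INK, INK, BG, BG]
page = [blank, inked, inked, blank, blank, inked, inked, blank]

t = otsu_threshold(page)
print(t)                                    # 30
print(detect_text_rows(binarize(page, t)))  # [(1, 3), (5, 7)]
```

Each detected range could then be cropped out and passed to a row-level recognizer. Real manuscripts would likely need a higher `min_ink` and some smoothing of the projection profile to cope with noise, skew, and touching rows.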

Poster for Optical Character Recognition (OCR) for Old Torah Manuscripts