Date of Award
2024
Document Type
Thesis
Degree Name
Bachelors
Department
Natural Sciences
First Advisor
Gillman, David
Area of Concentration
Computer Science
Abstract
This thesis aims to create a program capable of transliterating Greek manuscripts. An existing Transformer-based Optical Character Recognition (TrOCR) model was identified that can be trained on user-provided datasets, but it requires images of individual lines of text as training data. This thesis therefore addresses the task of converting images that contain one or two manuscript pages into a series of images that each contain a single line of text. The images are first preprocessed: they are binarized, and OpenCV’s morphological filters are applied. Linear regression is then used to determine how many columns of text the image contains, and the image is cropped into smaller images that each contain one column. Within each column image, the location of each line is found using autoregression and mean pixel values. OpenCV’s contour methods are then used to assign text to each line. This information is used to produce a series of images, each containing one line of text.
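The line-location step can be illustrated with a minimal sketch. It is not the thesis's implementation: it swaps the autoregression described above for a plain threshold on each row's mean pixel value (a horizontal projection profile), and it assumes the column image is already binarized into rows of 1 (ink) and 0 (background). The names `row_means`, `find_line_bands`, `crop_lines`, and the `min_ink` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: rows of 1 (ink) and 0 (background)


def row_means(image: Image) -> List[float]:
    """Mean pixel value of each row: the fraction of ink pixels in that row."""
    return [sum(row) / len(row) for row in image]


def find_line_bands(image: Image, min_ink: float = 0.05) -> List[Tuple[int, int]]:
    """Group consecutive rows whose mean exceeds min_ink into (top, bottom) bands.

    Each band is a half-open row range [top, bottom) assumed to hold one line
    of text. min_ink is an illustrative threshold, not a value from the thesis.
    """
    bands = []
    start = None
    for y, mean in enumerate(row_means(image)):
        if mean > min_ink and start is None:
            start = y                      # first inked row of a new band
        elif mean <= min_ink and start is not None:
            bands.append((start, y))       # blank row closes the current band
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(image)))
    return bands


def crop_lines(image: Image, bands: List[Tuple[int, int]]) -> List[Image]:
    """Cut the column image into one sub-image per detected band."""
    return [image[top:bottom] for top, bottom in bands]


# Toy column image: two inked rows, a blank row, then one inked row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
bands = find_line_bands(page)
lines = crop_lines(page, bands)
```

A fixed threshold of this kind would likely struggle with skewed or touching lines, which is presumably one reason the thesis combines the row means with autoregression and a later contour-based assignment of text to lines.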
Recommended Citation
Jones, Chloe, "OCR Of Greek" (2024). Theses & ETDs. 6556.
https://digitalcommons.ncf.edu/theses_etds/6556