Project Details
Optimized use of OCR methods – Tesseract as a component of the OCR-D workflow
Applicant
Dr. Sabine Gehrlein
Term
from 2018 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 394264782
Tesseract is free software for text recognition (optical character recognition, OCR). It has a history of more than 30 years of continuous development and improvement. Among the small group of open-source OCR products, Tesseract is one of the programs with the best recognition rates. Since the end of 2016, Tesseract has supported state-of-the-art text recognition based on neural networks (LSTM). The context of OCR-D requires well-defined interfaces for OCR software. The project will actively contribute to the definition of such interfaces and will implement them for Tesseract so that Tesseract can be included in an OCR workflow. We also strive to improve the stability, performance and practical usability of Tesseract.
DFG Programme
Research data and software (Scientific Library Services and Information Systems)
Co-Investigator
Stefan Weil