Optimized use of OCR methods – Tesseract as a component of the OCR-D workflow

Applicant Dr. Sabine Gehrlein

Term from 2018 to 2020

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 394264782

Project Description

Tesseract is free software for text recognition (optical character recognition, OCR). It has a history of more than 30 years of continuous development and improvement. Within the small group of open source OCR products, Tesseract is among the programs with the best recognition rates. Since the end of 2016, Tesseract has supported state-of-the-art text recognition based on neural networks (LSTM). The context of OCR-D requires well-defined interfaces for OCR software. The project will actively contribute to the definition of such interfaces and will implement them for Tesseract so that Tesseract can be included in an OCR workflow. It also strives to improve the stability, performance and practical usability of Tesseract.

DFG Programme Research data and software (Scientific Library Services and Information Systems)

Co-Investigator Stefan Weil
