Project Details

Development of a semi-automatic open source tool for layout analysis and region extraction and region classification (LAREX) of early prints.

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2018 to 2020
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 394329162
 
The goal of the proposal is the further development of our efficient, semi-automatic and easy-to-use open-source segmentation tool LAREX and its integration into the open-source workflow of the OCR-D functional model. The preliminary work, LAREX (Layout Analysis and Region EXtraction), supports both a coarse segmentation that separates text from non-text and a fine segmentation that detects and classifies different textual entities. LAREX uses an efficient implementation of the connected component approach. It has been used in the digitization of several early prints and produces good-quality page segmentation in significantly less time than conventional alternatives. The main goal of the further development of LAREX is to reduce the amount of manual work. This requires a more robust segmentation and a further development of the rule and constraint language. The basic configurations should be easy to adapt to the peculiarities of a particular early print, both by users and by learning algorithms. Furthermore, the convenient LAREX GUI for correcting individual segmentation errors is to be improved. This component is also needed to define ground truth for learning algorithms and for evaluation. The overall goal is to find an optimal combination of manual and automatic methods. The tool and the process model will be evaluated extensively with various cooperation partners, in particular in the context of the digitization of early prints within the OCR-D functional model, including the subsequent OCR performed by linking external tools.
DFG Programme Research data and software (Scientific Library Services and Information Systems)
 
 
