Project Details

Digitisation / Cataloguing of non-textual objects: A standardised and optimised process for data acquisition from digital images of herbarium specimens

Subject Area Evolution and Systematics of Plants and Fungi
Software Engineering and Programming Languages
Term from 2014 to 2017
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 248339659
 
The project will develop and document a software-driven standard process for extracting metadata from images of herbarium specimens (i.e. dried, pressed plants or plant parts mounted on cardboard and stored in natural history collections). It addresses a large share of scientific collections: approximately 22 million herbarium specimens exist as botanical reference objects in Germany, and about 500 million worldwide. Metadata such as plant name, collection site and date, collector, and accession numbers are likewise glued flat on the sheet, typically as labels, and are thus visible in the specimen image. Until now, most of these data have been entered manually into collection databases, but imaging techniques are increasingly employed (also to ensure that the online metadata can be verified). The standard process is intended to replace or supplement manual data entry as far as possible. Image processing software detects objects on the digitised specimen image and classifies them. Text objects are then transformed into structured information using text mining algorithms. For handwriting, identification of the author is attempted. The project will evaluate existing software, enhance it to conform to standard interfaces, and integrate it into an overall open software architecture based on established IT standards. Finally, the requirements for the process will be formulated as a standard and the actual application will be documented.
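As a rough illustration of the step that turns text objects into structured information, the following minimal Python sketch maps the recognised text of one label to a few named fields. It is not the project's software: simple regular expressions stand in for the text mining algorithms mentioned above, and the names LabelRecord, PATTERNS and parse_label, the label layout, and the sample text are all hypothetical assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelRecord:
    """Structured fields recovered from one label's text (all optional)."""
    taxon: Optional[str] = None
    collector: Optional[str] = None
    date: Optional[str] = None
    accession: Optional[str] = None


# Hypothetical patterns for illustration only; real labels vary far more.
PATTERNS = {
    # First line that starts with a "Genus species" pair.
    "taxon": re.compile(r"^(?P<v>[A-Z][a-z]+ [a-z-]+)", re.MULTILINE),
    # Text following "leg." or "coll.", up to the next comma, semicolon or newline.
    "collector": re.compile(r"(?:leg\.|coll\.)\s*(?P<v>[^\n,;]+)", re.IGNORECASE),
    # Dates written as day.month.year, e.g. 12.05.1987.
    "date": re.compile(r"\b(?P<v>\d{1,2}\.\d{1,2}\.\d{4})\b"),
    # Identifier following "Acc. No." or "Accession".
    "accession": re.compile(
        r"(?:acc\.?\s*no\.?|accession)\s*[:#]?\s*(?P<v>[A-Z0-9-]+)", re.IGNORECASE
    ),
}


def parse_label(text: str) -> LabelRecord:
    """Fill a LabelRecord with the first match of each pattern, if any."""
    record = LabelRecord()
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(record, field, match.group("v").strip())
    return record


# Invented sample text, standing in for the recognised text of one label.
SAMPLE = (
    "Bellis perennis L.\n"
    "Berlin, Dahlem, meadow\n"
    "leg. A. Example, 12.05.1987\n"
    "Acc. No. HB-004711\n"
)

if __name__ == "__main__":
    print(parse_label(SAMPLE))
```

Fields that cannot be found stay empty rather than being guessed, so a record produced this way could still be completed or corrected by manual data entry.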
DFG Programme Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Participating Person Anton Güntsch
 
 

