Project Details
TFB 32: Automatische Exzerption: Corpusbasierte Materialbeschaffung für die Lexikographie
Subject Area
Humanities
Term
from 2001 to 2003
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 5485575
Dictionaries are revised at relatively short intervals to keep them up to date. In the updating process, dictionary authors read newspapers, books, etc. and note all linguistic phenomena relevant to their dictionary. The project will support this task with NLP (Natural Language Processing) software:
-- the tools will analyse large amounts of text (usually over 100 million words of news texts) to find new words and word combinations;
-- linguistic phenomena found in the text corpora will be classified according to linguistic and lexicographic criteria;
-- material found in the texts will be compared with the contents of the dictionary under revision, and the result will be displayed in an interactive tool.
The project is based on lexical acquisition technology from the collaborative research centre 340 ("Theoretische Grundlagen für die Computerlinguistik"). In 2002, in particular, new procedures for the identification and classification of multiword expressions were developed and evaluated by the publishing partners, Duden BIFAB AG (Mannheim) and Langenscheidt KG (München). In addition, the system architecture and the Graphical User Interface (GUI) for lexicography have been specified. A first GUI prototype covering part of the required functions is available.
DFG Programme
CRC/Transfer Units
Completed projects
Applicant Institution
Universität Stuttgart
Spokesperson
Professor Dr. Christian Rohrer (†)