Classification and Intelligent Search on Information in XML

Applicant Professor Dr. Norbert Fuhr
Subject Area Security and Dependability, Operating-, Communication- and Distributed Systems
Term from 2001 to 2008
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 5337513


XML will be the method of choice for representing all kinds of documents in product catalogs, digital libraries, scientific data repositories, and across the Web. However, merely casting all documents into XML format does not necessarily make a document's semantics explicit or more amenable to effective information searching. Rather, to fully leverage XML on a global scale, significant progress is needed on the following issues:
- providing an easy-to-use yet powerful and efficient search language that combines concepts from current XML pattern-matching languages (e.g., XPath and XML-QL) with ontology-backed, information-retrieval-style ranking of search results,
- extracting more semantics from existing document collections by constructing structural and ontological skeletons (e.g., in the form of DTDs) that describe the data at a higher semantic level and can also facilitate new forms of indexing for efficiency, and
- classifying existing documents according to a given thematic or personalized hierarchical ontology to make searching more effective (e.g., by exploiting relevance feedback) and more efficient (e.g., by limiting the search focus).
The proposed project, named CLASSIX (Classification and Intelligent Search on Information in XML), will address all three mutually interrelated issues. It builds on existing techniques for pattern extraction, classification, pattern matching, and text search and ranking, and aims to integrate them into a significant step towards intelligent handling of XML documents. Collections of XML data from different areas will serve as an experimental testbed to evaluate and demonstrate the project's contributions.
DFG Programme Research Grants