Project Details
Efficient Semantic Search on Big Data
Applicant
Professorin Dr. Hannah Bast
Subject Area
Theoretical Computer Science
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Security and Dependability, Operating-, Communication- and Distributed Systems
Term
from 2014 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 254890286
This project is about efficient semantic search on big data, notably very large text collections and very large knowledge bases. In the first round of this SPP we made the following contributions: a new search engine for interactive combined search on text and knowledge bases; a new scalable algorithm for decomposing text into its semantically coherent units; a new framework and algorithm for computing relevance scores for knowledge base triples; a self-learning question answering system for automatically translating natural-language questions into knowledge base queries; a comprehensive survey of the vast field of semantic search on text and knowledge bases.
In the next round of this SPP, we plan to develop improved solutions for some of these problems, as well as solutions for new problems that arose during our work in the first round: a full-featured SPARQL+Text engine (existing SPARQL engines have only moderately powerful text-search extensions, and our engine from the first round supports only tree-shaped queries and relies on their incremental construction); an extension of our question answering system to query patterns that are more complex and may involve a text-search component; a system for automatic completion of natural-language questions; improved large-scale named entity recognition and disambiguation for semantic search.
For all the named problems, our goals are (as in the first round): provably efficient algorithms and data structures; an extensive experimental evaluation of their efficiency and quality; open-source software and a publicly accessible demonstrator or prototype; full reproducibility of our results by providing all relevant materials (if possible) or a dedicated web application.
DFG Programme
Priority Programmes
Subproject of
SPP 1736: Algorithms for Big Data