Project Details
Adaptive and Scalable Event Detection Techniques for Twitter Data Streams
Applicant
Professor Dr. Michael Grossniklaus
Subject Area
Security and Dependability, Operating-, Communication- and Distributed Systems
Software Engineering and Programming Languages
Term
from 2015 to 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 275968728
With 271 million monthly active users who produce over 500 million tweets per day, Twitter is currently the most popular and fastest-growing microblogging service. It is therefore increasingly used as a source of information on current events as they unfold.

For traditional media such as newspaper archives and news websites, the problem of event detection has been addressed by research in the area of Topic Detection and Tracking (TDT). However, topic detection in Twitter data streams raises a set of additional challenges. First, because of their length limitation, tweets are much shorter than traditional news articles and therefore harder to classify. Second, tweets are not edited and can therefore contain a substantial amount of spam, typos, slang, etc. Finally, the rate at which tweets are produced is highly bursty and will continue to increase as more users adopt Twitter.

Several approaches for event detection in social media, and in particular for Twitter, have been proposed. However, most of these proposals focus exclusively on the information extraction aspect and often ignore the streaming nature of the input. For example, many techniques come with a complex but fixed set of parameters that control which events are detected. It is assumed that these parameters are determined empirically by running the algorithm on a sample data set until it produces the desired result. We argue that this approach is neither realistic nor feasible, for several reasons. First, the data in the stream may undergo qualitative changes that require the parameters to adapt so that events continue to be detected accurately. Second, these parameters control not only the task-based performance of a technique but also its run-time performance. Working with fixed parameters therefore prevents these approaches from scaling with quantitative changes in the stream.

In this project, we propose to address the need for adaptive and scalable event detection in Twitter in the tradition of Data Stream Management Systems (DSMS) research. To keep the project focused, we will concentrate on the specific task of first story detection, i.e., the detection of general (unknown) events, which is defined as one of the subtasks of TDT. We plan to address these issues in three separate work packages. In the first work package, we will study how event detection methods can adapt to the content of the stream, both by exploring better ways to segment the stream before it is processed and by adjusting method parameters during processing. The second work package will address scalability requirements, not only in terms of scaling up and down with the volume of a single stream but also in terms of scaling up to several parallel streams. Finally, the third work package will be dedicated to the non-trivial task of evaluating event detection techniques.
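First story detection is commonly framed as novelty detection: a tweet is reported as the first story of a new event if even its most similar previously seen tweet is sufficiently dissimilar. The sketch below is a minimal, hypothetical illustration of that idea and of the adaptivity argument above; it is not the method developed in this project. It assumes whitespace tokenisation, raw term-frequency vectors, cosine similarity, a bounded window of recent tweets, and a novelty threshold that tracks the recent distribution of nearest-neighbour similarities (mean minus k standard deviations). The names `FirstStoryDetector`, `window`, `history`, `k`, `warmup` and `initial_threshold` are illustrative only.

```python
import math
from collections import Counter, deque


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class FirstStoryDetector:
    """Hypothetical novelty-based first story detector (illustrative only).

    A tweet is flagged as a first story if its most similar tweet in a
    bounded window of recent tweets is less similar than an adaptive
    threshold derived from recent nearest-neighbour similarities.
    """

    def __init__(self, window=1000, history=200, k=1.5, warmup=10,
                 initial_threshold=0.2):
        # `window` bounds memory and per-tweet cost (a run-time parameter).
        self.recent = deque(maxlen=window)
        # `history` bounds how many past similarity scores drive adaptation.
        self.scores = deque(maxlen=history)
        self.k = k
        self.warmup = warmup
        self.initial_threshold = initial_threshold

    def threshold(self) -> float:
        # Fall back to a fixed threshold until enough scores are observed.
        if len(self.scores) < self.warmup:
            return self.initial_threshold
        mean = sum(self.scores) / len(self.scores)
        var = sum((s - mean) ** 2 for s in self.scores) / len(self.scores)
        # "Unusually dissimilar" = more than k standard deviations below the mean.
        return max(0.0, mean - self.k * math.sqrt(var))

    def process(self, text: str) -> bool:
        # Naive whitespace tokenisation into a term-frequency vector.
        vec = Counter(text.lower().split())
        # Linear scan over the window; at Twitter scale an approximate
        # nearest-neighbour index would be needed instead.
        nearest = max((cosine(vec, old) for old in self.recent), default=0.0)
        is_first_story = nearest < self.threshold()
        self.scores.append(nearest)
        self.recent.append(vec)
        return is_first_story
```

For example, with the default settings, processing "earthquake hits the city" twice returns `True` and then `False`, and an unrelated tweet such as "new concert announced downtown" afterwards returns `True` again. Deriving the threshold from the stream itself, rather than fixing a constant tuned offline, is one simple way to react to qualitative drift. The window size, in turn, trades detection quality against per-tweet cost, which mirrors the coupling between task-based and run-time performance described above.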
DFG Programme
Research Grants
Co-Investigator
Professor Dr.-Ing. Marc H. Scholl