Adaptive and Scalable Event Detection Techniques for Twitter Data Streams
Software Engineering and Programming Languages
Final Report Abstract
This project addressed the development of adaptive event detection techniques for Twitter. In particular, we focussed on the task of first story detection, i.e., the detection of general unknown events. Although Twitter, with its 330 million monthly active users who produce over 500 million tweets per day, is an influential source of information, topic detection and tracking on Twitter involves several new challenges. In comparison to traditional news media articles, Twitter “documents” are much shorter and contain a substantial amount of spam, advertising, typos, slang, etc. The project planned to follow an empirical approach and study how different configuration settings of several existing event detection techniques impact the produced results. Based on this understanding of the interplay between parameter settings and result quality, we proposed to design methods for automatic parameter adjustment, enabling a technique to adapt to quantitative and qualitative changes in the input Twitter data stream.
At this point, we encountered unforeseen challenges that prevented us from conducting the work program as planned. Our experiments showed that existing event detection techniques are highly unstable with respect to small variations in parameter values, which prevented us from understanding the aforementioned interplay. As a consequence, we deviated from the original work program and began to address these challenges in two ways. First, we started to investigate how event detection techniques could be made more stable. Our idea was to process the same Twitter data stream with different parameter settings in parallel and then report only events that had been detected by multiple parallel workers. This approach ultimately proved unsuccessful, as the sets of events reported by different detectors were often completely disjoint. Nevertheless, we were able to abstract this idea of using parallelization to improve result quality into a general building block for data stream processing and to demonstrate its benefits in other application domains.
Second, we began an effort to render research on event detection techniques for Twitter more reproducible. Apart from defining general measures that can be used to qualitatively and quantitatively compare different techniques, we proposed a benchmark to evaluate the performance and stability of techniques. This benchmark consists of a data generator that produces a data stream with the same statistical properties as the Twitter data stream and a ground truth of events that should be reported by any event detection technique applied to it. We hope that these proposals will be adopted by the research community working on event detection techniques for Twitter and will help to evaluate the contribution made by future approaches more systematically.
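The idea of reporting only events found by multiple parallel workers can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name `consensus_events`, the `min_votes` parameter, and the assumption that events from different detectors can be matched by a shared string label are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def consensus_events(worker_results: Iterable[Set[str]], min_votes: int = 2) -> List[str]:
    """Return the events reported by at least `min_votes` workers.

    Each element of `worker_results` is the set of event labels that one
    detector configuration reported for the same stream window.
    Assumption: identical events carry identical labels across workers.
    """
    votes: Counter = Counter()
    for events in worker_results:
        # Each worker contributes at most one vote per event.
        votes.update(set(events))
    # Sort for a deterministic result order.
    return sorted(event for event, count in votes.items() if count >= min_votes)


if __name__ == "__main__":
    overlapping = [
        {"earthquake", "election"},
        {"earthquake", "concert"},
        {"earthquake", "election", "storm"},
    ]
    print(consensus_events(overlapping, min_votes=2))

    # When the workers' result sets are disjoint, no event reaches the threshold.
    disjoint = [{"a"}, {"b"}, {"c"}]
    print(consensus_events(disjoint, min_votes=2))
```

The second call shows why such a voting scheme breaks down in the situation described above: if the event sets of the individual detectors do not overlap, the consensus is empty regardless of how many workers run in parallel.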
Publications
- An Evaluation of the Run-time and Task-based Performance of Event Detection Techniques for Twitter. Information Systems, 62:207–219, 2016
Andreas Weiler, Michael Grossniklaus, and Marc H. Scholl
(See online at https://doi.org/10.1016/j.is.2016.01.003)
- Situation Monitoring of Urban Areas Using Social Media Data Streams. Information Systems, 57:129–141, 2016
Andreas Weiler, Michael Grossniklaus, and Marc H. Scholl
(See online at https://doi.org/10.1016/j.is.2015.09.004)
- Stability Evaluation of Event Detection Techniques for Twitter. In Proc. Intl. Symp. on Intelligent Data Analysis (IDA), pages 368–380, 2016
Andreas Weiler, Joran Beel, Bela Gipp, and Michael Grossniklaus
(See online at https://doi.org/10.1007/978-3-319-46349-0_32)
- Survey and Experimental Analysis of Event Detection Techniques for Twitter. Oxford Computer Journal, 60(3):329–346, 2017
Andreas Weiler, Michael Grossniklaus, and Marc H. Scholl
(See online at https://doi.org/10.1093/comjnl/bxw056)
- Towards Reproducible Research of Event Detection Techniques for Twitter. In Proc. Swiss Conf. on Data Science (SDS), pages 69–74, 2019
Andreas Weiler, Harry Schilling, Lukas Kircher, and Michael Grossniklaus
(See online at https://doi.org/10.1109/SDS.2019.000-5)