Adaptive and Scalable Event Detection Techniques for Twitter Data Streams

Applicant: Professor Dr. Michael Grossniklaus

Subject Area: Security and Dependability, Operating-, Communication- and Distributed Systems; Software Engineering and Programming Languages

Term: from 2015 to 2018

Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 275968728

Final Report Year: 2019

Final Report Abstract

This project addressed the development of adaptive event detection techniques for Twitter. In particular, we focussed on the task of first story detection, i.e., the detection of general, unknown events. Even though Twitter, with its 330 million monthly active users who produce over 500 million tweets per day, is an influential source of information, topic detection and tracking on this data involves several new challenges. In comparison to traditional news media articles, Twitter "documents" are much shorter and contain a substantial amount of spam, advertising, typos, slang, etc.

Our project planned to follow an empirical approach to study several existing event detection techniques in terms of how different configuration settings impact the produced results. Based on this understanding of the interplay between parameter settings and result quality, we proposed to design methods for automatic parameter adjustment, enabling a technique to adapt to quantitative and qualitative changes in the input Twitter data stream. At this point, we encountered unforeseen challenges that prevented us from conducting the work program as planned. Our experiments showed that existing event detection techniques are highly unstable with respect to small variations in parameter values, which prevented us from understanding the aforementioned interplay.

As a consequence, we deviated from the original work program and began to address these challenges in two ways. First, we started to investigate how event detection techniques could be made more stable. Our idea was to process the same Twitter data stream with different parameter settings in parallel and then report only those events that had been detected by multiple parallel workers. This approach ultimately proved unsuccessful, as the sets of events reported by different detectors were often completely disjoint. Nevertheless, we were able to abstract this idea of using parallelization to improve result quality into a general building block for data stream processing and to demonstrate its benefits in other application domains.

Second, we began an effort to render research on event detection techniques for Twitter more reproducible. Apart from defining general measures that can be used to qualitatively and quantitatively compare different techniques, we proposed a benchmark to evaluate the performance and stability of techniques. This benchmark consists of a data generator that produces a data stream with the same statistical properties as the Twitter data stream and a ground truth of events that should be reported by any event detection technique applied to it. We hope that these proposals will be adopted by the research community working on event detection techniques for Twitter and will help to evaluate the contribution made by future approaches more systematically.
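
The idea of running the same stream through several parameter settings in parallel and keeping only events confirmed by multiple workers can be illustrated with a minimal sketch. It is not the project's implementation: the toy detector `detect_bursty_terms`, its `min_count` parameter, the voting threshold `min_votes`, and the sample tweets are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def detect_bursty_terms(windows, min_count):
    """Toy stand-in detector (hypothetical): report (window index, term)
    pairs whose frequency within a window reaches min_count."""
    events = set()
    for index, window in enumerate(windows):
        counts = Counter(term for tweet in window for term in tweet.lower().split())
        events.update((index, term) for term, n in counts.items() if n >= min_count)
    return events


def consensus_events(windows, min_counts, min_votes):
    """Run one detector per parameter setting in parallel and keep only
    the events reported by at least min_votes of them."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda m: detect_bursty_terms(windows, m), min_counts))
    votes = Counter(event for result in results for event in result)
    return {event for event, n in votes.items() if n >= min_votes}


# Two time windows of made-up tweets.
windows = [
    ["goal for bayern", "what a goal", "lunch time"],
    ["earthquake now", "earthquake felt here", "big earthquake"],
]
events = consensus_events(windows, min_counts=[2, 3, 4], min_votes=2)
print(sorted(events))  # [(1, 'earthquake')]
```

In this toy setting the detectors differ only in a threshold, so their outputs overlap and voting works. If the result sets of the workers are disjoint, as the report observed for real techniques, every event receives a single vote and the consensus set is empty for any `min_votes` greater than one.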

Publications

An Evaluation of the Run-time and Task-based Performance of Event Detection Techniques for Twitter. Information Systems, 62:207–219, 2016
Andreas Weiler, Michael Grossniklaus, and Marc H. Scholl
(See online at https://doi.org/10.1016/j.is.2016.01.003)
Situation Monitoring of Urban Areas Using Social Media Data Streams. Information Systems, 57:129–141, 2016
Andreas Weiler, Michael Grossniklaus, and Marc H. Scholl
(See online at https://doi.org/10.1016/j.is.2015.09.004)
Stability Evaluation of Event Detection Techniques for Twitter. In Proc. Intl. Symp. on Intelligent Data Analysis (IDA), pages 368–380, 2016
Andreas Weiler, Joeran Beel, Bela Gipp, and Michael Grossniklaus
(See online at https://doi.org/10.1007/978-3-319-46349-0_32)
Survey and Experimental Analysis of Event Detection Techniques for Twitter. Oxford Computer Journal, 60(3):329–346, 2017
Andreas Weiler, Michael Grossniklaus, and Marc H. Scholl
(See online at https://doi.org/10.1093/comjnl/bxw056)
Towards Reproducible Research of Event Detection Techniques for Twitter. In Proc. Swiss Conf. on Data Science (SDS), pages 69–74, 2019
Andreas Weiler, Harry Schilling, Lukas Kircher, and Michael Grossniklaus
(See online at https://doi.org/10.1109/SDS.2019.000-5)
