Probabilistic Query Processing in Uncertain Spatio-temporal Data
Final Report Abstract
The primary goal of the project was to develop and evaluate solutions for efficient management, query processing and analysis of spatio-temporal data in consideration of the inherent uncertainty of such data. In this project, we mainly addressed the two key problems, modelling and representing of uncertain spatio-temporal data (Subproject 1) and efficient and effective approaches for querying and analyzing uncertain spatio-temporal data (Subproject 2). To address the first problem, we developed a framework that enables us to further study and to obtain a deeper intuition of the quality of spatio-temporal data models. Based on this framework, we could show the applicability and quality of discrete Markov chain models for the modelling of uncertain spatio-temporal data based on artificial and real world trajectory datasets. With the assumption of mutually independent objects (object trajectories), our experimental results show that the Markov chain model is a promising and sufficiently accurate approach providing efficient ways of inference required for expensive probabilistic query processes. In contrast to traditional models for uncertain spatiotemporal data using spatial extrapolation techniques based on spatio-temporal constraints, such as maximal speed, direction, etc., our Markov-chain model is able to preserve spatial and temporal dependencies and, thus, yield more accurate results without much loss of query performance. However, the drawback of our model is that it needs a set of historic data (observations) to train the model. Though, we conducted our study primarily on road-traffic trajectory data, our approach is also applicable to any spatio-temporal data providing observations of space (incl. any form of multidimensional feature space) and time and, thus, it provides a solution for a wide application field. In our studies on modeling uncertain spatio-temporal data, we additionally addressed the problem of discovering information about the uncertainty inherent in spatio-temporal data. We considered alternative routing approaches based on uncertain routing conditions as well as uncertainty in sets of trajectories that is induced by inconsistent states of trajectory databases. The later approach is based on mutual dependencies between objects, e.g. two trajectories can’t meet at the same place at the same time. Though both approaches yield sets of sets of alternative trajectory instances, we had problems in identifying an appropriate probabilistic model that assigns likelihoods for each of the trajectory instance. Though, due to lack of further knowledge, given a set of alternative trajectories, the assumption of an equal distribution is still best practice to overcome this issue. In the third part of Subproject 1, we focused on methods for representing and computing objects with uncertain spatial extension. In this part of our study, we first focused on the problem of computing Voronoi cells for uncertain data which are useful for a range of spatial and spatio-temporal query processing tasks, including nearest neighbor search and continuous nearest neighbor search. The main challenges we addressed here were 1) the identification of models for the representation of uncertain Voronoi cells that are applicable for major application fields and 2) methods for efficient computation of these representations. For the first problem, we introduced the concept of Possible Voronoi cells and Guaranteed Voronoi Cells, both represented by a quadtree-based hierarchical spatial partitioning. With novel spatial pruning techniques we were able to overcome the high computational cost for the computation of these Voronoi cells. Furthermore, we also studied approximate approaches for similarity queries for uncertain spatially extended objects (fuzzy objects) based on shape descriptors. Our results showed that multi-step query processing approaches achieve higher accuracy but less query performance than techniques that approximate distance-confidence potentials by means of Sigmoid functions. In the second part of this project, Subproject 2, we first concentrated on efficient and effective approaches for querying and analyzing uncertain spatio-temporal data. In addition to approaches for efficient probabilistic query processing based on the models and data representations studied in the first part (Subproject 1), including a novel approach for probabilistic reverse nearest neighbor query processing, we also investigated methods for probabilistic continuous monitoring queries on sensor networks with uncertain sensor readings. Specifically, we elaborated solutions for continuous aggregation and fusion of continuously changing values. The key achievement of this study is a new strategy for updating SUM query results efficiently in an incremental way. In addition, toward our investigation on indexing uncertain spatio-temporal data, we studied generalized index methods for multi-represented objects, where we showed how similarity search over multiple metrics can be efficiently supported by one single index structure. One of our main contributions to this project are our studies on pattern mining in uncertain data. Here, we concentrated on clustering problems that are of particular interest for spatio-temporal data in many applications. Due to the requirement of the consideration of inter-object dependencies, analytical approaches become hard problems. To overcome this problem, we investigated approximate solutions based on sampling (Mote-Carlo Sampling). The main idea was a strategy to identify an appropriate small set of significant representations of possible clustering results accompanied with confidence information, respectively. Our approach has shown to be general applicable to several pattern mining and spatial query processing problems and has been integrated in an open-source framework ELKI, that has been well received by many research groups. In the context of sampling-based query processing on uncertain data, we finally studied the problem of maximizing the information flow in uncertain sensor networks. The result of our study is an hybrid of analytic and sampling-based approach that lead to a patent that has been registered with the support of Siemens AG. To conclude, we have contributed many novel algorithms to the rapidly growing market for the data quality and uncertainty aware information retrieval and knowledge discovery in the context of spatial and spatio-temporal data. An important open question emerging from our project is the relevance of the accuracy of probabilistic models for the quality of the results of diverse query processing and pattern mining results.
Publications
- ”An extendable framework for managing uncertain spatio-temporal data.” In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 1087-1090. ACM, 2014
Emrich, Tobias, Maximilian Franzke, Hans-Peter Kriegel, Johannes Niedermayer, Matthias Renz, and Andreas Züfle
(See online at https://doi.org/10.1145/2588555.2594523) - ”A framework for clustering uncertain data.” Proceedings of the VLDB Endowment 8, no. 12 (2015): 1976-1979
Schubert, Erich, Alexander Koos, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, and Arthur Zimek
(See online at https://doi.org/10.14778/2824032.2824115) - ”Knowledge-enriched route computation.” In International Symposium on Spatial and Temporal Databases, pp. 157-176. Springer, Cham, 2015
Skoumas, Georgios, Klaus Arthur Schmid, Gregor Jossé, Matthias Schubert, Mario A. Nascimento, Andreas Züfle, Matthias Renz, and Dieter Pfoser
(See online at https://doi.org/10.1007/978-3-319-22363-6_9) - ”Indexing multi-metric data.” In Data Engineering (ICDE), 2016 IEEE 32nd International Conference on, pp. 1122-1133. IEEE, 2016
Franzke, Maximilian, Tobias Emrich, Andreas Züfle, and Matthias Renz
(See online at https://doi.org/10.1109/ICDE.2016.7498318) - ”Efficient Information Flow Maximization in Probabilistic Graphs.” IEEE Transactions on Knowledge and Data Engineering (2017)
Frey, Christian, Andreas Züfle, Tobias Emrich, and Matthias Renz
(See online at https://doi.org/10.1109/TKDE.2017.2780123) - ”Handling uncertainty in geo-spatial data.” In Data Engineering (ICDE), 2017 IEEE 33rd International Conference on, pp. 1467-1470. IEEE, 2017
Züfle, Andreas, Goce Trajcevski, Dieter Pfoser, Matthias Renz, Matthew T. Rice, Timothy Leslie, Paul Delamater, and Tobias Emrich
(See online at https://doi.org/10.1109/ICDE.2017.212) - ”Uncertain voronoi cell computation based on space decomposition.” In International Symposium on Spatial and Temporal Databases, pp. 98-116. Springer, Cham, 2015. (Later extended to journal version: Geoinformatica 21, no. 4 (2017): 797-827
Emrich, Tobias, Klaus Arthur Schmid, Andreas Züfle, Matthias Renz, and Reynold Cheng
(See online at https://doi.org/10.1007/978-3-319-22363-6_6 https://doi.org/10.1007/s10707-017-0293-2)