Project Details
Ground truth inference and quality control of geospatial data collection by paid crowdworkers for the efficient acquisition of training data for deep learning systems
Applicants
Professor Dr. Uwe Sörgel; Dr.-Ing. Volker Walter
Subject Area
Geodesy, Photogrammetry, Remote Sensing, Geoinformatics, Cartography
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 501973633
Considerable effort is currently going into applying deep learning systems such as convolutional neural networks (CNNs) to remote sensing images. However, due to the peculiarities of remote sensing images, standard CNNs are of limited use for their analysis. It would be desirable to train specialized CNNs from scratch, but this is not yet possible because the required large amount of annotated training data is lacking.
Crowdsourcing offers an effective method for providing such data, which has led to increasing interest in using it to collect geospatial data from remote sensing images. However, the crowd is composed of people with very different backgrounds, most of whom are not familiar with geospatial data collection standards. Therefore, we must expect results of very heterogeneous quality. The objective of this project is to enable the collection of high-quality data from remote sensing images by paid crowdworkers. The process is designed such that no time-consuming manual inspection of the results is necessary, even if no reference data are available. We propose a data-driven approach based on multiple data collection to describe and improve the geometric quality of the collected data. First, we define an integrated quality measure that quantifies, with a single numerical value, the similarity of two geometric representations of a geographic object (one collected by a crowdworker, the other the corresponding ground truth). We will derive this measure from statistical evaluations using an approach from information theory. As a next step, we will integrate multiple representations into one common geometry. We will use the quality measure both to evaluate the quality of the integrated geometries and to optimize the integration process. This can even be realized intrinsically, without comparison to given ground truth.
Then, we want to investigate whether a CNN can provide an automated quality evaluation without multiple data collection. The input of this CNN will be a remote sensing image and one individual geometry collected by a crowdworker. As output, the CNN shall predict a quality measure that describes how well the object was collected. With such a CNN, multiple acquisitions of the same object are no longer necessary. Consequently, collecting data will be much cheaper.
In order to validate the generalizability of our approach, we apply it to scenes with quite different characteristics. This requires adapting our model to different domains. For this purpose, we use an active learning approach, which iteratively performs domain adaptation in the interplay of crowdworkers and a CNN. Finally, we address possible follow-up research.
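To make the ideas of a single-valued similarity measure and of integrating multiple collections more concrete, the following minimal sketch may help. It is only an illustration under stated assumptions. It uses intersection over union (IoU) as a stand-in similarity measure, which is not the information-theoretic measure the project intends to derive. It uses a simple per-cell majority vote as a stand-in for the integration step. Geometries are assumed to be rasterised into sets of grid cells, and all function names (`rect`, `iou`, `integrate`, `mean_pairwise_iou`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rect(r0, c0, r1, c1):
    """Rasterised axis-aligned rectangle as a set of (row, col) cells.

    The bounds are half-open: rows r0..r1-1 and columns c0..c1-1.
    """
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def iou(a, b):
    """Intersection over union of two cell sets (assumed stand-in measure).

    Returns 1.0 for identical sets and 0.0 for disjoint ones.
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def integrate(masks):
    """Merge several collections by keeping cells marked by a strict majority."""
    votes = Counter(cell for mask in masks for cell in mask)
    return {cell for cell, n in votes.items() if 2 * n > len(masks)}


def mean_pairwise_iou(masks):
    """Agreement among workers, computed without any reference data."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("at least two collections are required")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example: one building outline collected by three workers.
truth = rect(2, 2, 6, 6)
workers = [rect(2, 2, 6, 6), rect(2, 2, 6, 7), rect(1, 2, 6, 6)]

merged = integrate(workers)
print("similarity of merged geometry to ground truth:", iou(merged, truth))
print("intrinsic agreement among workers:", mean_pairwise_iou(workers))
```

In this toy case the individual errors of the second and third worker do not overlap, so the majority vote removes them. The mean pairwise agreement shows how a quality indication could in principle be obtained without ground truth.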
DFG Programme
Research Grants