Project Details
Transferability of machine learning models for digital soil mapping
Applicant
Professor Dr. Thomas Scholten
Subject Area
Soil Sciences
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 448762063
Machine learning (ML) models have been very successful at learning complex spatial patterns of soil formation and soil properties, and they can predict soil data at unobserved locations. In contrast, their ability to transfer what has been learned to other areas is much less developed; so far, models can be transferred to areas outside the immediate learning environment only to a very limited extent. As with empirical regressions, the rule sets of decision-tree methods such as Random Forest apply only to the value range covered by the training data. Advances in Deep Learning (DL), e.g. Convolutional Neural Networks, Transfer Learning, and combined approaches in Feature Selection (FS) offer extended possibilities to limit dimensionality, especially in smaller data sets, to minimize overfitting to the training data and to improve transfer to adjacent areas. In this proposal we address these developments and aim to predict soil properties for areas outside the learning environment as well. To this end, we use environmental factors to create an area-specific parameterization of ML models based on geomorphometric, geological, landscape-ecological and climate parameters. Which parameters these are and how they relate to each other will be determined, by way of example, for different test data sets in Germany (humid climate) and Iran (semi-arid to arid climate) by combining DL and FS methods. In the next step, the models trained with the covariates selected in the environmental pattern analysis and with the soil profile data are transferred to areas not used for training and validated on independent soil data. These untrained areas are characterized by distance and similarity metrics that describe their comparability with the original training areas, so that the transfer performance of the ML models can be assessed.
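To make the idea of distance and similarity metrics concrete, the following is a minimal sketch and not the project's actual method. It assumes two hypothetical covariates (elevation and mean annual precipitation) with invented values, and it computes two simple indicators: the share of target-area samples that fall outside the covariate value range of the training area, and the distance between the area centroids scaled by the training standard deviation. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev


def out_of_range_share(train, target):
    """Fraction of target rows with at least one covariate outside the training min/max."""
    n_cov = len(train[0])
    lo = [min(row[j] for row in train) for j in range(n_cov)]
    hi = [max(row[j] for row in train) for j in range(n_cov)]
    outside = sum(
        any(not (lo[j] <= row[j] <= hi[j]) for j in range(n_cov))
        for row in target
    )
    return outside / len(target)


def standardized_centroid_distance(train, target):
    """Euclidean distance between area centroids, each covariate scaled by the training std."""
    n_cov = len(train[0])
    total = 0.0
    for j in range(n_cov):
        col_train = [row[j] for row in train]
        col_target = [row[j] for row in target]
        sd = pstdev(col_train) or 1.0  # avoid division by zero for constant covariates
        total += ((mean(col_target) - mean(col_train)) / sd) ** 2
    return sqrt(total)


# Hypothetical covariates (assumption): (elevation in m, mean annual precipitation in mm)
train_area = [(200.0, 800.0), (300.0, 900.0), (400.0, 1000.0)]
target_area = [(250.0, 850.0), (1500.0, 200.0)]

share = out_of_range_share(train_area, target_area)
dist = standardized_centroid_distance(train_area, target_area)
print(f"out-of-range share: {share:.2f}, centroid distance: {dist:.2f}")
```

A high out-of-range share signals that a tree-based model would have to predict outside the value range it was trained on, which is the situation described above for Random Forest.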
Finally, training data for the unknown areas will be added gradually in order to quantify how prediction accuracy develops and to assess the transfer properties of different ML methods. Training data will be LUCAS data for Germany and soil profile data from the national SPDB database for Iran. The environmental parameters are derived from satellite data, digital elevation models, world climate data, and geological and land use maps. The soil properties to be tested are soil carbon content, soil texture, carbonate content and cation exchange capacity. Twelve ML methods will be compared.
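The stepwise addition of training data can be illustrated with a small sketch under strong simplifying assumptions. The data are synthetic toy values, not LUCAS or SPDB data, and a 1-nearest-neighbour predictor stands in for the ML methods of the project purely to keep the example self-contained. The sketch records the root mean square error (RMSE) on independent target-area samples as target-area training samples are added one at a time.

```python
from math import sqrt


def predict_1nn(train, x):
    """Predict with the value of the nearest training sample (1-nearest-neighbour)."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


def rmse(train, validation):
    """Root mean square error of the 1-NN predictor on the validation samples."""
    errors = [(predict_1nn(train, x) - y) ** 2 for x, y in validation]
    return sqrt(sum(errors) / len(errors))


# Synthetic toy data (assumption): one covariate x and one soil property y = 2x.
source = [(float(x), 2.0 * x) for x in range(0, 10)]                  # training area
target_pool = [(float(x), 2.0 * x) for x in (20, 24, 28)]             # added step by step
target_validation = [(float(x), 2.0 * x) for x in (21, 23, 25, 27)]   # independent samples

curve = []
for n_added in range(len(target_pool) + 1):
    train = source + target_pool[:n_added]
    curve.append((n_added, rmse(train, target_validation)))

for n_added, error in curve:
    print(f"{n_added} target samples added: RMSE = {error:.2f}")
```

In this toy setting the error falls with every added sample because the target area lies entirely outside the covariate range of the source area; with real data the shape of such a curve is exactly what the planned experiments would have to establish.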
DFG Programme
Research Grants