Project Details

Learning Concepts in Deep Networks

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing; Software Engineering and Programming Languages
Term from 2012 to 2017
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 227351812
 
Final Report Year 2019

Final Report Abstract

Our work has led to a number of insights into the representations of deep networks, such as the emergence of abstract representations in deep Boltzmann machines (DBMs), the specific role of convolution and pooling layers in neural networks, a consolidated understanding of layer-wise oscillations in stacked architectures, and the effect of training parameters and loss functions on the qualitative properties of learned representations. In addition to characterizing representations quantitatively, we also aimed to characterize the interaction between representations by identifying modes of interaction such as “preserve” or “complement”, which we further quantified in the context of Boltzmann machine architectures with a newly proposed “layer interaction number”.

While our analysis was instrumental in characterizing the representations of small to mid-scale machine learning models, we observed that, due to the curse of dimensionality, our kernel RDE analysis may not fully identify the properties of a deep representation when the latter is high-dimensional. For high-dimensional representations, the analysis should preferably be performed locally in the input domain, and our newly developed local RDE analysis (LRDE) achieves this localization. We furthermore used the LRDE technique to improve estimates of predictive uncertainty in kernel regression models.

At the beginning of the project, research on deep learning was mainly focused on learning unsupervised representations through techniques such as Boltzmann machines or auto-encoders. Since then, there has been stunning progress in training large-scale neural networks with GPU-based implementations, which have become the state-of-the-art methodology in image recognition and natural language processing. Hence, in the second part of the project, we focused our research effort on understanding deep representations in these new state-of-the-art models.
Specifically, we considered the extraction of human-interpretable insights from deep representations. To this end, we developed the layer-wise relevance propagation (LRP) method, which explains the predictions of complex state-of-the-art image and text classifiers in terms of their input variables. The method was later given theoretical support by viewing the classifier as a composition of multiple functions (one per neuron) and performing a “deep Taylor decomposition” (DTD) of this composition of functions. Recent advances in the field of interpreting deep networks were summarized in a tutorial paper. Our work on interpreting deep representations was presented at numerous conferences and workshops in the form of research talks and tutorials. Furthermore, our work on analyzing representations has served as a source of inspiration when applying machine learning to practical problems in chemistry and biology.
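The core idea of LRP — propagating the network output backward layer by layer and redistributing it in proportion to each neuron's contribution — can be sketched for a small ReLU network as follows. This is a minimal illustration of the epsilon-stabilized redistribution rule on plain NumPy arrays; the function name `lrp_epsilon` and the list-of-matrices network representation are assumptions of this sketch, not the project's reference implementation.

```python
import numpy as np

def lrp_epsilon(weights, biases, x, eps=1e-6):
    """Redistribute a ReLU network's output back to its inputs
    using an epsilon-stabilized LRP rule (illustrative sketch)."""
    # Forward pass, storing the activations of every layer.
    activations = [x]
    a = x
    for W, b in zip(weights, biases):
        a = np.maximum(0.0, a @ W + b)
        activations.append(a)

    # Initialize relevance with the output activations.
    R = activations[-1]

    # Backward pass: redistribute relevance proportionally to each
    # neuron's contribution to the pre-activations above it.
    for l in range(len(weights) - 1, -1, -1):
        a, W, b = activations[l], weights[l], biases[l]
        z = a @ W + b                               # pre-activations of layer l+1
        z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer avoids division by zero
        s = R / z                                   # relevance per unit of pre-activation
        R = a * (s @ W.T)                           # share attributed to each lower neuron
    return R
```

For zero biases, the redistributed relevance sums approximately to the network output (the conservation property underlying LRP); with nonzero biases, part of the relevance is absorbed at each layer.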

Publications

  • Deep Boltzmann Machines and the Centering Trick, in Neural Networks: Tricks of the Trade, 2nd Edn, Springer LNCS, vol. 7700, 2012
    G. Montavon, K.-R. Müller
    (See online at https://doi.org/10.1007/978-3-642-35289-8_33)
  • Neural Networks: Tricks of the Trade, 2nd Edn, Springer LNCS, vol. 7700, 2012
    G. Montavon, G. Orr, K.-R. Müller
    (See online at https://doi.org/10.1007/978-3-642-35289-8)
  • Analyzing Local Structure in Kernel-based Learning: Explanation, Complexity and Reliability Assessment, IEEE Signal Processing Magazine, 30(4):62-74, 2013
    G. Montavon, M. Braun, T. Krueger, K.-R. Müller
    (See online at https://doi.org/10.1109/MSP.2013.2249294)
  • Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies, Journal of Chemical Theory and Computation, 9(8):3404-3419, 2013
    K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, K.-R. Müller
    (See online at https://doi.org/10.1021/ct400195d)
  • On Layer-Wise Representations in Deep Neural Networks, PhD Thesis, Technische Universität Berlin, Germany, 2013
    G. Montavon
  • Wasserstein Training of Restricted Boltzmann Machines. Advances in Neural Information Processing Systems, 2016
    G. Montavon, K.-R. Müller, M. Cuturi
  • Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition. Pattern Recognition, 65:211–222, 2017
    G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller
    (See online at https://doi.org/10.1016/j.patcog.2016.11.008)
  • Methods for Interpreting and Understanding Deep Neural Networks. Digital Signal Processing, 73:1-15, 2018
    G. Montavon, W. Samek, K.-R. Müller
    (See online at https://doi.org/10.1016/j.dsp.2017.10.011)
  • Structuring Neural Networks for More Explainable Predictions in Explainable and Interpretable Models in Computer Vision and Machine Learning, pp 115-131, Springer SSCML, 2018
    L. Rieger, P. Chormai, G. Montavon, L.-K. Hansen, K.-R. Müller
    (See online at https://doi.org/10.1007/978-3-319-98131-4_5)
  • Towards Explaining Anomalies: A Deep Taylor Decomposition of One-Class Models. CoRR abs/1805.06230, 2018
    J. Kauffmann, K.-R. Müller, G. Montavon
 
 
