Learning Concepts in Deep Neural Networks
Software Engineering and Programming Languages
Summary of Project Results
Our work has led to a number of insights into the representations learned by deep networks, such as the emergence of abstract representations in deep Boltzmann machines (DBMs), the specific roles of convolution and pooling layers in neural networks, a consolidated understanding of layer-wise oscillations in stacked architectures, and the effect of training parameters and loss functions on the qualitative properties of learned representations. In addition to characterizing representations quantitatively, we also characterized the interactions between representations by identifying modes of interaction such as “preserve” or “complement”, which we further quantified in the context of Boltzmann machine architectures with a newly proposed “layer interaction number”.

While our analysis was instrumental in characterizing the representations of small- to mid-scale machine learning models, we observed that, due to the curse of dimensionality, our kernel RDE analysis may not fully identify the properties of a deep representation when the latter is high-dimensional. For high-dimensional representations, the analysis should preferably be performed locally in the input domain. Our newly developed local RDE analysis (LRDE) achieves this localization; we furthermore used LRDE to improve estimates of predictive uncertainty in kernel regression models.

At the beginning of the project, research on deep learning was mainly focused on learning unsupervised representations through techniques such as Boltzmann machines or auto-encoders. Since then, there has been rapid progress on learning large-scale neural networks with GPU-based implementations, which have become the state-of-the-art methodology in image recognition and natural language processing. Hence, in the second part of the project, we focused our research effort on understanding deep representations in these new state-of-the-art models, specifically on extracting human-interpretable insights from them. To achieve this, we developed the layer-wise relevance propagation (LRP) method, which explains the predictions of complex state-of-the-art image and text classifiers in terms of their input variables. The method was later given theoretical support by viewing the classifier as a composition of multiple functions (one per neuron) and performing a “deep Taylor decomposition” (DTD) of that composition. Recent advances in the field of interpreting deep networks were summarized in a tutorial paper.

Our work on interpreting deep representations was presented at numerous conferences and workshops in the form of research talks and tutorials. Furthermore, our work on analyzing representations has served as a source of inspiration when applying machine learning to practical problems in chemistry and biology.
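To make the mechanics of LRP concrete, the sketch below propagates relevance backward through a single fully-connected layer using the common LRP-ε rule. This is a minimal NumPy illustration under stated assumptions, not the project's reference implementation; the function name lrp_epsilon, the stabilizer eps, and the toy dimensions are illustrative choices.

    import numpy as np

    def lrp_epsilon(a, W, b, R_out, eps=1e-6):
        # One LRP-epsilon backward step through a dense layer z = a @ W + b.
        # a: (d_in,) input activations; W: (d_in, d_out); b: (d_out,) biases;
        # R_out: (d_out,) relevance arriving from the layer above.
        z = a @ W + b                               # pre-activations of the layer
        z = z + eps * np.where(z >= 0, 1.0, -1.0)   # epsilon stabilizer avoids division by ~0
        s = R_out / z                               # relevance per unit of pre-activation
        return a * (W @ s)                          # redistribute in proportion to a_i * w_ij

    # Toy usage (hypothetical data): relevance is approximately conserved across the layer.
    rng = np.random.default_rng(0)
    a = rng.random(4)
    W = rng.standard_normal((4, 3))
    b = np.zeros(3)
    R_out = np.maximum(a @ W + b, 0.0)              # e.g. start from the positive outputs
    R_in = lrp_epsilon(a, W, b, R_out)
    print(R_in.sum(), R_out.sum())                  # nearly equal for small eps and zero biases

The ε term stabilizes the division when pre-activations are close to zero; apart from the relevance absorbed by this term and by the biases, relevance is conserved from layer to layer, which is the defining property of LRP.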
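The theoretical view underlying deep Taylor decomposition can be condensed into a first-order Taylor expansion of each neuron's function f around a root point x̃; the generic form below is a sketch, since the choice of root point depends on the specific propagation rule being derived:

    f(x) = f(\tilde{x}) + \sum_i (x_i - \tilde{x}_i)
           \left. \frac{\partial f}{\partial x_i} \right|_{x = \tilde{x}}
           + O(\|x - \tilde{x}\|^2)

Choosing x̃ such that f(x̃) = 0 and neglecting higher-order terms yields the relevance scores R_i = (x_i - x̃_i) · ∂f/∂x_i |_{x = x̃}, which approximately sum to f(x); applying this decomposition neuron by neuron through the network recovers LRP-style propagation rules.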
Project-Related Publications (Selection)
- Deep Boltzmann Machines and the Centering Trick, in Neural Networks: Tricks of the Trade, 2nd Edn, Springer LNCS, vol. 7700, 2012
G. Montavon, K.-R. Müller
(See online at https://doi.org/10.1007/978-3-642-35289-8_33)
- Neural Networks: Tricks of the Trade, 2nd Edn, Springer LNCS, vol. 7700, 2012
G. Montavon, G. Orr, K.-R. Müller
(See online at https://doi.org/10.1007/978-3-642-35289-8)
- Analyzing Local Structure in Kernel-based Learning: Explanation, Complexity and Reliability Assessment, IEEE Signal Processing Magazine, 30(4):62-74, 2013
G. Montavon, M. Braun, T. Krueger, K.-R. Müller
(See online at https://doi.org/10.1109/MSP.2013.2249294)
- Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies, Journal of Chemical Theory and Computation, 9(8):3404-3419, 2013
K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, K.-R. Müller
(See online at https://doi.org/10.1021/ct400195d)
- On Layer-Wise Representations in Deep Neural Networks, PhD Thesis, Technische Universität Berlin, Germany, 2013
G. Montavon
- Wasserstein Training of Restricted Boltzmann Machines, in Advances in Neural Information Processing Systems, 2016
G. Montavon, K.-R. Müller, M. Cuturi
- Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition, Pattern Recognition, 65:211-222, 2017
G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller
(See online at https://doi.org/10.1016/j.patcog.2016.11.008)
- Methods for Interpreting and Understanding Deep Neural Networks, Digital Signal Processing, 73:1-15, 2018
G. Montavon, W. Samek, K.-R. Müller
(See online at https://doi.org/10.1016/j.dsp.2017.10.011)
- Structuring Neural Networks for More Explainable Predictions, in Explainable and Interpretable Models in Computer Vision and Machine Learning, pp. 115-131, Springer SSCML, 2018
L. Rieger, P. Chormai, G. Montavon, L. K. Hansen, K.-R. Müller
(See online at https://doi.org/10.1007/978-3-319-98131-4_5)
- Towards Explaining Anomalies: A Deep Taylor Decomposition of One-Class Models, CoRR abs/1805.06230, 2018
J. Kauffmann, K.-R. Müller, G. Montavon