Project Details

Closing the Gap Between High- and Low-Dimensional Models of High-Level Vision

Applicant: Dr. Heiko Schütt
Subject Area: General, Cognitive and Mathematical Psychology; Cognitive, Systems and Behavioural Neurobiology
Term: from 2018 to 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 418432665

Final Report Year: 2021

Final Report Abstract

The original aim of this project was to close the gap between small-scale, hand-crafted models of the visual system and modern, large-scale, machine-learning-based computer vision models. To do so, I proposed three subgoals: (1) improving methods for analyzing and testing large-scale models; (2) understanding perceptual organization; and (3) using the insights gained to design better models of human high-level vision that combine classical cognitive modeling ideas with modern deep learning techniques.

Towards the first aim, substantial progress was made, focused primarily on representational similarity analysis (RSA). For this method, I contributed to better metrics for comparing representational geometries and, going beyond my original proposal, took a leading role in building a Python toolbox for representational similarity analysis, which provides better inference methods for model performance. In particular, it improves on the previously used bootstrap methods to allow correct generalization to the populations of subjects and stimuli simultaneously, while allowing flexible models that need to be fit to data. We tested these new statistical methods through extensive simulations to confirm their correctness, which had not been done for RSA before; the simulations highlighted the shortcomings of previous bootstrapping methods for RSA.

Towards the second aim, I first generated a testbed of dead rectangle stimuli, closely following the plans presented in my proposal, collected some data from human observers, and made some interesting observations. Unfortunately, it became clear that even these simplified stimuli were too complex for finding optimal solutions. This problem led to some delay in the project until I found a formulation of perceptual organization that avoids the computationally hard assignment of object labels.
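The idea of bootstrapping over subjects and stimuli simultaneously, described above for the first aim, can be illustrated with a minimal sketch. This is not the toolbox's implementation: the function name `two_factor_bootstrap`, the array layout, and the use of a Pearson correlation as the model score are assumptions made for illustration, and the sketch omits any variance correction and model fitting that a full method may need.

```python
import numpy as np

def two_factor_bootstrap(data_rdms, model_rdm, n_boot=1000, seed=0):
    """Naive two-factor bootstrap of a model-to-data RDM correlation.

    data_rdms: array (n_subjects, n_conditions, n_conditions), symmetric
               dissimilarity matrices, one per subject (assumed layout).
    model_rdm: array (n_conditions, n_conditions), symmetric model prediction.
    Returns one score per bootstrap sample (NaN where it is undefined).
    """
    rng = np.random.default_rng(seed)
    n_subj, n_cond, _ = data_rdms.shape
    rows, cols = np.triu_indices(n_cond, k=1)
    scores = np.full(n_boot, np.nan)
    for b in range(n_boot):
        # Resample subjects and conditions (stimuli) with replacement.
        subj = rng.integers(0, n_subj, size=n_subj)
        cond = rng.integers(0, n_cond, size=n_cond)
        ci, cj = cond[rows], cond[cols]
        # Drop pairs in which the same original condition was drawn twice:
        # their dissimilarity is trivially zero and carries no information.
        keep = ci != cj
        if keep.sum() < 3:
            continue
        mean_rdm = data_rdms[subj].mean(axis=0)
        d = mean_rdm[ci[keep], cj[keep]]
        m = model_rdm[ci[keep], cj[keep]]
        if d.std() == 0 or m.std() == 0:
            continue
        scores[b] = np.corrcoef(d, m)[0, 1]
    return scores
```

The spread of the returned scores gives a rough uncertainty estimate that reflects both the sampled subjects and the sampled stimuli, rather than only one of the two.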
The formulation I found instead proposes that the visual system encodes, for nearby locations, whether they belong to the same object, based on how similar or predictable the representations at the two locations are. Using modern contrastive learning techniques from deep neural networks, this model can learn abstract features, and how to perform the prediction, from unlabeled images. As this model combines an idea from cognitive science with deep learning techniques, it represents a contribution to aim 3. As a first test, I show that the model can be trained successfully and that it produces sensible contour estimates on a popular segmentation benchmark. In the future, it will be interesting to test whether the features learned by the model, the local interactions it postulates, and the inference dynamics it implies match observations on the human visual system.

Beyond the aims of the original proposal, my difficulties with computationally solving models that marginalize over all possible solutions, as Bayesian optimal observers do, led to a side project. Here I collaborated with two other lab members to test alternatives to full Bayesian decision making that avoid the computationally hard marginalization. In particular, we find that a point estimate observer model, which fully commits to the most likely world state, can explain human perceptual decision making to a similar degree as full Bayesian inference.

Overall, this project largely reached its original aims, despite some delays due to the COVID pandemic and hard computational problems. We substantially improved the statistical inference methods for RSA and tested them thoroughly for the first time. We implemented the dead rectangle based task and measured some human behavior for comparison. Finally, we now propose a deep neural network model that includes a mechanism for perceptual organization, which applies a solution based on simple cognitive modeling considerations to an image-computable model at full complexity.
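The difference between a full Bayesian observer and a point estimate observer, as described above, can be made concrete with a toy example. This is a sketch under stated assumptions: a discrete world with a decision variable C and a nuisance variable s, and the hypothetical function names `bayes_decision` and `point_estimate_decision`. The models tested in the project may differ in their details.

```python
import numpy as np

def bayes_decision(prior, lik):
    """Full Bayesian observer: marginalize over the nuisance variable s.

    prior[c, s] = p(C=c, s), lik[c, s] = p(x | C=c, s) for the observed x.
    Returns the category c with the largest summed posterior mass.
    """
    post = prior * lik  # unnormalized joint posterior over (C, s)
    return int(np.argmax(post.sum(axis=1)))

def point_estimate_decision(prior, lik):
    """Point estimate observer: commit to the single most likely world state
    (C, s) and report its category, without any marginalization."""
    post = prior * lik
    c, _ = np.unravel_index(np.argmax(post), post.shape)
    return int(c)

# Toy case with 2 categories and 3 nuisance values, chosen so that the
# two observers disagree.
prior = np.full((2, 3), 1 / 6)
lik = np.array([[0.8, 0.0, 0.0],
                [0.5, 0.5, 0.0]])
```

In this toy case, category 1 has more total posterior mass (spread over two nuisance values), so `bayes_decision(prior, lik)` returns 1, while the single most likely state belongs to category 0, so `point_estimate_decision(prior, lik)` returns 0. In many tasks the two rules agree on most trials, which is one reason they can be hard to tell apart from behavior.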

Publications

  • (2019). Dead Rectangles as a Stimulus for Perceptual Organisation Research. 2019 Conference on Cognitive Computational Neuroscience, Berlin, Germany
    Schütt, H. H., & Ma, W.
    (See online at https://doi.org/10.32470/CCN.2019.1100-0)
  • (2020). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data analysis, and Theory
    Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N.
    (See online at https://doi.org/10.51628/001c.27664)
  • (2021). Point estimate observers: A new class of models for perceptual decision making
    Schütt, H. H., Yoo, A., Calder-Travis, J. M., & Ma, W.
    (See online at https://doi.org/10.31234/osf.io/bqkf4)
 
 
