Project Details

Spatio-Temporal Hypercolumns for Instance-based Semantic Segmentation in Video

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2017 to 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 387723725
 
Video segmentation is one of the most challenging open problems in computer vision. Although multiple approaches have been proposed in the literature to address this task, state-of-the-art algorithms are still far from reaching human-level performance in realistic unconstrained videos. In this work, we propose a two-year research program that studies the interaction between video segmentation and object recognition, thereby introducing category-specific information to improve the video segmentation process. As the starting point of our approach, we will generalize state-of-the-art algorithms for static generic segmentation and semantic segmentation to the spatio-temporal domain by taking optical flow estimation into account. During the first year of the project, we will seek an effective combination of our recent works, Convolutional Oriented Boundaries (COB) and FlowNet, in order to build a spatio-temporal video segmentation algorithm that involves local and spatial information as well as temporal consistency. Once we have extracted a consistent spatio-temporal video segmentation, we will propagate the pixel labels across frames through trajectory motion affinities and build a spatio-temporal representation of the objects and surfaces, which we call Convolutional Temporal Tubes (CTT). During the second year, we will extend our previous work on Hypercolumns [3] by instantiating a spatio-temporal hypercolumn framework on the CTT, in order to refine the spatial support of objects and surfaces given their semantic characteristics while preserving temporal consistency. This representation of a video in terms of spatio-temporal regions that are stable over time, while being aware of semantics and of individual object instances, is the final objective of this two-year research project.
The realization of our research programme is expected to bridge the gap between human and computer performance in video segmentation on the current benchmarks. These results will enable further research in scene and object structure recovery, 3D reconstruction, video understanding, and action and object recognition, among many other applications. This project seeks to strengthen scientific exchange between Germany and Colombia, and will be conducted in close collaboration between researchers in both countries.
DFG Programme Research Grants
International Connection Colombia
Partner Organisation Universidad de los Andes
 
 
