Project Details

Semantic Video Prediction (P6)

Subject Area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: since 2017
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 313421352
 
Prediction of future measurements is a key capability of intelligent systems. It can be learned in a self-supervised way, but to be successful it needs to discover suitable scene representations. Effective human-robot collaboration requires a system to observe human actions and to make predictions about the future state of the collaborative workspace. The objective of this project is to learn a sequence of representations of the shared human-robot workspace that are increasingly abstract and that allow for predictions over increasing time horizons. As motion segmentation helps prediction, the framework for unsupervised learning of hierarchical representations that was developed in the first-phase project "Learning Hierarchical Representations for Anticipative Human-Robot Collaboration" shall be extended to account for segmentation of the scene into individual objects and persons. A network architecture will be developed that models the scene as coherently moving segments that interact with and occlude each other. As the future often has multiple plausible developments, the prediction learning framework shall be extended to explicitly account for multimodal distributions of future states. To this end, semantically meaningful latent variables will be learned, without requiring explicit labels, and the multimodal future will be conditioned on them. To target the representations to human-robot collaboration, we will fine-tune them for semantic perception and semantic prediction of multiple futures. In correspondence with the spatio-temporal resolutions, coarser semantic concepts, such as larger objects and longer-term activities, shall be predicted at the higher layers and further into the future. The learned scene representations and predictions will serve as a basis for project P8 "Anticipative Human-Robot Collaboration".
DFG Programme: Research Units
 
 

