Project Details

Resource-Efficient Deep Models for Embedded Systems

Subject Area: Computer Architecture, Embedded and Massively Parallel Systems; Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: 2016 to 2020
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 285966169
 
Deep representation learning is one of the main factors behind the recent performance gains in many image, signal and speech processing problems. This is particularly true when large amounts of data and almost unlimited computing resources are available, as demonstrated in competitions such as ImageNet. In real-world scenarios, however, the computing infrastructure is often restricted and cannot meet these computational requirements. In this research proposal we suggest several directions for reducing the computational burden while maintaining the level of recognition performance.

Today, advanced embedded CPUs have reached an architectural feature set that supports native cross-compilation of numerical algorithms. One of the most widely used embedded CPUs is the ARM Cortex-A9. The first hybrid devices integrating a CPU and an FPGA in a single package are available, providing significant computational performance at very low power budgets for certain tasks. It remains unclear, however, how existing CPU software stacks can be extended to exploit this heterogeneity for improved performance and energy efficiency. New tools are required to support this heterogeneity in compilation processes.

In combination, these two research directions enable the use of deep models in mobile devices and embedded systems with limited power consumption and computational resources. To achieve this, the focus is four-fold:

(1) Sparse connectivity and activity in the models: We aim to use sparse weight matrices and sparsity-enforcing activation functions to reduce the number of arithmetic operations.
(2) Finite-precision analysis of deep models: In hearing aids, for example, simple classifiers that perform well are necessary for acoustic scene classification. In particular, we analyze the performance of the classifiers and investigate their learning behavior at reduced precision. Another interesting question is whether the models can be scaled to the integer domain, requiring only integer arithmetic. Finite-precision analysis determines the optimal bit-width for the arithmetic operations while still maintaining the performance of the model.
(3) Automated code synthesis of the deep models for embedded systems such as hybrid ARM+FPGA architectures, based on domain-specific software stacks like Theano: The aim is to exploit sparsity, insights from finite-precision analysis, and asynchronous computations to obtain efficient models for such embedded hardware, and to apply automated partitioning, compilation and synthesis techniques to such hybrid architectures.
(4) The developed methods are compared empirically on benchmark image classification problems and on two speech processing tasks, i.e., single-channel source separation and artificial bandwidth extension. The key properties of interest are reduced-precision behavior, the influence of sparsity, power consumption and energy efficiency, and optimized performance on embedded heterogeneous hardware while hiding the heterogeneity from the user.
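
To make focus areas (1) and (2) concrete, the following is a minimal illustrative sketch, not the project's actual method. It assumes plain NumPy as a stand-in for the software stack, magnitude pruning as one possible way to obtain a sparse weight matrix, ReLU as one example of a sparsity-enforcing activation, and symmetric uniform quantization as one possible finite-precision scheme. All function names (prune_by_magnitude, nonzero_macs, quantize, integer_layer) and the layer sizes are hypothetical.

```python
import numpy as np


def relu(x):
    """Example sparsity-enforcing activation: negative values become exact zeros."""
    return np.maximum(x, 0.0)


def prune_by_magnitude(w, keep_fraction):
    """Assumed pruning rule: keep only the largest-magnitude weights, zero the rest."""
    k = max(1, int(round(keep_fraction * w.size)))
    threshold = np.sort(np.abs(w), axis=None)[-k]
    return np.where(np.abs(w) >= threshold, w, 0.0)


def nonzero_macs(w, x):
    """Multiply-accumulates left when zero weights and zero inputs are skipped."""
    return int(np.count_nonzero(w[:, x != 0]))


def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers plus a scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(values / scale), -qmax, qmax).astype(np.int64)
    return q, scale


def integer_layer(w, x, bits):
    """Matrix-vector product computed on integers, rescaled once at the end."""
    qw, sw = quantize(w, bits)
    qx, sx = quantize(x, bits)
    acc = qw @ qx            # integer-only accumulation
    return acc * (sw * sx)   # single floating-point rescale


# Hypothetical layer: 128 inputs, 64 outputs, random data for illustration only.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
x = relu(rng.standard_normal(128))
w_sparse = prune_by_magnitude(w, keep_fraction=0.1)

dense_macs = w.size
sparse_macs = nonzero_macs(w_sparse, x)

# Bit-width sweep: worst-case output error relative to the float result.
reference = w_sparse @ x
errors = {
    bits: float(np.max(np.abs(integer_layer(w_sparse, x, bits) - reference)))
    for bits in (4, 8, 16)
}

print("dense MACs:", dense_macs, "sparse MACs:", sparse_macs)
for bits, err in errors.items():
    print(f"{bits:2d}-bit max abs error: {err:.6f}")
```

In a sweep of this kind, one would pick the smallest bit-width whose error (or, for a trained classifier, whose accuracy loss) stays within an acceptable bound. The operation count only indicates the potential saving; whether it is realized depends on how well the target hardware can skip zeros.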
DFG Programme: Research Grants
International Connection: Austria
 
 
