Quantifying the Trade-Off Between Energy and Computational Accuracy in Computer Vision Processor Architectures Enhanced with Stochastic Computing Mechanisms
Computer Architecture, Embedded and Massively Parallel Systems
Summary of Project Results
In this project, the energy-accuracy trade-offs of two different processor architecture organizations, a horizontal micro-SIMD processor and a vertical vector SIMD processor, are explored quantitatively. These processor architectures are not only optimized for a specific computer vision application, but also enhanced with approximate and stochastic computing mechanisms. A complete analysis of stochastic and approximate techniques in vision processors has been performed, and the potential of the different techniques has been compared depending on the employed micro-architecture. The proposed architectures are specialized for computer vision applications, e.g., image feature extraction, by exploiting inherent data-level parallelism. In the case studies performed in this project, the vertical architectures required up to 2.1x larger silicon area than horizontal architectures with identical ALU resources. This is because the level of parallelism is increased by replicating vertical vector unit resources, including distributed memories, which results in a large circuit area. In contrast, the centralized memories of the horizontal architecture do not need to be increased in size. For SIFT image feature extraction, however, the vertical vector SIMD processors achieve up to 2.7x higher performance, since the horizontal micro-SIMD architecture is limited by the instruction issue throughput of the scalar main processor and by data reordering overhead, whereas the vertical vector SIMD processor allows a high clock frequency and flexible data memory accesses. When comparing performance-area-energy efficiency metrics, the vertical architectures achieve up to 3.9x more efficient SIFT execution than the horizontal architectures. These processor architectures can be enhanced with approximate and stochastic computing units, such as arithmetic units. During the project, different adder and multiplier structures for approximate and stochastic processors were evaluated.
The VHDL code of the implemented and evaluated designs will be made publicly available as an open-source library (www.ids.uni-bremen.de/repostoch/). A newly proposed approximate adder, the Optimized Lower-Part Constant-OR Adder (OLOCA), improves the mean squared error by 58% at a 13.8% lower area-delay product compared to a state-of-the-art Lower-Part OR Adder (LOA). Since current error metrics were found to be misleading when applied to differing classes of errors, emphasis has been put on creating a combined Saturated Mean Squared Error (SMSE) metric that allows a fair comparison of both approximate and stochastic error effects throughout the evaluated design library. To avoid slow gate-level simulations when characterizing the stochastic behavior of an arithmetic circuit, an FPGA-based timing analysis framework (FLINT+) is proposed, which accelerates the analysis of stochastic mechanisms by a factor of up to 476x. The processor architectures enhanced with approximate ALUs have been analyzed regarding their energy-accuracy trade-off for an egomotion estimation application. For this application, SIFT image features are matched and traced in stereoscopic camera video sequences from a vehicle to obtain an estimate of the vehicle movement. Early results demonstrate that the use of an approximate multiplier, i.e., an accuracy-configurable Broken-Array Multiplier (BAM), can reduce the datapath power consumption by up to 23.3% for the horizontal processor datapath and by up to 8.9% for the vertical processor datapath, while maintaining a similar estimation accuracy compared to GPS reference data. The results highlight the advantages of stochastic and approximate techniques for the improvement of vision processors. However, an integral optimization of all processor components, including the interconnect architecture, memory, and control logic, will be necessary to exploit the full potential.
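To illustrate the kind of error measurement mentioned for approximate adders, the following is a minimal behavioral sketch of a Lower-Part OR Adder in its commonly described form: the lower bits are combined with a bitwise OR, and the carry into the exact upper part is the AND of the most significant lower-part bits. It is not taken from the project's VHDL library. The function names, the 16-bit operand width, and the random-sampling setup are assumptions made for illustration, and the project's OLOCA design and SMSE metric are not modeled here.

```python
import random


def loa_add(a: int, b: int, lower_bits: int = 8) -> int:
    """Behavioral model of a Lower-Part OR Adder for unsigned operands.

    Hypothetical helper for illustration; not the project's implementation.
    """
    if lower_bits == 0:
        return a + b  # no approximate part: plain exact addition
    low_mask = (1 << lower_bits) - 1
    # Lower part: a bitwise OR replaces the carry chain.
    low = (a | b) & low_mask
    # Carry into the upper part: AND of the top bits of both lower parts.
    carry = (a >> (lower_bits - 1)) & (b >> (lower_bits - 1)) & 1
    # Upper part: exact addition.
    high = (a >> lower_bits) + (b >> lower_bits) + carry
    return (high << lower_bits) | low


def mean_squared_error(adder, width: int, samples: int, seed: int = 0) -> float:
    """Estimate the MSE of `adder` against exact addition on random inputs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        err = adder(a, b) - (a + b)
        total += err * err
    return total / samples


# Sweep the size of the approximate lower part (assumed 16-bit operands).
for k in (0, 2, 4, 8):
    mse = mean_squared_error(lambda a, b: loa_add(a, b, lower_bits=k), 16, 2000)
    print(f"lower_bits={k}: estimated MSE = {mse:.1f}")
```

Sweeping `lower_bits` in this way gives a rough picture of how the error grows as more of the carry chain is removed. The area and delay side of the comparison requires synthesis results and is outside the scope of this sketch.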
Moreover, it is shown that different sections of computer vision algorithms have varying accuracy requirements. Consequently, approximate and stochastic processors will require mechanisms for accuracy reconfiguration in order to access the full energy-accuracy trade-off potential of an application.
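The idea of accuracy reconfiguration can be sketched with a simplified behavioral model of a broken-array style multiplier. The model below assumes unsigned operands and only a vertical break level (`vbl`): every partial-product bit in a column below `vbl` is dropped. The horizontal break level of the original BAM proposal is not modeled, and the names `bam_multiply`, `mean_abs_error`, and `vbl` are hypothetical. It is an illustration under these assumptions, not the multiplier evaluated in the project.

```python
import random


def bam_multiply(a: int, b: int, width: int = 8, vbl: int = 0) -> int:
    """Simplified unsigned broken-array multiplier model.

    Partial-product bit a_i * b_j sits in column i + j. Bits in columns
    below `vbl` are omitted. With vbl = 0 the result is the exact product.
    """
    result = 0
    for j in range(width):              # rows of the array: bits of b
        if not (b >> j) & 1:
            continue
        for i in range(width):          # cells within a row: bits of a
            if (a >> i) & 1 and i + j >= vbl:
                result += 1 << (i + j)
    return result


def mean_abs_error(width: int, vbl: int, samples: int, seed: int = 0) -> float:
    """Estimate the mean absolute error against the exact product."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        total += abs(a * b - bam_multiply(a, b, width, vbl))
    return total / samples


# A runtime-selectable break level stands in for accuracy reconfiguration:
# a tolerant algorithm stage could run with a high vbl, a sensitive one with 0.
for vbl in (0, 4, 8):
    print(f"vbl={vbl}: mean absolute error = {mean_abs_error(8, vbl, 1000):.1f}")
```

In this model a larger break level removes more of the array, which corresponds to less switching hardware in a real design, at the cost of a larger error. Selecting the level per algorithm section is one possible form of the reconfiguration mechanism described above.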
Project-Related Publications (Selection)
- Misalignment-aware delay modeling of narrow on-chip interconnects considering variability. In: 2018 7th International Conference on Modern Circuits and Systems Technologies (MOCAST). 2018, pp. 1-4
A. Najafi, L. Bamberg, and A. Garcia-Ortiz
(See online at https://doi.org/10.1109/MOCAST.2018.8376593)
- A fair comparison of adders in stochastic regime. In: 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). 2017, pp. 1-6
A. Najafi, M. Weissbrich, G. Paya-Vaya, and A. Garcia-Ortiz
(See online at https://doi.org/10.1109/PATMOS.2017.8106990)
- FLINT+: A runtime-configurable emulation-based stochastic timing analysis framework. In: 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). 2017, pp. 1-8
M. Weissbrich, G. Paya-Vaya, L. Gerlach, H. Blume, A. Najafi, and A. García-Ortiz
(See online at https://doi.org/10.1109/PATMOS.2017.810695)
- FPGA Emulation Methodology for Fast and Accurate Power Estimation of Embedded Processors. In: Journal of Systems Architecture 77 (2017), pp. 14-25
S. Hesselbarth, G. Schewior, and H. Blume
(See online at https://doi.org/10.1016/j.sysarc.2016.12.008)
- ATE-Accuracy Trade-Offs for Approximate Adders and Multipliers in Pipelined Processor Datapaths. In: 2018 Third Workshop on Approximate Computing
M. Weissbrich, A. Najafi, A. Garcia-Ortiz, and G. Paya-Vaya
- Coherent Design of Hybrid Approximate Adders: Unified Design Framework and Metrics. In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8.4 (2018), pp. 736-745
A. Najafi, M. Weissbrich, G. Paya-Vaya, and A. Garcia-Ortiz
(See online at https://doi.org/10.1109/JETCAS.2018.2833284)
- Systematic Design of an Approximate Adder: The Optimized Lower Part Constant-OR Adder. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26.8 (2018), pp. 1595-1599
A. Dalloo, A. Najafi, and A. Garcia-Ortiz
(See online at https://doi.org/10.1109/TVLSI.2018.2822278)
- A Coding Approach to Improve the Energy Efficiency of Approximate NoCs. In: 2019 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC). 2019, pp. 1-4
A. Najafi, L. Bamberg, G. Paya-Vaya, and A. Garcia-Ortiz
(See online at https://doi.org/10.1109/ReCoSoC48741.2019.9034965)
- FLINT+: A Runtime-Configurable Emulation-Based Stochastic Timing Analysis Framework. In: Integration (2019)
M. Weissbrich, G. Paya-Vaya, L. Gerlach, H. Blume, A. Najafi, and A. García-Ortiz
(See online at https://doi.org/10.1016/j.vlsi.2019.01.002)