Matrix Vektor Multiplikation mit schwach besetzten Matrizen basierend auf Netzwerken-auf-dem-Chip
Final Report Abstract
Sparse Matrix-Vector Multiplication (SMVM) is one of the most challenging kernels in computational science. The implementation of SMVM on parallel platforms has been investigated extensively in the research community. With the increased number of cores in a multicore System-on-Chip (SoC), Networkon-Chip (NoC) was proposed as alternative for shared-buses to provide an efficient communcation infrastructure. The NoC was used to accelerate the SMVM. In order to improve the performance of the SMVM on the NoC, different topologies, data mapping methods and architectures were investigated. The OM- NeT++ based simulator was built for this purpose as performing these investigations on an FPGA is time consuming. One of the main challenges in performing SMVM on a parallel system is to find a data mapping method that achieves both balanced working load and low communcation cost. The proposed methods (Method II for banded and block-diagonal matrices, and Method III as a general purpose method) were designed to reach this goal. Method III was selected as a final mapping method because it does not depend on the sparsity structure of the matrix and does not need any expensive preprocessing steps. This method achieved both balanced working load and low communcation cost. In order to implement Method III, the architecture of the NoC was modified in terms of processing elements and routers. The proposed design avoided the usage of complex routers, which leads to less resources and easier routing decisions. The simulation results showed that the proposed architecture achieved 3-times speedup compared to the previous design. As SMVM is usually done as part of an iterative solver, we benefited from the robustness of the iterative solvers to reduce the computational cost of the SMVM. As the computation and communcation costs depend mainly on the number of nonzeros, the proposed reduction method reduced the number of nonzeros of the sparse matrix while maintaining the convergence of the iterative solver. Overall, performing SMVM on NoC is expected to be used for solving sparse problems (system of equations, eigenvalue problems) in future multicore SoC. By considering the suggested architecture and the incorporation of inexact SMVM, the computation effort of SMVM is minimized regardless of the sparsity structure.
Publications
- An OMNeT++ based network-on-chip simulator for embedded systems. In IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2012), pages 364-367, Kaohsiung, Taiwan, December 2012
A. Mansour and J. Götze
- Sparse matrix-vector multiplication based on network-on-chip: On data mapping. International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2012), Taipei, Taiwan, December 2012
A. Mansour and J. Götze
- Sparse matrix-vector multiplication using network-on-chip. 7th International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2012), London, UK, June 2012
A. Mansour and J. Götze
- Utilizing robustness of Krylov subspace methods in reducing the effort of sparse matrix vector multiplication. Procedia Computer Science, 18, pages 2406-2409, 2013
A. Mansour and J. Götze
- Inexact sparse matrix vector multiplication in Krylov subspace methods: An application-oriented reduction method. In Parallel Processing and Applied Mathematics, volume 8384 of Lecture Notes Computer Science, pages 534-544. Springer Berlin Heidelberg, 2014
A. Mansour and J. Götze