Project Details
Parallel Support Vector Machine Training on a Budget
Applicant
Professor Dr. Tobias Glasmachers
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
from 2019 to 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 418003699
Machine learning is concerned with the data-driven and hence fully automated construction of predictive models. The field connects elements of statistics, computer science, and optimization. Support vector machines (SVMs) are one of the standard methods, in particular for classification problems. They are applied in all areas of science and technology, e.g., in bioinformatics, robotics, medical diagnostics, and text analysis.

Accelerated SVM training
Training a non-linear SVM amounts to solving a large-scale optimization problem whose number of variables coincides with the number of data points. With many millions of points, training becomes a computationally extremely demanding task. The computational bottleneck is closely related to the unbounded growth of the predictive model, and hence of its evaluation cost, with the size of the data set. To mitigate this problem, a wide variety of approximate training schemes has been proposed in the literature. Among these, the budget method is particularly promising: by restricting the model size to an a-priori defined limit, it guarantees a bounded evaluation cost, while still achieving excellent prediction accuracy thanks to its data-adaptive and highly flexible representation of the solution. This way it retains the expressive power of a kernel method. We have recently developed the first dual decomposition algorithm with a budget, resulting in a significant speed-up over the state of the art.

A different route to fast SVM training is parallelization. Despite the many parallel training schemes proposed over the years, it was only very recently that SVM training was parallelized in a convincing manner: the ThunderSVM solver achieves speed-ups of more than two orders of magnitude by using modern graphics processing units (GPUs). This is in line with the general trend of leveraging the massive computing power of GPUs for machine learning.
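The core budget idea, capping the number of support vectors and evicting one whenever the cap is exceeded, can be illustrated with a minimal sketch. The sketch below uses a kernelized Pegasos-style SGD with simple smallest-coefficient eviction; this is an invented illustration of the general principle, not the project's dual decomposition algorithm, and all function names and parameters are made up for the example.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def train_budgeted_svm(X, y, budget=10, lam=0.01, epochs=5, gamma=1.0, seed=0):
    """Kernelized Pegasos-style SGD with a hard budget on the number of
    support vectors.  When the budget is exceeded, the support vector with
    the smallest coefficient magnitude is discarded.  Illustrative only:
    real budget solvers use far more careful budget maintenance."""
    rng = np.random.default_rng(seed)
    sv_x, sv_a = [], []                   # support vectors and coefficients
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            # decision value of the current (size-bounded) model
            f = sum(a * rbf(x, X[i], gamma) for x, a in zip(sv_x, sv_a))
            sv_a = [(1 - eta * lam) * a for a in sv_a]   # regularization shrink
            if y[i] * f < 1:                             # hinge-loss margin violation
                sv_x.append(X[i])
                sv_a.append(eta * y[i])
                if len(sv_x) > budget:                   # enforce the budget
                    j = int(np.argmin(np.abs(sv_a)))
                    sv_x.pop(j)
                    sv_a.pop(j)
    return sv_x, sv_a

def predict(sv_x, sv_a, x, gamma=1.0):
    """Sign of the kernel expansion over the budgeted support set."""
    return np.sign(sum(a * rbf(xs, x, gamma) for xs, a in zip(sv_x, sv_a)))
```

The key property is visible in the loop: the model never holds more than `budget` support vectors, so both per-iteration training cost and prediction cost stay bounded regardless of the data set size.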
This hardware platform is already the workhorse of the field, a trend that can safely be projected into the foreseeable future.

Project goals
A central goal of the project is to combine these recent successes into a new SVM training algorithm. The new method will use the parallel approach of the ThunderSVM algorithm while operating on a budget, as is done in our dual budget algorithm. When the fast iterations of the budget solver are executed in a highly parallel manner on a GPU, the speed-ups should multiply. While this ideal result would be slightly over-optimistic, we expect to come close, resulting in very substantial speed-ups over present budget solvers as well as over the non-budgeted ThunderSVM solver. A further goal is the incorporation of the established speed-up techniques of online problem shrinking and a kernel cache into the (parallel) budget solver. We furthermore aim to provide theoretical guarantees for our algorithm. Maybe most importantly, we will provide highly tuned open-source implementations specialized for high-end GPU hardware.
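Of the two speed-up techniques mentioned, the kernel cache admits a compact sketch: decomposition-style SVM solvers repeatedly need rows of the kernel matrix, and caching recently used rows avoids recomputing them. The class below is a generic least-recently-used cache of RBF kernel rows, invented for illustration; it is not the project's GPU-resident cache, and its name and parameters are assumptions.

```python
import numpy as np
from collections import OrderedDict

class KernelCache:
    """Minimal LRU cache for kernel matrix rows.  Decomposition solvers
    request the same rows K[i, :] many times; caching them trades memory
    for recomputation.  Illustrative sketch only."""

    def __init__(self, X, gamma=1.0, capacity=64):
        self.X, self.gamma, self.capacity = X, gamma, capacity
        self.rows = OrderedDict()          # insertion order tracks recency
        self.hits = self.misses = 0

    def row(self, i):
        """Return the i-th kernel row, computing it only on a cache miss."""
        if i in self.rows:
            self.rows.move_to_end(i)       # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            d = ((self.X - self.X[i]) ** 2).sum(axis=1)
            self.rows[i] = np.exp(-self.gamma * d)   # RBF kernel row
            if len(self.rows) > self.capacity:
                self.rows.popitem(last=False)        # evict least recently used
        return self.rows[i]
```

Because working-set selection in decomposition methods tends to revisit the same small set of active variables, even a modest cache capacity yields a high hit rate in practice.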
DFG Programme
Research Grants