Project Details
PriVisSSL: Private Self-Supervised Learning in the Vision Domain
Applicant
Dr. Franziska Boenisch
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 550224287
Self-supervised learning (SSL) has emerged as a powerful new learning paradigm. In contrast to standard supervised learning, which requires labeled data, SSL relies on unlabeled data to train performant feature encoders. It thereby has the potential to unlock the value of the large amounts of unlabeled data that have remained unused under the supervised learning regime. This is especially true in sensitive domains such as medical imaging or biometrics, where labels are inherently difficult and expensive to obtain. Yet, despite their promising performance, SSL encoders have so far seen only limited use in sensitive domains. An important reason is that SSL encoders have been shown to leak sensitive information about their training data, which poses a significant privacy risk. To date, no targeted methods exist that mitigate this risk while maintaining encoder performance. In this proposal, we rely on the concept of memorization, i.e., a machine learning model's capability to store information about its training data, to analyze and mitigate privacy leakage in SSL in a structured manner. To this end, we first develop a fundamental understanding of memorization in SSL and localize where information about individual training data points is stored within SSL encoders. Then, we quantify the resulting privacy leakage, identify its causes, and formally tie the privacy leakage of individual data points to the level of memorization they experience. Finally, based on our insights into why certain data points experience high memorization and where inside the encoders this information is stored, we develop targeted mitigation methods that prevent privacy leakage while still yielding high-performance encoders. Our approach opens a new path towards deploying high-performance SSL encoders in sensitive domains while safeguarding the privacy of their training data.
DFG Programme
Research Grants