Project Details
Multilingual Controllable Voice Privacy (VoiPy)
Applicant
Professor Dr. Ngoc Thang Vu
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 533241795
Automatic speech processing enables useful applications such as speech assistants and automatic transcription services. However, a speech signal contains more information than is usually needed for the task, including paralinguistic information about the speaker. As classification models with objectives such as speaker identification or speech emotion recognition become more advanced, this information poses serious privacy threats, often without the speaker's knowledge. Service providers might abuse the data for speaker profiling, or it might fall into the hands of attackers during transmission or storage. Anonymizing the audio, and thereby preserving voice privacy, immediately after recording and before further processing has therefore gained relevance and importance, especially since the introduction of the European Union's General Data Protection Regulation (GDPR). The idea is to modify speech recordings such that the link between the original speaker and the audio is destroyed, for instance by using the voice of a different speaker. In theory, if the anonymization is successful, no further action is needed to protect other personal attributes in the data, such as speaker traits (e.g., gender, ethnic origin) or speaker states (e.g., health state, emotions).
In this project, we propose a voice privacy framework that gives users control over which personal information, including but not limited to their identity, they want to protect in their speech before sharing it with an external service. The user can flexibly set anonymization requests for different speaker attributes related to their profile (e.g., age, gender) and state (e.g., emotion). We do not anonymize all personal information by default because, depending on the application of the anonymized audio, some attributes need to be preserved in unmodified form.
Since users of smart devices are often unaware of whether their privacy is protected or violated, we focus on giving the user as much controllability and transparency as possible while maintaining usability and effectiveness. Furthermore, to extend privacy support to non-English speakers, we include a multilingual switch in the proposed system that selects language-dependent components and informs multilingual components about the input language. In this proposal, we focus on the major languages spoken in Germany (official or foreign) but propose a method that is extensible to other languages.
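To illustrate the idea of per-attribute anonymization requests described above, the following is a minimal, hypothetical sketch. The attribute names, the `AnonymizationRequest` class, and its preserve-by-default behavior are illustrative assumptions, not the project's actual interface.

```python
from dataclasses import dataclass, field

# Speaker attributes a user might request to protect (illustrative set).
ATTRIBUTES = ("identity", "age", "gender", "emotion")

@dataclass
class AnonymizationRequest:
    # Maps each speaker attribute to True (anonymize) or False (preserve).
    protect: dict = field(default_factory=dict)

    def should_anonymize(self, attribute: str) -> bool:
        # Attributes not named in the request are preserved by default,
        # since some downstream applications need them unmodified.
        return self.protect.get(attribute, False)

# Example: hide identity and age, but keep emotion (e.g., for an
# empathy-aware assistant) and gender unmodified.
request = AnonymizationRequest(protect={"identity": True, "age": True})
print(request.should_anonymize("identity"))  # True
print(request.should_anonymize("emotion"))   # False
```

The preserve-by-default choice mirrors the stated design decision: not all personal information is anonymized automatically, because some applications require certain attributes in unmodified form.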
DFG Programme
Research Grants