Project Details
Silent Paralinguistics
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 514018165
Speech is a natural human ability and a core part of what makes us a social species. Concealing the speaker’s lips behind a face mask lowers listener performance and confidence while increasing perceptual effort. Hearing-impaired and non-native speakers face even greater challenges. Beyond these issues, masks impede the paralinguistics of interpersonal communication, i.e., the way something is said. For acoustic speech, paralinguistics can be automatically recognized using Computational Paralinguistics methods. Silent Speech Interfaces (SSIs) enable spoken communication even when the acoustic signal is severely degraded or unavailable. SSIs aim to generate speech for silent speakers and otherwise mute individuals from biosignals that result from the speech production process itself. Such speech-related biosignals encompass signals from the articulators, articulatory muscle activity, neural pathways, and the brain itself. Surface electromyography (EMG), which captures the activity of the articulatory muscles, has been successfully applied to SSIs. With EMG-based SSIs, silently spoken speech is converted into text or directly into audible speech. Despite major advances, the lack of paralinguistics remains a major issue for SSI users.

In this proposal, we combine Silent Speech Interfaces with Computational Paralinguistics to lay the foundation for “Silent Paralinguistics (SP)”. SP aims first to infer speaker states and traits from speech-related biosignals during silent speech production, and second to use this inferred paralinguistic information for a more natural SSI-based spoken conversation. We will study politeness and frustration as speaker states, as well as identity and personality as speaker traits. As a basis for the development of SP methods, we will record and label data from 100 participants, eliciting polite speech through game scenarios and frustration through infuriating game elements.

Based on these data, we will investigate how well speaker states and traits can be predicted from EMG signals of silently produced speech. To this end, we will study and compare two approaches: direct SP, which predicts traits and states directly from the EMG features, and indirect SP, which first converts EMG to acoustic features and then predicts traits and states from the acoustic features. Furthermore, we will optimize the integration of paralinguistic predictions in SSIs to generate the most appropriate acoustic signals. Deep generative models for multi-speaker EMG-to-speech conversion will be conditioned on trait and state predictions, such that the produced acoustic signals reflect the intended affective meaning. Finally, an EMG-SSI prototype will be established to validate whether the SP-enhanced acoustic speech signal improves the usability of spoken communication in terms of naturalness and user acceptance.
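The contrast between the two prediction routes can be made concrete with a minimal sketch. The snippet below uses toy arrays in place of real EMG recordings, and the feature and model choices (generic EMG feature vectors, a ridge-regression EMG-to-acoustic mapping, logistic-regression classifiers) are illustrative assumptions, not the project's actual pipeline:

```python
# Hypothetical sketch of direct vs. indirect Silent Paralinguistics.
# All data here is synthetic; shapes and models are assumptions.
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_utt, emg_dim, ac_dim = 200, 64, 40

X_emg = rng.normal(size=(n_utt, emg_dim))   # per-utterance EMG features
Y_ac = rng.normal(size=(n_utt, ac_dim))     # parallel acoustic features
y_state = rng.integers(0, 2, size=n_utt)    # speaker state, e.g. frustrated or not

# Direct SP: predict the speaker state straight from the EMG features.
direct = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
direct.fit(X_emg, y_state)

# Indirect SP: first map EMG to acoustic features, then classify those.
emg_to_ac = Ridge(alpha=1.0).fit(X_emg, Y_ac)   # EMG-to-acoustic conversion
indirect = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
indirect.fit(emg_to_ac.predict(X_emg), y_state)

print("direct accuracy:  ", direct.score(X_emg, y_state))
print("indirect accuracy:", indirect.score(emg_to_ac.predict(X_emg), y_state))
```

The indirect route pays a potential cost for the intermediate conversion step, but lets established acoustic Computational Paralinguistics models be reused unchanged.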
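Likewise, conditioning an EMG-to-speech model on paralinguistic predictions can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions: the layer sizes, the GRU decoder, and the concatenation-based conditioning are placeholders, not the project's actual generative architecture:

```python
# Hypothetical sketch: an EMG-to-speech decoder conditioned on a
# trait/state embedding so the output reflects the intended affect.
import torch
import torch.nn as nn

class ConditionedEMG2Speech(nn.Module):
    def __init__(self, emg_dim=64, cond_dim=8, hid=128, mel_dim=80):
        super().__init__()
        # The conditioning vector is concatenated to every EMG frame.
        self.rnn = nn.GRU(emg_dim + cond_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, mel_dim)  # frame-wise acoustic output

    def forward(self, emg, cond):
        # emg:  (batch, frames, emg_dim) silent-speech EMG features
        # cond: (batch, cond_dim) predicted trait/state embedding
        cond_seq = cond.unsqueeze(1).expand(-1, emg.size(1), -1)
        h, _ = self.rnn(torch.cat([emg, cond_seq], dim=-1))
        return self.out(h)

model = ConditionedEMG2Speech()
emg = torch.randn(4, 100, 64)   # 4 utterances, 100 frames each
cond = torch.randn(4, 8)        # e.g. identity plus frustration scores
mel = model(emg, cond)          # (4, 100, 80) conditioned acoustic frames
```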
DFG Programme
Research Grants