Project Details

Bayesian Learning of a Hierarchical Representation of Language from Raw Speech

Subject Area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: from 2014 to 2018
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 260050394
 
The goal of this project is to learn a hierarchical representation of a language from raw speech input. At the lowest layer, the acoustic building blocks of speech (phonemes or similar sub-word units) are discovered and statistical models are learned for them. The next layer is concerned with discovering the lexical building blocks, the words, and learning their probabilities. Finally, semantically interpretable word categories will be built from the words. Particular focus is placed on two facts: the vocabulary of a natural language is in principle unlimited, and the acoustic signal exhibits extreme variability. Both issues will be approached in a Bayesian paradigm. To allow the vocabulary to grow with the amount of input data, nonparametric Bayesian methods will be employed, in particular those based on the Dirichlet and Pitman-Yor processes, in which the number of parameters need not be specified in advance. The variability of the spoken input, which leads to ambiguities at the acoustic unit discovery stage, is accounted for by avoiding premature decisions on phoneme identity and by treating acoustic and lexical variables in a joint probabilistic model, for which efficient inference techniques will be developed. While we envision several applications in the realm of speech processing, the joint or iterative hierarchical Bayesian learning framework to be developed will also be of interest for learning problems involving other sequential and highly variable sensory input data.
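The idea that the number of parameters need not be fixed in advance can be illustrated with the Chinese restaurant process view of the Pitman-Yor process. The following is a minimal sketch for illustration only, not the project's model or inference method: the function name `pitman_yor_crp` and the parameter values are assumptions chosen here. Each "customer" stands for a word token and each "table" for a word type, so the number of tables plays the role of a vocabulary that grows as more data arrives. Setting `discount=0` gives the Dirichlet process case.

```python
import random


def pitman_yor_crp(n_customers, discount=0.5, concentration=1.0, seed=0):
    """Seat customers with the Pitman-Yor Chinese restaurant process.

    Returns a list of table sizes; len(result) is the number of
    distinct "types" created so far. Illustrative sketch only.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0.0:
        raise ValueError("this sketch assumes concentration > 0")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        k = len(table_sizes)
        # Existing table j is chosen with weight (size_j - discount);
        # a new table is opened with weight (concentration + discount * k).
        # The weights sum to (customers seated so far + concentration).
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)      # open a new table (new type)
        else:
            table_sizes[choice] += 1   # join an existing table
    return table_sizes


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        sizes = pitman_yor_crp(n)
        print(n, "tokens ->", len(sizes), "types")
```

Running the script prints how many types were created for increasing numbers of tokens; with a positive discount the count keeps growing with the data rather than saturating at a preset size.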
DFG Programme: Priority Programmes
 
 
