Project Details

Prosodic structure in audiovisual spoken word recognition

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term from 2006 to 2010
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 25022988
 
Final Report Year 2014

Final Report Abstract

The project investigated the role of visual prosody in audiovisual spoken-word recognition. A first series of experiments showed that Dutch listeners were able to detect suprasegmental lexical stress information from seeing a speaker alone. However, only the presence of stress, not its absence, could be detected. Listeners were able to distinguish visually whether a syllable had primary stress rather than no stress or secondary stress. The distinction between primary and secondary stress was possible only when the target word received phrase-level accent, whereas the distinction between primary stress and no stress did not depend on phrase-level accent. Seeing a speaker thus contributes to Dutch spoken-word recognition by providing suprasegmental information about lexical stress. Second, visual speech provides information for the segmentation of words in continuous speech. Dutch listeners were able to reliably detect word boundaries from seeing a speaker. Dutch participants distinguished ambiguous sequences, such as “diep in” and “die pin” (“deep in”, “the pin”), when the juncture phoneme was placed word-initially (as in “die pin”). Visual speech can thus help speech perception by providing structural prosodic information that allows the continuous speech stream to be segmented into words. In a third series of experiments, we found that, in word-learning situations, speakers align the motion they impose on an object with the prosodic structure of their accompanying speech. Adult listeners and 24-month-old toddlers can use this audiovisual alignment to detect the intended referent object. In summary, the project deepened our understanding of how seeing a talker can help with the recognition of words by providing prosodic information.

Publications

  • (2007). Audiovisual alignment facilitates the detection of speaker intent in a word-learning setting. Abstracts of the Psychonomic Society, 12, 50
    Johnson, E., & Jesse, A.
  • (2007). Visual lexical stress information in audiovisual spoken-word recognition. In J. Vroomen, M. Swerts, & E. Krahmer (Eds.), Proceedings of the International Conference on Auditory-Visual Speech Processing 2007 (pp. 162-166). Tilburg: Univ. Tilburg
    Jesse, A., & McQueen, J. M.
  • (2008). Audiovisual alignment in child-directed speech facilitates the detection of speaker intent in a word learning setting. In P. Khader et al. (Eds.), Experimentelle Psychologie. Beitraege zur 50. Tagung experimentell arbeitender Psychologen [Proceedings of the 50th Conference of Experimental Psychologists] (p. 41). Lengerich, Germany: Pabst Science Publishers
    Jesse, A., & Johnson, E.
  • (2008). Audiovisual alignment in child-directed speech facilitates word learning. Proceedings of the International Conference on Auditory-Visual Speech Processing 2008 (pp. 101-106). Adelaide, Australia: Causal Productions
    Jesse, A., & Johnson, E.
  • (2012). Prosodic temporal alignment of co-speech gestures to speech facilitates referent resolution. Journal of Experimental Psychology: Human Perception and Performance, 38(6), 1567-1581
    Jesse, A., & Johnson, E.
 
 
