Speaker discrimination as a function of vowel realization: does focus affect perception?
Keywords: speaker discrimination, focus, vowel quality, speech perception, Dutch
The acoustic-phonetic characteristics of speech sounds are influenced by their linguistic position in the syllable or sentence. Because speech sounds differ from one another acoustically, they also vary in the amount of speaker information they contain. But do spectral and durational differences between realizations of the same sound, sampled from different linguistic positions, also affect speaker information? We investigated speaker discrimination in [−focus] versus [+focus] word realizations. Twenty-one Dutch listeners participated in a same-different task, using stimuli varying in focus, vowel ([aː], [u]), and word context ([ɦ_k], [v_t]), spoken by 11 different speakers. Results showed an effect of focus on speaker-dependent information, but only for words containing [u]. Moreover, performance on [u] words was influenced by word context and trial type (same- vs. different-speaker), as well as by their interactions. Context-dependent changes in a speech sound’s acoustics may therefore affect its speaker-dependent information, albeit only under specific conditions.
Copyright (c) 2022 Willemijn Heeren, Cesko Voeten, Tessi Marks
This work is licensed under a Creative Commons Attribution 4.0 International License.