TarDiS: Targets and Dynamics in Speech

Lead Research Organisation: University of Manchester
Department Name: Arts Languages and Cultures

Abstract

Speech is a physical phenomenon: it can be represented as a series of vocal tract movements, or as sound waves. However, in order for speech to serve a communicative function, it must also include more abstract components, building blocks that pair these physical phenomena with linguistic meaning. For instance, the vowel /ai/ in a word like 'kite' can be represented in multiple ways. Physiologically, it involves tongue movement from a lower, further-back position to the high front corner of the mouth, but at the same time /ai/ is an abstract category, a family of sounds brought together by their joint meaningful function. The vowel /ai/ is a contrastive unit in English, one that makes words like 'kite' different from words like 'kit'.

Most native speakers of English never need to ponder the nature of the category /ai/, yet they somehow acquire a systematic mapping that links the physical form of /ai/ with its meaning. To linguists, understanding this process holds the key to understanding how language works. It is by this type of mapping that a movement of the tongue acquires linguistic meaning.

This research project sets out to probe the nature of vowel categories like /ai/, and more specifically what their abstract representations must contain in order to successfully model the variation observed in speech. This is an old problem in speech science, which we intend to approach using a novel combination of computational modelling and several types of data: measurements of tongue movement obtained by tracking it in a magnetic field, ultrasound images of the tongue, and the audio signal. We will record such data from twenty native speakers from a single speech community (West Yorkshire) pronouncing all vowels in their variety of English. We will then consider various abstract features that we can derive from the physical signal, and the minimum number of such features needed to successfully capture all vowel categories. For instance, we want to know whether vowels such as /ai/ are best modelled as consisting of one or two elements, and how much information about movement we need to encode in order to successfully model /ai/ within a vowel system.

We will extend the implications of our findings to practical applications. Although vowel categories and vowel targets do not feature overtly in natural acquisition of language, they become important in situations where explicit speech instruction plays a role, such as speech therapy. The more we understand about the nature of speech sounds, the more successful we can be in teaching them. Therefore, we will present our findings on speech goals from a theoretical modelling perspective to clinical speech researchers and clinical practitioners, and engage them in a discussion of how such abstract speech targets can be used as tools in speech therapy.

Planned Impact

Our research focuses on the nature of vowel categories, and especially the relationship between underlying tongue movement and the resulting sound. A group of users that stands to benefit from this research is SLT (Speech and Language Therapy) practitioners, and ultimately also the clients they treat.

Vowels present a complex case for speech therapy, and an important factor in this is the difficulty of categorising vowels and establishing the relevant parameters for such categorisation (Howard & Heselwood, 2013). A common practice, both in linguistics and in SLT, is to describe vowels in terms of tongue height and tongue position, based on how the vowel sounds rather than on direct evidence from articulatory imaging. This is done for practical reasons, but it involves an element of inference that we know to be somewhat inaccurate. The issue is particularly pertinent in attempts to describe and classify non-typical vowel sounds in clinical populations, and additional complications may arise for speakers with non-typical anatomy (e.g. non-typical tongue length, partial glossectomy). It is recognised that diagnosis in such populations could benefit from the use of articulatory imaging, such as ultrasound, but one obstacle to incorporating this practice is the absence of objective criteria for identifying and classifying vowels based on ultrasound tongue images.

By investigating the theory of vowel targets, we hope to inform the practice of describing and classifying vowels in diagnosis and therapy. Since our research will be based on articulatory evidence, including ultrasound, we will explore the limits of using such evidence to define vowel categories bottom-up, and the methods we will use for such description can, in principle, be extended to SLT.

In addition to diagnosis and assessment, research on ultrasound imaging can also benefit the treatment of speech sound disorders. Most SLT practitioners measure the success of interventions based on their auditory impressions. The rationale for this approach stems directly from the goal of speech therapy, i.e. intelligibility. However, arriving at the desired perceptible target may benefit from refocusing on the control structures used to produce speech output, which do not map on to the output in a linear fashion. To put it differently, for some groups of clients, speech therapy may be more beneficial if the focus is on reproducing a specific tongue shape rather than a specific vowel sound. This type of approach has already been trialled with positive results for treatment of consonant disorders with ultrasound biofeedback (Cleland, Scobbie & Wrench, 2015), and it shows promise in helping vowel production in adolescents with hearing loss (Bacsfalvi, Bernhardt & Gick, 2007).

We expect that our project will benefit the SLT community in two ways. First, the research will contribute to the knowledge base on vowel articulation, vowel movement and articulation-acoustics mapping, all of which have implications for identifying and assessing vowel disorders. We will communicate our results to researchers in speech therapy via the Child Speech Disorder Research Network, who will help us contextualise our findings and make them relevant to speech therapists. This engagement will be facilitated by Dr Joanne Cleland, the current chair of the network, who has agreed to serve on the Advisory Board for our project. Second, we have planned a series of impact activities directed at speech therapy students that will raise awareness of the potential use of ultrasound in the clinic, and also provide practical training and support in using this method. These activities will be built into existing SLT student training at the University of Manchester. By incorporating ultrasound into SLT training, we aim to popularise this approach among clinicians, and to help develop innovations in speech therapy.
