Charting lexical development through dense coding and analysis of word senses
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Philosophy Psychology & Language
Abstract
Children learn words at a remarkable rate, which is all the more striking because most words are highly ambiguous. Specifically, most words are "polysemous", which means that they can be used in a range of distinct but related senses. For instance, think of all the different ways that the word "run" can be used: You can run a race, run a car, run a company, run some tests, or even run a bath. These flexible uses of words provide us with a vivid expressive power -- we can use a word with one meaning while alluding to all the rest -- but most researchers have tended to assume that polysemy also makes words hard to learn. In particular, if a word's meaning changes across uses, how are children ever supposed to acquire it?
We have argued that this assumption is backwards, and that flexible uses of words actually help children to acquire a vocabulary. That is because the senses of words are related to one another (e.g., there is an obvious link between the senses of run in "run a car" versus "run a company"), and so once a child has learned one sense of a word, then they can use that knowledge as a supportive platform to more easily infer new senses. By contrast, if all words were unambiguous, or all words had unrelated ambiguous meanings, then each new meaning would have to be learned without any support. Thus, while polysemy may seem confusing from the outside, it is actually an opportunity for learning, when seen from the perspective of a child.
This project tests three implications of this idea. First, if polysemy helps learning, then we expect it to be plentiful in the input that children receive. However, we do not know how much polysemy children hear. Indeed, it is possible that caregivers implicitly avoid polysemy, particularly with younger children, if they tend to say simple and unambiguous words.
Second, if polysemy indeed helps learning, then we should expect that children also use polysemy, even in the first words that they say. Finally, if polysemy helps learning, then we should expect that it will be easier for children to learn new senses for words than to learn entirely new words.
To test these implications, we need a database of how caregivers and children use polysemous words that specifies which senses are used when. However, while there are existing databases of how caregivers and children interact, they never specify the different senses in which words are used. For instance, when "run" is used, the corpora never explicitly specify which meaning is intended. Hence, quantitative analyses of polysemy are currently impossible.
Thus, the first aim of this research is to recode these databases of child language in terms of word senses. To do this, we will use a new toolkit that our group has developed, and we will apply it to both English and French data. The output will be (to our knowledge) the largest sense-annotated database yet constructed, not only in terms of child language, but in terms of human conversations more broadly.
Using this database, we will then examine how caregivers use polysemy when talking to their children, and how children use polysemy in their earliest language use. We will build novel statistical models that assess questions such as whether parents avoid using polysemy when talking to their youngest children, and whether children find it easier to use new senses for old words or to learn entirely new words.
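As a rough illustration of the kind of question a sense-annotated database makes answerable, the sketch below counts how many distinct senses of each word a speaker group uses. It is a minimal sketch under stated assumptions and not the project's toolkit or models: the `Token` record, the speaker labels, and the sense identifiers such as `run.operate` are all hypothetical, and the toy utterances are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One sense-annotated word token (hypothetical schema)."""
    speaker: str  # assumed labels: "caregiver" or "child"
    lemma: str    # dictionary form of the word, e.g. "run"
    sense: str    # assumed sense identifier, e.g. "run.operate"


def senses_per_lemma(tokens):
    """Return {speaker: {lemma: set of senses that speaker used}}."""
    seen = defaultdict(lambda: defaultdict(set))
    for tok in tokens:
        seen[tok.speaker][tok.lemma].add(tok.sense)
    return seen


def mean_senses(tokens, speaker):
    """Average number of distinct senses per lemma for one speaker group."""
    by_lemma = senses_per_lemma(tokens)[speaker]
    if not by_lemma:
        return 0.0
    return sum(len(senses) for senses in by_lemma.values()) / len(by_lemma)


# Invented toy data, for illustration only.
corpus = [
    Token("caregiver", "run", "run.move_fast"),
    Token("caregiver", "run", "run.operate"),
    Token("caregiver", "run", "run.fill_bath"),
    Token("caregiver", "dog", "dog.animal"),
    Token("child", "run", "run.move_fast"),
    Token("child", "dog", "dog.animal"),
]

print(mean_senses(corpus, "caregiver"))  # 2.0
print(mean_senses(corpus, "child"))      # 1.0
```

A simple per-lemma average like this ignores how often each word is used and the child's age, so it only gestures at the comparison; the statistical models described above would need to account for both.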
These data and analyses should provide the clearest picture to date of how children acquire polysemy, with implications for theories of language development, theories from linguistics, and for our understanding of how caregivers interact with their children. We will make our tools and databases publicly available, and describe our results in scholarly journals.
Organisations
Publications
Brough J
(2024)
Cognitive causes of 'like me' race and gender biases in human language production.
in Nature human behaviour
Skarabela B
(2023)
Learning Dimensions of Meaning: Children's Acquisition of But
Skarabela B
(2023)
Learning dimensions of meaning: Children's acquisition of but.
in Cognitive psychology
| Description | In this project, we annotated the meanings of words that children hear and use. Prior investigation of how children learn words has focused on what forms (i.e., sounds) they use, rather than investigating the meanings of these words. Our achievements and discoveries through this project include: 1. We created by far the largest human-checked meaning-annotated corpus of conversations, particularly for conversations involving children. In total we annotated the meanings of more than one million words. This served as a base for our investigations of how children learn and use meanings, and will be made available soon to other researchers in cognitive and computational language sciences. 2. We made a number of discoveries about how children learn word senses. For instance, we found that children use a broad array of senses from early in life, but still use words in less ambiguous ways than adults. 3. We found that parents do not simplify the meanings of the words that they use with children. 4. Contrary to prior theorising, we found that children rarely over-extend the meanings of words (i.e., use words with unusual meanings), and that they do so at about the same rate as adults. 5. We annotated how certain abstract words (connectives like "but") are used in both speech to children and literature for children, finding that their meanings are very distinct in these two contexts. This may explain why children often struggle to master connectives in school. |
| Exploitation Route | Our database of annotated meanings will be useful both for other researchers and for those in industry who are generating linguistic tools for use by children. Our comparison of meanings in speech vs literature will be useful for researchers, but also for educators who are trying to understand why their young students may struggle to master meanings that they take for granted. |
| Sectors | Digital/Communication/Information Technologies (including Software) Education Other |
| Description | The non-academic impacts of this award have been focused on early years work. To boost the impact of this award, the PDRA secured an ESRC Impact Accelerator grant, which was used to host events for practitioners, children and families designed to highlight early years language, and to develop a new app designed to highlight how parents can facilitate early language learning. The practitioner events (e.g., presentations to the Scottish Book Trust) focused on how talking to children is known to enhance their language development. The app is currently in alpha testing and thus not yet contributing societal impact, but we hope it will be released soon. |
| First Year Of Impact | 2023 |
| Impact Types | Cultural |
| Description | The Power of Words: The importance of boosting children's vocabulary size in the preschool years |
| Amount | £35,000 (GBP) |
| Organisation | Economic and Social Research Council |
| Sector | Public |
| Country | United Kingdom |
| Start | 06/2022 |
| End | 03/2023 |
| Description | The Power of Words: Words of Music. |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Other audiences |
| Results and Impact | Day-long workshops for parents and caregivers at the St Cecilia's music hall, highlighting how words and metaphors impact our understanding of music, and running collaborative early years Bookbug sessions on the importance of language and early reading. |
| Year(s) Of Engagement Activity | 2022 |