Systematicity and Variation in Word Structure Processing Across Languages: a Neuro-Typology approach (SAVANT)
Lead Research Organisation:
Queen Mary University of London
Department Name: School of Languages Linguistics and Film
Abstract
In general, research on how the human brain processes language has mostly focused on a very small set of familiar, related European languages like English, Dutch, German, Spanish and French. We know almost nothing about how the brains of speakers of most of the world's languages respond to even simple tasks like processing a single word. Not only does this limit our scientific knowledge, it also has the effect of making only a few languages seem worthy of scientific study.
Speakers of languages know many things about their language that they have never been taught, and probably do not even know that they know. For example, speakers of English know that the prefix re- attaches to verbs (eg. refill, repaint). Attaching re- to nouns creates impossible words: reidea, resofa. Speakers of English also know that re- can only attach to specific kinds of verbs like 'fill' and 'open' that describe an event which causes a change of state (fill = cause to become filled). If we attach re- to verbs which do not have the right meaning, the result is not a possible word of English: reknow, resmile. Every human language has rules like this which place limits on the way pieces of the language can be combined to make new words. A combination that is impossible in English, like relaugh, is perfectly fine in French (resourir) or Greek (xanagelo).
We can test people's unconscious knowledge of these kinds of restrictions with a very simple experiment, in which we show speakers real English words, and impossible words like reknow, one word at a time, and ask them to judge how good a word it is. As people are doing this experiment, we use neuroimaging equipment to record their brain activity to find out how their brains process these impossible words. We have found that both English and Greek speakers' brains produce the same patterns of activity. When speakers of these two languages read a word that combines a prefix or suffix with a stem of the wrong category (eg. reidea), their brains produce more activity in a region in the left temporal lobe about 200ms after they first see the word. But Greek and English speakers produce a different response when they read a word that combines a prefix or suffix with a stem that is the right category, but has the wrong semantics (eg. reknow): in this case, their brains produce more activity about 450ms after they first see the word, and the response comes from the frontal lobe. Just from these two languages, it seems that human brains have the same kinds of responses in the same locations, and at the same times, to similar kinds of linguistic information across different languages. But Greek and English are only two of thousands of languages.
Our project will test speakers from a wider range of different languages, which have a range of different word formation rules and processes. These languages will include other Indo-European languages like Slovenian and Bosnian/Croatian/Serbian, in which verbs usually have four or five separate pieces (morphemes), to mark things like tense, aspect, and the person, number and gender of the subject; and Bangla, a language in which verbs often change their pronunciation to mark grammatical features (like English sing~sang or write~written). We will also include Arabic, in which words are made by combining a three-consonant root like KTB with different vowel 'melodies' (eg. 'kitab' = 'book', 'kaatib' = 'writer', 'katab' = 'to write'), and Tagalog, a language in which words can be made by doubling part of the word (eg. 'halo' = 'a mix', 'hahalo' = 'to mix'), or by inserting an affix into the middle of the word (eg. 'h-in-alo' = 'it was mixed'). We'll use the same simple experiment to show speakers of these languages words which break either a category rule (reidea) or a semantic rule (unsmile) to see whether the brain responses we observed for English and Greek are truly universal, and how different word-formation processes might use the same basic language network in different ways.
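For readers who find it easier to see these word-formation processes as operations on strings, here is a minimal sketch of the three non-concatenative patterns mentioned above: Arabic root-and-pattern combination, Tagalog reduplication, and Tagalog infixation. It is illustrative only and is not part of the project's materials. The function names and the `C`-slot template notation are assumptions introduced for this example, and the code assumes romanised words that begin with a single consonant followed by a vowel, which holds for the examples in the text but not for either language in general.

```python
def interdigitate(root: str, template: str) -> str:
    """Fill each 'C' slot of a template with the next consonant of the root.

    The 'C' notation is a hypothetical convention for this sketch:
    root "KTB" + template "CiCaC" gives "kitab".
    """
    consonants = root.lower()
    out = []
    i = 0
    for ch in template:
        if ch == "C":
            out.append(consonants[i])  # take the next root consonant
            i += 1
        else:
            out.append(ch)             # copy the vowel 'melody' as-is
    return "".join(out)


def reduplicate_first_cv(word: str) -> str:
    """Copy the initial consonant+vowel pair to the front of the word.

    Simplifying assumption: the word starts with exactly one consonant
    followed by one vowel.
    """
    return word[:2] + word


def infix_after_onset(word: str, affix: str) -> str:
    """Insert an affix immediately after the word-initial consonant."""
    return word[:1] + affix + word[1:]


if __name__ == "__main__":
    for template in ("CiCaC", "CaaCiC", "CaCaC"):
        print(interdigitate("KTB", template))
    print(reduplicate_first_cv("halo"))
    print(infix_after_onset("halo", "in"))
```

Run as a script, this prints the five forms cited in the paragraph above (kitab, kaatib, katab, hahalo, hinalo). The point of the sketch is only that none of these forms can be produced by simply adding a prefix or suffix to the edge of a stem, which is what makes these languages informative test cases.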
Planned Impact
This project investigates how speakers of a diverse range of languages solve the basic problem of detecting, recognising and interpreting constituent pieces of complex words, by recording their brain activity while they read and judge the wellformedness of familiar and novel words in their language. By employing a very simple paradigm that can be replicated across all the languages in our sample, we can better understand the shared neurocognitive bases for the human language capacity, while also uncovering the neurobiological basis for the distribution of different linguistic patterns across the languages of the world. The systematic comparison of the responses evoked by the same manipulations across a range of languages will lead to new discoveries and to the refinement of existing models of how word-internal linguistic structure is parsed.
In addition to addressing both big picture and fine-grained questions about the representation and processing of complex words, our project has a fundamental capacity building goal. We can only enrich the set of languages for which there is neurolinguistic data by a small number in this project. But we can enable a much larger set of languages and language speakers to be investigated (a) by training our postdocs, postgraduate and undergraduate research assistants, and, through a summer school program, local secondary school students, all of whom will be speakers of under-investigated languages, and (b) by developing the kinds of databases that are essential for language science research and clinical applications, and training others to build their own. The program to give UG and secondary school students the skills and motivation to investigate the languages in their lives critically includes public showcase events, co-organised with the advice and advocacy organisation Bilingualism Matters (in London) and the science communication space HiSa Experimentov (House of Experiments) (in Ljubljana).
Significantly widening the range of languages investigated by neuroscience in this way not only ensures that our theory building and testing is much more representative of the reality of linguistic experience across the globe, it also firmly situates the native speaker researchers and participants as stakeholders with a vested interest in research that replaces cultural insecurity and marginalisation with a deep sense of pride and confidence, enhances the prestige of their languages and celebrates their linguistic capacities and the value of multilinguistic communities.
Publications
Cayado D
(2023)
Does linear position matter for morphological processing? Evidence from a Tagalog masked priming experiment
in Language, Cognition and Neuroscience
Cayado DKT
(2024)
MEG evidence for left temporal and orbitofrontal involvement in breaking down inflected words and putting the pieces back together.
in Cortex; a journal devoted to the study of the nervous system and behavior
Matar S
(2025)
Neural Bases of Proactive and Predictive Processing of Meaningful Subword Units in Speech Comprehension
in The Journal of Neuroscience
Moitra S
(2024)
How long is long? Word length effects in reading correspond to minimal graphemic units: An MEG study in Bangla.
in PloS one
Wray S
(2022)
Early Form-Based Morphological Decomposition in Tagalog: MEG Evidence from Reduplication, Infixation, and Circumfixation.
in Neurobiology of language (Cambridge, Mass.)
| Description | This project set out to systematically investigate the early stages of processing morphologically complex words (eg. refill, fundable) across a diverse range of languages, using a combination of behavioural and neural measurements. Our goal is to understand how knowledge about the units and processes involved in recognizing and understanding complex words is stored and accessed in the human brain. Previous research by our team had compared English and Greek, and found broadly similar patterns. We have now collected data from Slovenian, Bosnian/Serbian/Croatian (BCS), Arabic, Bangla and Tagalog, and compared a range of different kinds of complex words across these languages. We have found encouraging similarities across all these languages, confirming a core hypothesis of the generative approach to human language, namely that despite great variation in the precise forms that words and pieces of words take across the world's languages, they all share core basic mechanisms for building and interpreting complex linguistic structures (in our case words). But we have also found intriguing differences. In Bangla, for instance, we found that participants engaged their right hemispheres to assess the grammatical wellformedness of complex words, while in all other languages in our sample (and in the broader literature), this process is associated with the left hemisphere. An explanation for this will require further experimentation, and will enrich our models of the neural bases of language. In Slovenian and BCS, we have found that grammatical wellformedness assessment begins much earlier in the brain than in other languages, reflecting a difference in how grammatical category is signaled in the word in these languages. This is not unexpected, but had not previously been observed, as this feature does not exist in the previously studied languages.
Additional interesting differences from the general pattern have been found for Tagalog and Arabic, each of which adds nuance to our models and provokes interesting questions. As hoped, therefore, this project has succeeded in finding both systematicity and variation across the languages in our sample. Full interpretation is ongoing. |
| Exploitation Route | Yes. Each of our studies has revealed results that evoke many further questions, and that have consequences for our models of the neural bases of the human linguistic capacity. The stimulus sets we have developed are all being made available as part of publishing the studies, and thus are/will be available for use in replication or adaptation. Likewise the behavioural and MEG data sets are being made available, allowing other researchers to conduct their own analyses, or to use them as norms for the development of assessment tools (either for education or for clinical purposes). |
| Sectors | Education |
| URL | https://savant.qmul.ac.uk/output/ |
| Description | HSS Research Bursary Scheme |
| Amount | £1,000 (GBP) |
| Organisation | Queen Mary University of London |
| Sector | Academic/University |
| Country | United Kingdom |
| Start | 01/2023 |
| End | 04/2023 |
| Description | Queen Mary University of London: - HSS Research Bursary Scheme (£ 1000; 2024 - 2025) |
| Amount | £1,000 (GBP) |
| Organisation | Queen Mary University of London |
| Sector | Academic/University |
| Country | United Kingdom |
| Start | 03/2025 |
| End | 07/2025 |
| Title | LexiVault |
| Description | LexiVault is a repository and web tool for psycholinguistic lexicons of lesser-studied languages. Investigating psycholinguistic questions relies on fine-grained, corpus-derived measures, but psycholinguistic research is hampered by a lack of computational resources for most of the world's languages. Existing resources fall short of fully satisfying the desired requirements for: (1) accessibility; (2) broad language support for psycholinguistic research; and (3) minimizing resource-building effort. Current goal: build and collect resources that are structured for psycholinguistic inquiry into lesser-studied languages. We have therefore set up a baseline process and structure to follow as we build each of the language resources on our docket, and for eventual contributors to follow as they keep enriching LexiVault. The guidelines are intended to keep the data relevant for psycholinguistic research, namely: a corpus that is sufficiently representative in size and content to support the significance of findings; and the specific measures typically used in psycholinguistic studies, such as normalized frequencies, transition probabilities, and grapheme-to-phoneme transcription, which typically does not co-occur with morphological or lexical annotation in the same resource. At the same time, the format remains flexible enough to accommodate a diverse range of lesser-resourced languages: it is built on bare-bones universal features such as word tokens and their frequencies; IPA can, for instance, be used to represent phonetic information; and certain datasets can optionally be extended with language-specific attributes such as roots and patterns for Semitic languages. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | No notable impacts yet outside of our own projects - the work to develop the Arabic and Bangla resources for LexiVault was essential to develop the materials for our SAVANT experiments. |
| URL | https://github.com/SAVANT-team/LexiVault |
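
To make two of the measures named in the LexiVault description concrete, the following is a minimal sketch of how normalized frequencies and transition probabilities can be estimated from a tokenised corpus. It is not LexiVault's code and does not reflect its data format; the function names and the toy token list are assumptions made for this example. Frequencies are normalised to occurrences per million tokens (a common convention), and transition probabilities are estimated as P(next unit | current unit) over adjacent units, where a "unit" is whatever the caller passes in (words here, but it could equally be letters or morphemes).

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def frequency_per_million(tokens: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Raw counts rescaled to occurrences per million tokens."""
    total = len(tokens)
    counts = Counter(tokens)
    return {unit: n * 1_000_000 / total for unit, n in counts.items()}


def transition_probabilities(
    tokens: Sequence[Hashable],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Estimate P(b | a) for every adjacent pair (a, b) in the sequence."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count each unit only where it has a successor (i.e. not the last token).
    first_counts = Counter(tokens[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


if __name__ == "__main__":
    toy_corpus = ["a", "b", "a", "c"]  # hypothetical toy data
    print(frequency_per_million(toy_corpus))
    print(transition_probabilities(toy_corpus))
```

On real corpora these estimates are usually smoothed and computed over far larger samples; the sketch omits both to keep the definitions visible.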
| Description | QMUL Summer School 2022 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Schools |
| Results and Impact | Following our visits to local year 12 students, we invited applications to participate in a 2 day on campus Summer School in late June 2022. Three applicants were selected from each of the 3 schools we visited. Attendees' travel costs and lunch were covered by the grant to remove barriers to participation. We ran a two day summer school, in which students were given lectures/seminars in the morning about language processing research (focusing on Spoken Language on Day 1, and Written Language on Day 2) and then spent the afternoons in our Electroencephalography Lab learning how to record brain activity. Students got to try having their own brains recorded, and being experimenters (putting the sensor caps on a participant), and received a certificate of participation. Attendees were also matched with a QMUL student ambassador who was available to help them with their UCAS applications later in the summer/autumn. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://savant.qmul.ac.uk/summer-school-london/ |
| Description | QMUL Summer School 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Undergraduate students |
| Results and Impact | We ran a 3 day intensive training workshop for QMUL Linguistics UG students in between their first and second years to provide them with hands on lab based training in using our eye tracker and our electroencephalography equipment to record participant responses to linguistic stimuli. This training was designed to allow these students to acquire critical skills, understanding, and confidence to enable them to pursue experimental linguistics projects for their dissertations in their third year, which some of them have done, and to work towards careers in the research and/or healthcare space where such expertise and confidence are valuable assets. Five students participated in this workshop. These students were also then trained sufficiently to provide valuable paid support throughout the academic year for Open Days, and Widening Participation events, and for the following year's Summer School. Thus this training directly addressed the key goal of the SAVANT project to support the development of a pipeline both into university, and on into related careers for young people with diverse linguistic and cultural backgrounds. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://savant.qmul.ac.uk/summer-school/summer-school-london-2023/ |
| Description | School Visits 2022 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Schools |
| Results and Impact | We visited 3 local colleges to talk to year 12 students studying English Language A levels. We talked to them about psycholinguistic and neurolinguistic research on understudied languages, and invited them to take part in our Summer School later that year. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://savant.qmul.ac.uk/summer-school-london/ |
| Description | Summer School Ljubljana |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Schools |
| Results and Impact | Our Ljubljana based team established partnerships with the British International School of Ljubljana, the International School Ljubljana, and the European School of Ljubljana. For a summer session, we selected around 15-20 final year students from the European School of Ljubljana, who possessed a mix of backgrounds and native languages, all with classes taught in English. Additionally, three undergraduate students were extensively trained in the operation of EEG machines. In a one day summer workshop, school students participated in a morning lecture/seminar on Neurolinguistics, and an afternoon hands on workshop learning to use the EEG system. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://savant.qmul.ac.uk/summer-school-ljubljana/ |
| Description | Summer School London - 2024 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Schools |
| Results and Impact | On Wednesday, 12 June 2024, the SAVANT team and the Linguistics Department at Queen Mary University of London hosted an Experimental Linguistics Taster Day for A-level English Language students from Leyton Sixth Form College. This event provided an in-depth look at how linguists study language and its intersection with psychology and sociology. 13 students attended the Taster Day, which included short lectures/demonstrations as well as lab-based hands-on demos and training, allowing students to explore the range of tools we use in the scientific study of human language. This event was supported by three UG students who had participated in the previous year's UG Summer School, and in other outreach activities over the year. These students were able to run hands-on demos/training with our eye tracker, and to answer school student questions about studying linguistics and studying at QMUL. Multiple students from Leyton Sixth Form College subsequently applied to QMUL for Linguistics, Psychology and related programs. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://savant.qmul.ac.uk/summer-school/summer-school-london-2024/ |
| Description | The Impact of Tagalog in Psycholinguistics and Formal Linguistics |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | This symposium, which took place on Zoom on March 22nd, 2024, discussed what Tagalog can teach us about psycholinguistics and formal linguistics. It gathered an international panel of experts working on Tagalog across a range of linguistic disciplines to share their work with a global audience, and to foster awareness of the importance of Tagalog for understanding human language generally. Attendees included several Tagalog language teachers, both in the Philippines and around the world, who were extremely appreciative of the chance to ask experts about child language development, second language acquisition, and the brain bases of linguistic knowledge. Dave Cayado, the SAVANT researcher who was the lead organiser and host of the symposium, subsequently received many inquiries from both academic and non-academic audiences about Tagalog language research. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://savant.qmul.ac.uk/output/symposiums/ |
