Intonational Variation in Arabic

Lead Research Organisation: University of York
Department Name: Language and Linguistic Science

 
Description The most significant achievement of the grant is the collection of a large parallel corpus of speech data, elicited for the purposes of intonational analysis, in eight dialects of Arabic. In addition, for one dialect (Moroccan Arabic) we collected data from older as well as younger speakers, and from speakers who are also fluent in Tamazight. These additional data allow us to explore potential changes in progress and variation due to language contact in this dialect.

Analysis of the data (using a mix of quantitative and qualitative techniques) confirms that there are clear differences in the 'basic' intonation patterns across Arabic dialects. In some cases we discovered intonation patterns that have not previously been described; for example, in Tunisian Arabic a specific rise-fall intonation contour is associated with a 'question marker' (a vowel added to the end of the word), and it is the combination of the two that turns a statement into a question (Hellmuth in press; Bouchhioua et al 2019). Similar detailed findings in individual dialects, as well as an overview of the scope of variation across dialects, will be documented in a forthcoming book-length publication.

We used tried and tested techniques to elicit our speech data, but additional work was required to adapt these for use in Arabic, due to the particular features of the Arabic language situation (that is, the fact that dialectal Arabic is, on the whole, unwritten). These methods and the rationale of the corpus design are set out in a recent book chapter (Hellmuth 2014).

The methods used for prosodic analysis of the corpus data have evolved in line with recent advances in the field (D'Imperio, Cangemi & Grice 2016). As a result we moved away from an approach based primarily on qualitative analysis (manual prosodic transcription) to a mixed methods approach in which the results of qualitative analysis are compared to the results of quantitative analysis (visualisation of F0 contours and statistical analysis). We reflect on the merits of this approach in the methodology section of our forthcoming book-length publication for Oxford University Press, 'Intonation in Spoken Arabic Dialects' (Hellmuth, in preparation).
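
As an illustration of the quantitative side of such a mixed methods approach, the sketch below time-normalises F0 contours so that tokens of different durations can be overlaid or averaged before plotting. It is a minimal example under stated assumptions, not the project's actual analysis pipeline: the function names `time_normalise` and `mean_contour` are hypothetical, F0 is assumed to be in Hz with unvoiced frames already removed, and linear interpolation is only one of several reasonable choices.

```python
from typing import List, Sequence


def time_normalise(times: Sequence[float], f0: Sequence[float],
                   n_points: int = 10) -> List[float]:
    """Resample one F0 contour to n_points equally spaced samples.

    Assumes `times` is strictly increasing and that `f0` holds one
    voiced F0 value (Hz) per time stamp. Hypothetical helper.
    """
    if len(times) != len(f0) or len(times) < 2 or n_points < 2:
        raise ValueError("need matching inputs with at least two samples")
    t_start, t_end = times[0], times[-1]
    step = (t_end - t_start) / (n_points - 1)
    resampled = []
    j = 0  # index of the left-hand neighbour used for interpolation
    for i in range(n_points):
        t = min(t_start + i * step, t_end)  # guard against rounding past the end
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        resampled.append(f0[j] + frac * (f0[j + 1] - f0[j]))
    return resampled


def mean_contour(contours: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of several equally long, time-normalised contours."""
    return [sum(values) / len(contours) for values in zip(*contours)]


if __name__ == "__main__":
    # Two invented rise-fall tokens of different durations.
    token_a = time_normalise([0.0, 1.0, 2.0], [100.0, 200.0, 100.0], n_points=5)
    token_b = time_normalise([0.0, 0.5, 1.0], [120.0, 220.0, 120.0], n_points=5)
    print(mean_contour([token_a, token_b]))
```

Averaged contours of this kind can then be plotted per dialect or sentence type and set against the manual prosodic transcription of the same utterances.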

The full corpus (audio data and transcriptions) has been archived with the UK Data Service. In addition, an interactive, searchable online database has been constructed to make the data easier for non-academic users to work with (allowing searches by individual dialect or sentence type, for example). Updates on the availability of new tranches of the data via this database will be posted on the project website: http://ivar.york.ac.uk/.
Exploitation Route The findings of our research will be useful to learners and teachers of Arabic, who will benefit from the availability of descriptions of the pronunciation differences between different dialects of Arabic, and from the availability of sample sound recordings to download. To lay a foundation for this use, we produced a position paper explaining why, in particular, a description of the intonation patterns of different dialects may be useful for learners and teachers of Arabic (Hellmuth 2014). The paper takes research-led recommendations for teaching of the pronunciation of English as a starting point and explores what the equivalent recommendations would be for Arabic, taking into account the known differences between the two languages.

We have also produced papers i) to show innovative methodology used to collect interactive data in languages such as Arabic where the written form of the language differs from the spoken form (Gargett et al 2014), and ii) to explore whether or not it is possible to detect traces of a person's mother tongue Arabic dialect when they are speaking English as a foreign language (Almbark et al 2014).

Recordings from the IVAr database have been used in development of a prototype online training module designed to evaluate the extent to which 'lay listeners' (with no prior knowledge of linguistics or of Arabic dialects) can be trained to more reliably identify differences between spoken Arabic dialects. Data from the corpus have been used to investigate whether there is lexical stress in Moroccan Arabic, in collaboration with colleagues at the University of Cologne, and as input to testing of a system for automated accent detection (Y-ACCDIST) in collaboration with colleagues from Lancaster University.

The corpus data and/or methods have been exploited directly and in depth in two completed PhD projects at the University of York, with four more in progress.
Sectors Digital/Communication/Information Technologies (including Software); Education; Government, Democracy and Justice; Security and Diplomacy

URL http://ivar.york.ac.uk/
 
Description The IVAr corpus has been used for testing and development of the Y-ACCDIST accent detection tool (Brown & Hellmuth, forthcoming). Y-ACCDIST is a computational tool which can be "used to inspect sociophonetic corpora as a preliminary 'screening' tool" (Brown & Wormald 2017, JASA, p.422).
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice
Impact Types Societal

 
Description University of York ESRC Impact Acceleration Account (York ESRC IAA): Responsive Mode
Amount £1,000 (GBP)
Funding ID ESRC IAA Apr 2014-Mar 2019 ES/M500574/1 
Organisation University of York 
Sector Academic/University
Country United Kingdom
Start 01/2018 
End 03/2018
 
Description University of York ESRC Impact Acceleration Account (York ESRC IAA): Standard Grant
Amount £22,050 (GBP)
Funding ID ESRC IAA Apr 2019-Mar 2023 ES/T502066/1 
Organisation University of York 
Sector Academic/University
Country United Kingdom
Start 09/2021 
End 08/2022
 
Title Implementation of the ProsodyLab forced alignment tool for dialectal Arabic 
Description We adapted the open-source Python scripts distributed by the McGill prosodylab for the ProsodyLab Aligner forced alignment tool, and used them to align text transcriptions of the IVAr data to the audio recordings, resulting in time-aligned Praat textgrids at the word (and segment) level. An innovation in our lab was adaptation of the tools to ensure robust alignment of longer sound files (i.e. containing longer narratives and/or conversations).
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact HMM models for each dialect analysed, and Praat textgrids automatically time-aligned at the word (and segment) level to audio recordings. Textgrids time-aligned at the word level will be made available alongside the audio files via the IVAr database. 
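
The description above mentions adapting the tools for longer sound files but does not say how this was done. One common strategy, shown here purely as an assumption and not as the project's documented method, is to align a long recording in shorter chunks and then shift each chunk's word intervals back onto the timeline of the full recording. The names `Interval` and `merge_chunk_alignments` are hypothetical, and the sketch does not call the ProsodyLab Aligner itself.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """One aligned word: start and end in seconds, plus its label."""
    start: float
    end: float
    label: str


def merge_chunk_alignments(
    chunks: Sequence[Tuple[float, Sequence[Interval]]]
) -> List[Interval]:
    """Combine per-chunk alignments into one timeline.

    Each element of `chunks` is (offset, intervals), where `offset` is the
    chunk's start time within the full recording and the intervals are
    timed relative to the start of that chunk.
    """
    merged: List[Interval] = []
    for offset, intervals in sorted(chunks, key=lambda chunk: chunk[0]):
        for iv in intervals:
            # Shift chunk-relative times onto the full-recording timeline.
            merged.append(Interval(iv.start + offset, iv.end + offset, iv.label))
    return merged


if __name__ == "__main__":
    # Invented example: two chunks of a longer narrative, given out of order.
    second = (10.0, [Interval(0.5, 1.0, "word_b")])
    first = (0.0, [Interval(0.0, 0.5, "word_a")])
    for iv in merge_chunk_alignments([second, first]):
        print(iv)
```

The merged intervals could then be written out as a single word tier for the whole recording.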
 
Title Intonational Variation in Arabic Corpus 
Description The Intonational Variation in Arabic (IVAr) corpus is one of the primary outputs of the IVAr project. It is a parallel corpus of speech data in eight dialects of Arabic (plus one bilingual sub-corpus dataset and one dataset collected with speakers in a different age range). Data collection was completed in September 2015. All of the read speech portions of the data are orthographically transcribed and time-aligned to the digital audio signal using forced alignment. Transcriptions are also available for at least half of the spontaneous speech portions of the database. All speech data and all available transcriptions have been deposited with UKDS.
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Testing and development of the Y-ACCDIST accent detection tool (ongoing). 
URL http://reshare.ukdataservice.ac.uk/852878/
 
Description BAB-MSA 
Organisation University of Jordan
Country Jordan 
Sector Academic/University 
PI Contribution We have created a corpus of Boundary Annotated Broadcast Modern Standard Arabic (BAB-MSA) for input to computational analysis. The annotations are informed by our work on development of prosodic annotation protocols for regional Arabic dialects.
Collaborator Contribution Our partners, Dr Claire Brierley (Leeds) and Majdi Sawalha (Jordan), then used the corpus to test a model of automated phrase break prediction.
Impact This research is multidisciplinary: linguistics ~ computer science. The resulting journal article is currently awaiting further revision.
Start Year 2013
 
Description BAB-MSA 
Organisation University of Leeds
Country United Kingdom 
Sector Academic/University 
PI Contribution We have created a corpus of Boundary Annotated Broadcast Modern Standard Arabic (BAB-MSA) for input to computational analysis. The annotations are informed by our work on development of prosodic annotation protocols for regional Arabic dialects.
Collaborator Contribution Our partners, Dr Claire Brierley (Leeds) and Majdi Sawalha (Jordan), then used the corpus to test a model of automated phrase break prediction.
Impact This research is multidisciplinary: linguistics ~ computer science. The resulting journal article is currently awaiting further revision.
Start Year 2013
 
Description DiVE-Arabic 
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution In one of our fieldwork locations we collected an additional corpus of data elicited using a virtual world game environment developed by Andrew Gargett (University of Birmingham), which yields audio data that is time-aligned with a log of the actions (movements/orientations) in the virtual world. Dr Gargett is developing methods for annotation and/or analysis of the actions data.
Collaborator Contribution We will provide prosodic annotation of the audio data, using the annotation protocols for the dialect in question, once these are developed (based on the main IVAr corpus data). Once the two levels of analysis are available we will have a rich resource for examining the role of prosody and intonation in situated dialogue in Arabic (for the first time).
Impact DiVE-Arabic: Gulf Arabic Dialogue in a Virtual Environment. / Gargett, Andrew; AlGethami, Ghazi; Hellmuth, Sam. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association (ELRA), 2014. This collaboration is multi-disciplinary: linguistics ~ computer science.
Start Year 2013
 
Description Investigation of the correlates of stress in spoken Arabic dialects 
Organisation University of Manouba
Country Tunisia 
Sector Academic/University 
PI Contribution We have collected parallel data in (so far) 8 dialects of Arabic, to determine the phonetic correlates of word level stress in each dialect, using an elicitation paradigm devised by Dr Bouchhioua. The resultant data will allow directly parallel comparison of the correlates of word stress across Arabic dialects for the first time. We will analyse the data after completion of the annotation of the main IVAr data for each dialect.
Collaborator Contribution The elicitation paradigm was devised by our partner, Dr Nadia Bouchhioua of the Universite de la Manouba, Tunis, Tunisia. A journal article is currently under review.
Impact Acquiring the phonetics and phonology of English word stress: Comparing learners from different L1 backgrounds. / Alhussein Almbark, Rana; Bouchhioua, Nadia; Hellmuth, Sam. In: Concordia Working Papers in Applied Linguistics, Vol. 5, 2014, p. 19-35.
Start Year 2013
 
Description Language support for Arabic-speaking refugees 
Organisation Newcastle University
Country United Kingdom 
Sector Academic/University 
PI Contribution "Arabic at Home" briefings for the families at the Refugee Council drop-in, Selby and for staff and volunteers of the Refugee Council, Leeds.
Collaborator Contribution We delivered briefings on home language maintenance in Arabic for staff, volunteers and clients of the Refugee Council in North Yorkshire. The materials were prepared, and briefings delivered, by Sam Hellmuth/Rana Almbark (University of York) and Ghada Khattab (Newcastle University).
Impact A UKRI grant application for work to improve pronunciation training for Syrian Arabic speaking learners of English and German is currently under review.
Start Year 2018
 
Description Moroccan Arabic bilingual sub-corpus 
Organisation University of Hassan II Casablanca
Country Morocco 
Sector Academic/University 
PI Contribution We collected a 'cluster' sub-corpus of the IVAr dataset in Casablanca, with data from bilingual speakers of Arabic and Tamazight, in two age groups.
Collaborator Contribution Our local partners assisted with data collection and transcription in Morocco, and also travelled to UK to assist with initial data analysis.
Impact Results of a pilot study on a portion of the data were presented at the 18th ICPhS conference: Hellmuth, S., Alhussein Almbark, R., Chlaihani, B., Louriz, N. (2015). F0 peak alignment in Moroccan Arabic polar questions. Proceedings of the 18th ICPhS, Glasgow. Data analysis of the full bilingual dataset (young speakers) is in progress and a journal article is currently under review.
Start Year 2015
 
Description Y-ACCDIST accent detection 
Organisation Lancaster University
Department Department of Linguistics and English Language
Country United Kingdom 
Sector Academic/University 
PI Contribution Provision of data for testing of the Y-ACCDIST accent detection system for spoken dialects of Arabic.
Collaborator Contribution Provision of accent detection tools for testing of the Y-ACCDIST accent detection system for spoken dialects of Arabic.
Impact Initial scoping work funded by a small Responsive Mode award from the University of York ESRC Impact Acceleration Account allowed for Proof of Concept work in collaboration with an external commercial partner. A follow up collaborative project with a different external partner is ongoing in 2022, funded by a Main Scheme award from the University of York ESRC Impact Acceleration Account.
Start Year 2017
 
Description Arabic at Home briefing: Refugee Council drop-in, Selby. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Third sector organisations
Results and Impact We delivered bilingual briefings on home language maintenance for Arabic to staff, volunteers and clients of the Refugee Council in North Yorkshire. The materials were prepared, and briefings delivered, by Sam Hellmuth/Rana Almbark (University of York) and Ghada Khattab (Newcastle University).
Year(s) Of Engagement Activity 2018
URL https://ivar.york.ac.uk/outreach
 
Description Radio broadcast (Word of Mouth) 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Participation in Radio 4 'Word of Mouth' programme on 'Intonation: the Music of Speech' focussed on variation in the form and function of intonation across languages.
Year(s) Of Engagement Activity 2017
URL http://www.bbc.co.uk/programmes/b08dnrqd
 
Description The role of language and language choices in participatory and collaborative work 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Third sector organisations
Results and Impact An invited contribution to the York Migration Network Ideas Salon #2 on the role of language and language choices in participatory and collaborative work on migration. The talk included a linguist's response to a visit to the York Art Gallery 'The Sea is the Limit' exhibition and outlined planned work to support home language maintenance for refugee families settled in Yorkshire (and beyond).
Year(s) Of Engagement Activity 2018
URL https://www.york.ac.uk/social-science/research/migration-network/events/2018/mignet-ideas-salon-2/