Intonational Variation in Arabic

Lead Research Organisation: University of York
Department Name: Language and Linguistic Science

Description The most significant achievement of the grant is the collection of a large parallel corpus of speech data, elicited for the purposes of intonational analysis, in eight dialects of Arabic. In addition, for one dialect (Moroccan Arabic) we collected data from older as well as younger speakers, and from speakers who are also fluent in Tamazight. These additional data allow us to explore potential changes in progress, and variation due to language contact, in this dialect.

Analysis of the data (using a mix of quantitative and qualitative techniques) confirms that there are clear differences in the 'basic' intonation patterns across Arabic dialects. In some cases we found intonation patterns that have not previously been described; for example, in Tunisian Arabic a specific rise-fall intonation contour is associated with a 'question marker' (a vowel added to the end of the word), and it is the combination of the two that turns a statement into a question (Hellmuth in press; Bouchhioua et al. in press). Similarly detailed findings for individual dialects, as well as an overview of the scope of variation across dialects, will be documented in a forthcoming book-length publication.

We used tried and tested techniques to elicit our speech data, but additional work was required to adapt them for Arabic, because of the particular features of the Arabic language situation (namely, the fact that dialectal Arabic is, on the whole, unwritten). These methods and the rationale for the corpus design are set out in a recent book chapter (Hellmuth 2014).

The methods used for prosodic analysis of the corpus data have evolved in line with recent advances in the field (D'Imperio, Cangemi & Grice 2016). As a result, we moved away from an approach based primarily on qualitative analysis (manual prosodic transcription) to a mixed-methods approach in which the results of qualitative analysis are compared with those of quantitative analysis (visualisation of F0 contours and statistical analysis). We reflect on the merits of this approach in the methodology section of our forthcoming book-length publication for Oxford University Press, 'Intonation in Spoken Arabic Dialects' (Hellmuth, in preparation).
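One common way to make F0 contours of different durations comparable for visualisation and averaging is to resample each contour at a fixed number of time-normalised points. The sketch below assumes that approach; it is a generic technique, not necessarily the procedure used in the IVAr analysis. The function names `normalise_contour` and `mean_contour` and the default of 10 points are hypothetical, and unvoiced frames are assumed to have been removed beforehand.

```python
from statistics import mean


def normalise_contour(times, f0, n_points=10):
    """Resample an F0 contour at n_points equally spaced in normalised time.

    times: strictly increasing sample times (s); f0: F0 values (Hz) at those
    times. Assumes unvoiced frames have already been removed.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two (time, F0) samples of equal length")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    t0, span = times[0], times[-1] - times[0]
    out, j = [], 0
    for k in range(n_points):
        # target time at proportion k / (n_points - 1) of the contour's duration
        t = t0 + span * k / (n_points - 1)
        # advance to the segment [times[j], times[j + 1]] that contains t
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        ta, tb, fa, fb = times[j], times[j + 1], f0[j], f0[j + 1]
        # linear interpolation between the two neighbouring samples
        out.append(fa + (t - ta) / (tb - ta) * (fb - fa))
    return out


def mean_contour(contours, n_points=10):
    """Average several (times, f0) contours point by point after normalisation."""
    normed = [normalise_contour(t, f, n_points) for t, f in contours]
    return [mean(values) for values in zip(*normed)]
```

Averaged contours of this kind can then be plotted per dialect or sentence type and set alongside the manual transcriptions; linear interpolation is used here only for simplicity.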

In addition to archiving the full corpus (audio data and transcriptions) with the UK Data Service (completed), we have constructed an interactive, searchable online database, which will make the data easier to use for non-academic users (for example, by allowing searches by individual dialect or sentence type). Updates on the availability of new tranches of data via this interactive database will be posted on the project website: http://ivar.york.ac.uk/.
Exploitation Route The findings of our research will be useful to learners and teachers of Arabic, who will benefit from descriptions of the pronunciation differences between Arabic dialects and from sample sound recordings available for download. To lay the foundation for this use, we produced a position paper explaining why a description of the intonation patterns of different dialects in particular may be useful for learners and teachers of Arabic (Hellmuth 2014). The paper takes research-led recommendations for teaching English pronunciation as a starting point and explores what the equivalent recommendations would be for Arabic, taking into account the known differences between the two languages.

We have also produced papers (i) presenting an innovative methodology for collecting interactive data in languages such as Arabic, where the written form of the language differs from the spoken form (Gargett et al. 2014), and (ii) exploring whether traces of a speaker's native Arabic dialect can be detected when they speak English as a foreign language (Almbark et al. 2014).

Recordings from the IVAr database have been used in the development of a prototype online training module, designed to evaluate the extent to which 'lay listeners' (with no prior knowledge of linguistics or of Arabic dialects) can be trained to identify differences between spoken Arabic dialects more reliably. Data from the corpus have also been used to investigate whether Moroccan Arabic has lexical stress, in collaboration with colleagues at the University of Cologne, and as test input for a system for automated accent detection (Y-ACCDIST), in collaboration with colleagues from Lancaster University.
Sectors Digital/Communication/Information Technologies (including Software), Education, Government, Democracy and Justice, Security and Diplomacy

URL http://ivar.york.ac.uk/
 
Description University of York ESRC Impact Acceleration Account (York ESRC IAA): Responsive Mode
Amount £1,000 (GBP)
Organisation University of York 
Sector Academic/University
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 01/2018 
End 03/2018
 
Title Implementation of the ProsodyLab forced alignment tool for dialectal Arabic 
Description We adapted the open-source Python scripts of the ProsodyLab Aligner forced alignment tool, distributed by the McGill prosodylab, to align the text transcriptions of the IVAr data with the audio recordings, producing Praat TextGrids time-aligned at the word (and segment) level. An innovation in our lab was adapting the tools to ensure robust alignment of longer sound files (i.e. those containing longer narratives and/or conversations).
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact Hidden Markov model (HMM) acoustic models for each dialect analysed, and Praat TextGrids automatically time-aligned at the word (and segment) level to the audio recordings. TextGrids time-aligned at the word level will be made available alongside the audio files via the IVAr database.
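To illustrate what word-level output of this kind looks like, the sketch below serialises a list of word alignments as a Praat TextGrid in Praat's long text format. It is not the adapted ProsodyLab Aligner code; the function name `to_textgrid`, the tier name "words" and the example timings are hypothetical. One detail worth making explicit is that a Praat interval tier covers the whole file without gaps, so pauses between aligned words are written as empty-labelled intervals.

```python
def to_textgrid(words, duration, tier_name="words"):
    """Serialise word alignments as a one-tier Praat TextGrid (long text format).

    words: (start, end, label) tuples in seconds, sorted and non-overlapping.
    Gaps such as pauses are filled with empty-labelled intervals so that the
    interval tier runs continuously from 0 to duration.
    """
    intervals, cursor = [], 0.0
    for start, end, label in words:
        if start < cursor or end <= start or end > duration:
            raise ValueError(f"bad interval: {(start, end, label)}")
        if start > cursor:
            intervals.append((cursor, start, ""))  # unlabelled gap (e.g. a pause)
        intervals.append((start, end, label))
        cursor = end
    if cursor < duration:
        intervals.append((cursor, duration, ""))  # trailing silence

    def quote(s):
        # Praat escapes a double quote inside a string by doubling it
        return '"' + s.replace('"', '""') + '"'

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {quote(tier_name)}",
        "        xmin = 0",
        f"        xmax = {duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (xmin, xmax, text) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f"            text = {quote(text)}",
        ]
    return "\n".join(lines) + "\n"
```

For example, two words aligned at 0.5-1.0 s and 1.2-1.8 s in a 2 s file yield five intervals: three empty ones (before, between and after the words) and the two labelled words. Writing the string with `open(path, "w", encoding="utf-8")` keeps non-ASCII labels intact in the file.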
 
Title Intonational Variation in Arabic Corpus 
Description The Intonational Variation in Arabic (IVAr) corpus is one of the primary outputs of the IVAr project. It is a parallel corpus of speech data in eight dialects of Arabic (plus one bilingual sub-corpus and one dataset collected from speakers in a different age range). Data collection was completed in September 2015. All read-speech portions of the data have been orthographically transcribed, and these transcriptions have been time-aligned to the digital audio signal by forced alignment. Transcriptions are also available for at least half of the spontaneous speech portions. All speech data and all available transcriptions have been deposited with the UK Data Service (UKDS).
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Testing and development of the Y-ACCDIST accent detection tool (ongoing). 
URL http://reshare.ukdataservice.ac.uk/852878/
 
Description BAB-MSA 
Organisation University of Jordan
Country Jordan, Hashemite Kingdom of 
Sector Academic/University 
PI Contribution We have created a corpus of Boundary Annotated Broadcast Modern Standard Arabic (BAB-MSA) for input to computational analysis. The annotations are informed by our work on development of prosodic annotation protocols for regional Arabic dialects.
Collaborator Contribution Our partners, Dr Claire Brierley (Leeds) and Majdi Sawalha (Jordan), then used the corpus to test a model of automated phrase break prediction.
Impact This research is multidisciplinary: linguistics ~ computer science. The resulting journal article is currently under revision.
Start Year 2013
 
Description BAB-MSA 
Organisation University of Leeds
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution We have created a corpus of Boundary Annotated Broadcast Modern Standard Arabic (BAB-MSA) for input to computational analysis. The annotations are informed by our work on development of prosodic annotation protocols for regional Arabic dialects.
Collaborator Contribution Our partners, Dr Claire Brierley (Leeds) and Majdi Sawalha (Jordan), then used the corpus to test a model of automated phrase break prediction.
Impact This research is multidisciplinary: linguistics ~ computer science. The resulting journal article is currently under revision.
Start Year 2013
 
Description Comparison of Moroccan Arabic and Tamazight prosodic phonology 
Organisation University of Cologne
Department Institute for Linguistics - Phonetics
Country Germany, Federal Republic of 
Sector Academic/University 
PI Contribution Joint PhD supervision, with Prof Dr Martine Grice, of Anna Bruggeman, who is comparing the realisation of word- and phrase-level stress in Moroccan Arabic and Tamazight.
Collaborator Contribution Anna Bruggeman is analysing some of the Moroccan Arabic bilingual sub-corpus for comparison to parallel work previously carried out by Anna and colleagues at the Cologne lab on Tamazight prosody.
Impact Analysis of (a) the acoustic correlates of putative word-level stress and (b) the scaling and alignment of F0 peaks on q-words in wh-questions in Moroccan Arabic, produced by speakers who are or are not also bilingual in Tashlhiyt, using data from the Moroccan Arabic bilingual sub-corpus.
Start Year 2016
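The peak 'scaling' and 'alignment' measures mentioned in this collaboration can be made concrete with a small sketch: scaling is taken here as the height of the F0 maximum within a search window (also expressed in semitones relative to a speaker reference, 12 * log2(f / ref)), and alignment as the time of that maximum relative to a segmental landmark such as a syllable onset. The function name `peak_measures` and the window, landmark and reference values are hypothetical, and the Cologne analysis may define these measures differently; in practice F0 would first be extracted with a pitch tracker such as Praat's.

```python
import math


def peak_measures(times, f0, window, landmark, ref_hz):
    """Locate the F0 maximum inside window=(start, end) and describe it.

    times: sample times (s); f0: F0 values (Hz), with None for unvoiced frames.
    Returns (alignment_ms, scaling_hz, scaling_st):
      alignment_ms: peak time minus landmark time in ms (positive = later);
      scaling_hz: peak F0 in Hz;
      scaling_st: peak F0 in semitones relative to ref_hz (e.g. a speaker's
                  mean or floor F0), to make speakers more comparable.
    """
    start, end = window
    voiced = [(t, f) for t, f in zip(times, f0)
              if start <= t <= end and f is not None]
    if not voiced:
        raise ValueError("no voiced F0 samples in window")
    t_peak, f_peak = max(voiced, key=lambda tf: tf[1])  # first maximum on ties
    alignment_ms = (t_peak - landmark) * 1000.0
    scaling_st = 12.0 * math.log2(f_peak / ref_hz)
    return alignment_ms, f_peak, scaling_st
```

For a peak of 220 Hz occurring 20 ms after the landmark, with a reference of 110 Hz, this returns an alignment of about 20 ms and a scaling of 220 Hz, or 12 semitones (one octave) above the reference.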
 
Description DiVE-Arabic 
Organisation University of Birmingham
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution In one of our fieldwork locations we collected an additional corpus of data elicited using a virtual-world game environment developed by Andrew Gargett (University of Birmingham), which yields audio data time-aligned with a log of the actions (movements/orientations) in the virtual world. Dr Gargett is developing methods for annotation and/or analysis of the action data.
Collaborator Contribution We will provide prosodic annotation of the audio data, using the annotation protocols for the dialect in question, once these are developed (based on the main IVAr corpus data). Once the two levels of analysis are available we will have a rich resource for examining the role of prosody and intonation in situated dialogue in Arabic (for the first time).
Impact DiVE-Arabic: Gulf Arabic Dialogue in a Virtual Environment. / Gargett, Andrew; AlGethami, Ghazi; Hellmuth, Sam. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association (ELRA), 2014. This collaboration is multi-disciplinary: linguistics ~ computer science.
Start Year 2013
 
Description Investigation of the correlates of stress in spoken Arabic dialects 
Organisation University of Manouba
Country Tunisia, Tunisian Republic 
Sector Academic/University 
PI Contribution We have collected parallel data in (so far) eight dialects of Arabic to determine the phonetic correlates of word-level stress in each dialect, using an elicitation paradigm devised by Dr Bouchhioua. The resulting data will allow, for the first time, directly parallel comparison of the correlates of word stress across Arabic dialects. We will analyse the data once annotation of the main IVAr data for each dialect is complete.
Collaborator Contribution The elicitation paradigm was devised by our partner, Dr Nadia Bouchhioua of the Université de la Manouba, Tunis, Tunisia.
Impact Acquiring the phonetics and phonology of English word stress : Comparing learners from different L1 backgrounds. / Alhussein Almbark, Rana; Bouchhioua, Nadia; Hellmuth, Sam. In: Concordia Working Papers in Applied Linguistics, Vol. 5, 2014, p. 19-35.
Start Year 2013
 
Description Y-ACCDIST accent detection 
Organisation Lancaster University
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution Provision of data for testing of the Y-ACCDIST accent detection system for spoken dialects of Arabic.
Collaborator Contribution Provision of accent detection tools for testing of the Y-ACCDIST accent detection system for spoken dialects of Arabic.
Impact Funded by University of York ESRC Impact Acceleration Account: Responsive Mode scheme, in collaboration with an external commercial partner.
Start Year 2017
 
Description Radio broadcast (Word of Mouth) 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Participation in the BBC Radio 4 'Word of Mouth' programme on 'Intonation: the Music of Speech', which focused on variation in the form and function of intonation across languages.
Year(s) Of Engagement Activity 2017
URL http://www.bbc.co.uk/programmes/b08dnrqd