Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Literature Languages & Culture

Abstract

This project will fuse deep, qualitative analysis with cutting-edge computational methods to decode, interpret and curate the hidden heritages of Gaelic traditional narrative. We will provide the most detailed account to date of the similarities and differences between the narrative traditions of Scotland and Ireland and, by extension, a new understanding of the two countries' shared cultural history. Using recent advances in language technology, we will digitise a vast collection of folklore manuscripts in Irish and Scottish Gaelic, convert them to searchable text and make them publicly available. In turn, this new digital resource will catalyse ongoing research into Gaelic speech technology.

The main question that this project poses is: what can a large digital collection of traditional narrative tell us about historical cultural exchanges between Scotland and Ireland? While we will examine different kinds of traditional narrative, we will concentrate on International Tales in Irish and Scottish Gaelic. International Tales are folk narratives, such as Cinderella or Snow White, that are found widely across the world. Some are amongst the oldest known facets of human oral culture, dating back 5000 years or more. Because of their age and their tendency to vary geographically, they can reveal a great deal about relationships between different communities. In this study, we will use them as a vehicle for understanding early communication networks between Scotland and Ireland, as well as those that existed within each country.

Through an approach known as text-mining, we will use artificial intelligence to search the tales for similar topics, phrases and other linguistic patterns. The matches that we find between texts will then be correlated with what we know about the texts themselves. Using another approach, known as phylogenetic network analysis, we will examine relationships between the texts' themes and the people who produced them: for example, their gender, where they lived and what they did for work. Ultimately, we will combine the two approaches into a unified account of Scottish and Irish oral narrative. This work will transform our understanding of Gaelic oral culture and make unique archival material available online to a diverse range of users. It will also benefit the sustainability of Gaelic-speaking communities by creating and further stimulating important language technologies (e.g. handwriting recognition, machine translation and automatic speech recognition).
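As a rough illustration of what matching "similar phrases" between tales can involve, the sketch below scores every pair of texts by the overlap of their word trigrams (Jaccard similarity). This is one simple, generic technique chosen for illustration, not the project's actual text-mining pipeline; the function names, the trigram window and the sample texts are hypothetical, and real Gaelic material would first need orthographic normalisation and proper tokenisation.

```python
from itertools import combinations


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lower-cased word n-grams in a text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all distinct items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_similarity(tales: dict[str, str], n: int = 3) -> dict[tuple[str, str], float]:
    """Score every pair of tales by the overlap of their word n-grams."""
    grams = {name: word_ngrams(text, n) for name, text in tales.items()}
    return {
        (x, y): jaccard(grams[x], grams[y])
        for x, y in combinations(sorted(tales), 2)
    }


if __name__ == "__main__":
    # Invented English placeholder texts, not project data.
    tales = {
        "A": "the king had three sons and the youngest went to the well",
        "B": "the king had three daughters and the youngest went to the well",
        "C": "a fox met a wolf on the road",
    }
    for pair, score in pairwise_similarity(tales).items():
        print(pair, round(score, 2))
```

Here tales A and B share seven of their thirteen distinct trigrams and so score about 0.54, while C shares none with either. Scores like these could be turned into distances (1 minus similarity), the kind of input that network-building methods typically take.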

 
Description An Gocair: Inneal Àbhaisteachadh Teacsa airson na Gàidhlig ('The Unhooker: An orthographic normalisation system for Scottish Gaelic')
Amount £18,725 (GBP)
Organisation Bòrd na Gàidhlig
Sector Charity/Non Profit
Country United Kingdom
Start 11/2021 
End 12/2022
 
Description Crowdsourcing the Acquisition of Gaelic Speech Technology Training Data
Amount £20,000 (GBP)
Organisation Government of Scotland 
Sector Public
Country United Kingdom
Start 01/2022 
End 12/2022
 
Title Irish handwriting recogniser 
Description Using the Transkribus platform, we have trained the first Irish handwriting recogniser. The procedure is similar to the one we used in a pilot project (2019-2020) to create the first Scottish Gaelic handwriting recogniser. An initial model was seeded using a relatively small number of manually transcribed manuscripts. This model was then used to produce a first-pass transcription of a larger number of documents; the resulting text was corrected and then used to train a more accurate model. The Irish handwriting recogniser is being used to convert manuscript handwriting to digital text, currently speeding up transcription by a factor of roughly four. As the recogniser improves, it will save even more time. Like the Scottish Gaelic handwriting model, the Irish model will be made available to the public once it reaches a high level of accuracy.
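The bootstrapping procedure described above (seed a model, let it produce first-pass transcriptions, correct them, retrain) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `bootstrap_recogniser`, `train`, `transcribe` and `correct` are hypothetical names standing in for model training, automatic recognition and human proofreading. They are not part of any Transkribus API; in practice these steps are carried out through the platform itself.

```python
from typing import Any, Callable, Sequence

# Hypothetical stand-in for whatever model object the recognition platform produces.
Model = Any


def bootstrap_recogniser(
    seed_pages: Sequence[tuple[str, str]],
    unlabelled_batches: Sequence[Sequence[str]],
    train: Callable[[list[tuple[str, str]]], Model],
    transcribe: Callable[[Model, str], str],
    correct: Callable[[str, str], str],
) -> tuple[Model, list[tuple[str, str]]]:
    """Grow a ground-truth set by alternating machine first passes and human correction.

    seed_pages: (page_id, manual transcription) pairs used to seed the first model.
    unlabelled_batches: successive batches of page ids with no transcription yet.
    train, transcribe, correct: hypothetical callables for model training,
    automatic transcription and manual proofreading respectively.
    """
    ground_truth = list(seed_pages)
    model = train(list(ground_truth))  # initial model from a small manual sample
    for batch in unlabelled_batches:
        for page in batch:
            draft = transcribe(model, page)  # machine first pass
            # The proofread text is added to the training data.
            ground_truth.append((page, correct(page, draft)))
        model = train(list(ground_truth))  # retrain on the enlarged ground truth
    return model, ground_truth
```

Passing the three steps in as callables keeps the loop independent of any particular recognition engine; each pass through a batch enlarges the ground truth on which the next model is trained.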
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? No  
Impact The impact of the tool is currently internal to the project. Eventually, it will be used to help transcribe a range of Irish texts beyond the current project.
URL https://readcoop.eu/transkribus/
 
Title An Gocair / Gaelic orthographic standardiser 
Description An Gocair is a transformer-based orthographic standardiser for Scottish Gaelic. Its interface is similar to that of Google Translate: users enter Gaelic text written in older or irregular spelling into the left-hand box, and the system returns it in the right-hand box, converted to the Gaelic Orthographic Conventions. Data from the Decoding Hidden Heritages (DHH) project was used, in part, to train the transformer models.
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact The software has only recently been released, but we expect further funding to follow from this development.
URL https://angocair.garg.ed.ac.uk
 
Description Social media 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact We run a regular programme of blogging and tweeting about the project. Our Irish partners are publishing blog posts at https://www.gaois.ie/en/blog/decoding-hidden-heritages/, and in Edinburgh we are blogging about the project at https://blogs.ed.ac.uk/garg/. We have also set up a Twitter account for the project with the handle @DualchasCeilte, which is Scottish Gaelic for 'Hidden Heritage'. In February 2022, the Edinburgh team began a weekly programme of blog posts and tweets written by the three research assistants (RAs) then in post. On a rotating basis, each RA discusses aspects of their job or the material they are working with, documenting the project's progress and informing the public about it. This activity will also support the case for additional funding for Gaelic and Irish cultural heritage and language technology.
Year(s) Of Engagement Activity 2022,2023
URL https://blogs.ed.ac.uk/garg/