Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction

Lead Research Organisation: Cardiff University
Department Name: School of English, Communication and Philosophy

Abstract

This project will create a major corpus of Welsh language: CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes: National Corpus of Contemporary Welsh). A corpus is a principled collection of language data sampled from real-life contexts, presented as a searchable database. This will be the first corpus to represent spoken, written and electronically-mediated Welsh, and the first in any language with a functional design informed, from the outset, by representatives of all anticipated academic and community user groups. CorCenCC will provide societal, economic and academic benefits by:
- Facilitating uses of Welsh in public, commercial, educational and governmental settings.
- Redefining the scope, relevance and design infrastructure of corpus development methodology.

A corpus allows users to identify and explore language as it is actually used, rather than relying on intuition or prescriptive accounts of how it 'should' be used. This evidence-based approach is used by academic researchers, lexicographers, teachers, language learners, assessors, resource developers, policy makers, publishers, translators and others, and is essential to the development of technologies such as predictive text production, word processing tools, machine translation, voice recognition and web search tools. Welsh has had no comprehensive corpus facility able to meet these requirements.

CorCenCC will capitalise on extensive community interest in sustaining and 'growing' Welsh, using the novel integration of crowdsourcing, a powerful data collection method which has the potential to revolutionize corpus construction. Recruited through social and broadcast media, roadshows and existing networks, Welsh speakers will record and upload their own data via a mobile app, and even contribute to data coding. This approach promises representative language across genres, language varieties (regional and social) and contexts. Traditional data collection will supplement the crowdsourcing, ensuring a representative balance of data as specified in the project targets.

Preliminary engagement with stakeholders (including a briefing event at the Senedd) generated collaboration from the Welsh Government, Welsh Language Commissioner, Welsh Joint Education Committee, Welsh for Adults, BBC, Gwasg y Lolfa press, and University of Wales Dictionary; all have identified current needs which CorCenCC can meet, and all will be represented in the project advisory group, so the corpus design is user-informed throughout. A language corpus able to inform delivery of Welsh has been called for by e.g. National Foundation for Educational Research (2008:48) and Welsh Government (2013:27,71). CorCenCC, with its integrated pedagogical toolkit, will impact significantly on Welsh language teaching practice, enabling data-driven, inductive learning and assessment.

CorCenCC will be open-source and publicly accessible, with user interfaces for specific groups. It will enable, for example, community users to investigate dialect variation or idiosyncrasies of their own language use; professional users to profile texts for readability or develop digital language tools; language learners to learn from real-life models of Welsh; and researchers to investigate patterns of language use and change. In order to ensure that CorCenCC remains a sustainable, permanent and user-oriented record of language, an in-built facility will allow data to be added and moderated beyond the life of the project.

The project team comprises experts in corpus linguistics, Welsh, and language pedagogy and assessment, who specialise in the application of linguistic tools to real world issues. Working with an advisory body of stakeholder representatives, they are optimally placed to meet the project aims: creating a permanent, sustainable and fit-for-purpose record of the living language, and pioneering an approach to content generation and user-driven applications that will provide a model for future corpus creation.

Planned Impact

CorCenCC will be a freely available resource under an open licence which, when combined with the user-driven design and construction, will maximise its potential impact, enabling it to inform the work and activities of current and future users of Welsh in a number of critical areas, including:
- Second language teaching and learning: Reports on the teaching of Welsh for Adults (Mac Giolla Chriost et al., 2012; Welsh Government report, 2013) have drawn attention to the need for a corpus of contemporary Welsh. CorCenCC will meet this need, informing curriculum writing, language assessment and language learning resources as similar corpora do effectively in English (e.g. the Cambridge English Corpus (CEC) which informs Cambridge English Language teaching resources, the British National Corpus (BNC) which informs Pearson Longman's resources). CorCenCC will facilitate data-driven learning, enhancing the effectiveness of teaching Welsh as a second language (compulsory in all schools in Wales up to the end of Key Stage 4).
- The Welsh Government and National Assembly for Wales (Language Policy): CorCenCC will facilitate the realisation of action points in the Welsh Language Commissioner's strategy relating to digital content and applications, translation, terminology, language planning and research. These reflect the priorities of the Welsh Government in its Welsh Language Strategy for 2012-17 'A living language: a language for living'.
- The translation industry in Wales: CorCenCC outputs fit with the mid-term development of Microsoft Translate software: preliminary research (Screen, 2014) shows that example-based machine translation alone can improve the productivity of human translators by up to 55%, and by contributing to an eventual hybrid machine translation system, CorCenCC could further improve translation efficiency.
- The media in Wales: CorCenCC will increase the accessibility of the content of Welsh language media across all platforms and, by ensuring the language is appropriately pitched, will encourage more people to interface with the media in Welsh. CorCenCC offers TV and radio broadcasters the potential to produce language guidelines similar to those developed by Catalan language broadcaster TV3. BBC Cymru Wales is working with CorCenCC to provide data and to ensure that it can inform their work on all media platforms.
- Welsh language publishers and lexicographers: CorCenCC provides the means to target content at audiences of different reading abilities and enhance the language tools available to authors for constructing graded readers. It will enable the commissioning of dictionaries of modern Welsh based on actual language use (see letters of support E and D from University of Wales Dictionary and Gwasg y Lolfa).
- Language technology companies: a core requirement for companies using web-based and online social media data is a large, high-quality training corpus, and CorCenCC will provide this. Data analytics and big data are predicted to account for cumulative benefits of £216 billion to the UK economy between 2012 and 2017. Availability of CorCenCC will help to stimulate related research in Wales and for Welsh textual analytics.
- In the public domain: Via the project engagement strategy (National Eisteddfod interaction, short story competitions, etc.), and facilitated by the crowdsourcing approach, future users will be directly involved in the construction and design of the corpus to ensure it is user-friendly, accessible and appropriate to their needs. This will build on existing interest in Welsh language and heritage, to foster community 'ownership' of the corpus.
 
Description British Council Funding (for the CorCenCC launch)
Amount £2,000 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2017 
End 04/2017
 
Description Cardiff University CUROP (Cardiff University Research Opportunity) internal funding. Project name: 'Corpws Cenedlaethol Cymraeg Cyfoes: National Corpus of Contemporary Welsh - a focus on spoken data'
Amount £2,100 (GBP)
Organisation Cardiff University 
Sector Academic/University
Country United Kingdom
Start 07/2018 
End 08/2018
 
Description Cardiff University CUROP (Cardiff University Research Opportunity) internal funding. Project name: 'Corpws Cenedlaethol Cymraeg Cyfoes: National Corpus of Contemporary Welsh - semantic tagging and data annotation'
Amount £2,100 (GBP)
Organisation Cardiff University 
Sector Academic/University
Country United Kingdom
Start 07/2018 
End 08/2018
 
Description Competitive commission from Welsh Government to provide a rapid evidence assessment of effective second language teaching approaches and methods
Amount £24,992 (GBP)
Funding ID Contract 171802 
Organisation Government of Wales 
Sector Public
Country United Kingdom
Start 10/2017 
End 03/2018
 
Description ESRC DTP Collaborative Studentship - Welsh and Applied Linguistics : ESRC Wales Doctoral Training Partnership PhD Studentship "Strategic bilingualism: identifying optimal context for Welsh as a second language in the curriculum"
Amount £81,253 (GBP)
Funding ID 2096320 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 10/2018 
End 09/2021
 
Description Get Creative with Cymraeg
Amount £20,000 (GBP)
Organisation Government of Wales 
Sector Public
Country United Kingdom
Start 01/2018 
End 04/2018
 
Description RIAH - Research Institute for Arts and Humanities, Swansea University Funding (for the CorCenCC launch)
Amount £1,000 (GBP)
Organisation Swansea University 
Sector Academic/University
Country United Kingdom
Start 02/2017 
End 04/2017
 
Description School Research and Innovation Fund (for the CorCenCC launch)
Amount £1,500 (GBP)
Organisation Cardiff University 
Department School of English, Communication & Philosophy
Sector Academic/University
Country United Kingdom
Start 02/2017 
End 03/2017
 
Description Swansea University: SPIN (Swansea paid internship) placement for data collection, transcription and interviewing of teachers/tutors 2017-18
Amount £1,200 (GBP)
Organisation Swansea University 
Sector Academic/University
Country United Kingdom
Start 03/2018 
End 08/2018
 
Description Welsh Government Technology Funding - funding for the Welsh Stemmer project
Amount £20,000 (GBP)
Organisation Government of Wales 
Sector Public
Country United Kingdom
Start 01/2019 
End 04/2019
 
Description Welsh for Adults - B1 Canolradd core vocabulary research project
Amount £1,968 (GBP)
Funding ID Project 102497 
Organisation Welsh Joint Education Committee 
Sector Academic/University
Country United Kingdom
Start 01/2018 
End 03/2018
 
Description BBC 
Organisation British Broadcasting Corporation (BBC)
Department BBC Cymru Wales
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution BBC Wales have become official partners of the project (a collaborative contract has been signed). BBC Wales will provide extensive amounts of data for us to use on the project and we will involve them in our user-driven consultations regarding the design and construction of CorCenCC.
Collaborator Contribution BBC Wales will provide extensive amounts of data for us to use on the project and we will involve them in our user-driven consultations regarding the design and construction of CorCenCC.
Impact No outputs as yet (the collaboration has only just begun).
Start Year 2017
 
Description S4C 
Organisation S4C
Country United Kingdom 
Sector Private 
PI Contribution S4C have become official partners of the project (a collaborative contract has been signed). S4C will provide extensive amounts of data for us to use on the project and we will involve them in our user-driven consultations regarding the design and construction of CorCenCC.
Collaborator Contribution S4C will provide extensive amounts of data for us to use on the project and we will involve them in our user-driven consultations regarding the design and construction of CorCenCC.
Impact No outputs as yet (the collaboration has only just begun).
Start Year 2016
 
Title CorCenCC crowdsourcing app 
Description As part of the CorCenCC (National Corpus of Contemporary Welsh) project, the CorCenCC Crowdsourcing Application has been designed to allow Welsh speakers to record conversations between themselves and others across a range of contexts and to upload them for inclusion in the final corpus. Crowdsourced corpus data is a relatively new direction that complements more traditional language data collection methods, and is ideally suited to the positive community spirit that exists among speakers and learners of the Welsh language. Using our Crowdsourcing Application, Welsh speakers can engage with the CorCenCC project easily and at their own convenience. Users are able to:

- Create and adjust a user profile based around the context of their Welsh language background
- Make audio and video recordings of their Welsh language conversations and exchanges
- Include focused additional information about recordings as metadata
- Upload recordings for inclusion in CorCenCC - the National Corpus of Contemporary Welsh

In making contributions to the corpus a much more personal experience, the CorCenCC team wants to give users ownership and control of their own language data, and the opportunity to share the most natural and accurate representation possible of their Welsh in the contemporary context with the new National Corpus.
Type Of Technology Webtool/Application 
Year Produced 2017 
Impact The app has only just been released - too early to comment. 
URL https://itunes.apple.com/gb/app/ap-torfoli-corcencc/id1199426082
 
Title CyTag - Welsh Part of Speech Tagger 
Description CyTag is an innovative Welsh tagger (complete with bespoke tagset) designed and constructed for the project. It is being used in conjunction with the semantic tagger to tag all lexical items in the corpus. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact CyTag will be demoed at future project roadshows and public events (including Tafwyl and the Eisteddfod).
URL http://cytag.corcencc.org
 
Title Demo version of the CorCenCC query tools 
Description Demo version of the query tools that will be used for CorCenCC 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact Links to these tools have been circulated and the iterative development of them will continue until the final release of the corpus at the end of the project 
URL https://corpusdemo.corcencc.org/home?language=en
 
Title Welsh Semantic Tagger Version 1 
Description We have created a first version of the software prototype to apply corpus annotation automatically to Welsh language data. This first version incorporates word and coarse grained grammatical analysis but no semantic disambiguation so far. The potential meanings assigned have been derived automatically by converting English dictionaries through bilingual dictionaries and small parallel corpora. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact This first prototype was publicly demonstrated at the project launch in Cardiff in March 2017 to a large audience including members of the Welsh Assembly and other external project stakeholders.
URL http://ucrel.lancs.ac.uk/usas/
 
Description BBC Radio Wales interview (App/project launch - Dawn Knight) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Project PI Dawn Knight was interviewed on Good Morning Wales, discussing the launch of the CorCenCC crowdsourcing app and encouraging listeners to 'Give us your Welsh' (i.e. contribute data to the project). 28/02/17 (2:24:53 in).
Year(s) Of Engagement Activity 2017
URL http://www.bbc.co.uk/programmes/b08d6d7q
 
Description Bimonthly project newsletter 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Study participants or study members
Results and Impact We issue a bimonthly bilingual newsletter which is circulated to all members of the team, stakeholders, participants and supporters of the project (as well as members of the general public). The newsletter provides project updates and encourages individuals to sign up to contribute data to the corpus.
Year(s) Of Engagement Activity 2016,2017
URL http://www.corcencc.org/news_events/
 
Description Business Wales Advances Magazine coverage 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Business Wales Advances Newsletter
Year(s) Of Engagement Activity 2017
URL https://businesswales.gov.wales/sites/business-wales/files/documents/Advances82_English_FINAL.pdf
 
Description Cardiff University college newsletter (online) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Cardiff University College newsletter project coverage
Year(s) Of Engagement Activity 2017
URL http://sites.cardiff.ac.uk/ahss/introducing%E2%80%AFthe-corcencc-project/
 
Description CorCenCC project newsletter 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Study participants or study members
Results and Impact CorCenCC project newsletter. It was produced monthly from April 2016 to November 2016, then bimonthly thereafter (latest edition = issue 15, January 2018). The Welsh version of the newsletters can be found here: http://www.corcencc.cymru/y_diweddaraf/#s2
Year(s) Of Engagement Activity 2016,2017,2018
URL http://www.corcencc.org/news_events/#s2
 
Description CorCenCC website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Bilingual public-facing website which will be used to host the corpus when it is constructed. The website contains information on what the project aims to do; how individuals can get involved (and how they can sign up to the newsletter); and provides updates on 'where we are' with the work.
Year(s) Of Engagement Activity 2017
URL http://www.corcencc.org/
 
Description Cwis y Corpws Cenedlaethol 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Online quiz run by the BBC, providing some basic information on what a corpus is and quizzing readers about patterns in word frequency and usage.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/cymrufyw/46391607
 
Description Formal CorCenCC project launch at the Pierhead Building, Cardiff Bay 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact The CorCenCC launch event was an opportunity for (invited) attendees to learn more about the project, view a demonstration of the new data collection app, and experience the corpus tools in action. In a series of short presentations, the following people shared their impressions of how CorCenCC will impact on research, policy, and on the Welsh language community more widely:

- Bethan Jenkins AM, Chair of the Welsh Language and Communications Committee
- Professor Elizabeth Treasure, Deputy Vice-Chancellor, Cardiff University
- Professor Damian Walford Davies, Head of the School of English, Communication and Philosophy, Cardiff University
- Dr Dawn Knight, Principal Investigator of the CorCenCC project, Cardiff University
- Alun Davies AM, Minister for Lifelong Learning and Welsh Language
- Professor Martin Stringer, Pro-Vice-Chancellor, Swansea University

Attendees included representatives from the BBC, S4C, National Library of Wales, Welsh Language Commissioner's office, National Assembly for Wales, various academic institutions, Welsh for Adults and project partners and collaborators.
Year(s) Of Engagement Activity 2017
 
Description Heno TV appearance/project plug by project ambassador Nia Parry 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Project Ambassador Nia Parry was involved in a TV interview on Heno (S4C) and mentioned the project - briefly discussing the aims and objectives of the work, in an effort to engage members of the public and encourage them to contribute data to the corpus (16/6/16 - at minute 41).
Year(s) Of Engagement Activity 2016
URL http://www.bbc.co.uk/iplayer/episode/p03w7wcm/heno-mon-06-jun-2016
 
Description Interview on Radio Cymru - 4th February 2019
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact RA Laura Arman was interviewed on Aled Hughes' programme on Radio Cymru, discussing the progress on the CorCenCC project to date and outlining to the general public how they may get involved in the future.
Year(s) Of Engagement Activity 2019
URL https://www.bbc.co.uk/programmes/m0002bx7?fbclid=IwAR0Ovac120CiPcgeroAAFzseafsxCRgsKfzlGhFuB1RuJ14RK...
 
Description Invited public talk - Cymdeithas y Llan a'r Bryn, Llangennech, Carmarthenshire 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact I was invited to talk about the project as part of the annual programme of public talks by Cymdeithas y Llan a'r Bryn of Llangennech. Around 50 people attended on the night and many subsequently agreed to give data to the project. There was a lively debate as to what constitutes 'correct' or 'acceptable' Welsh and therefore what should or should not be included in the Corpus. I have been invited to return to talk about the Corpus at a later stage when it has been completed.
Year(s) Of Engagement Activity 2017
 
Description Media engagement/announcement 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact One of the project RAs was featured in a local newspaper (as a former student of a local school), discussing the aims and objectives of the project.
Year(s) Of Engagement Activity 2018
URL https://www.gllm.ac.uk/news/2147491168/
 
Description National newspaper project mention 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Newspaper coverage of project
Year(s) Of Engagement Activity 2017
URL http://www.dailymail.co.uk/news/article-4304544/Mucking-playschool-goes-right-window.html
 
Description Newyddion 9 on S4C - TV interview by CI Steve Morris 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Project CI Steve Morris was involved in a TV news interview on S4C (Newyddion 9 on S4C) to discuss the aims and objectives of the project, in an effort to engage members of the public and encourage them to contribute data to the corpus.
Year(s) Of Engagement Activity 2016
URL http://www.bbc.co.uk/cymrufyw/34509519
 
Description Press release to mark the start of the CorCenCC project (featured predominantly on academic websites and in research newsletters) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Press release issued to mark the start of the CorCenCC project. This press release was published on the following academic websites and in academic publications (site/publication details; date of publication; link (where appropriate)):

Cardiff University website (English); 02/03/2016; http://www.cardiff.ac.uk/news/view/212132-st-davids-day-kick-off-for-welsh-language-project
Cardiff University website (Welsh); 02/03/2016; http://www.cardiff.ac.uk/cy/news/view/212132-st-davids-day-kick-off-for-welsh-language-project
Swansea University website (English); 01/02/2016; http://www.swansea.ac.uk/riah/research-projects/corcencc/
Swansea University website (Welsh); 01/02/2016; http://www.swansea.ac.uk/cy/riah/prosiectau-ymchwil/corcencc/
Cardiff University Digital Cultures blog (relating to an internal launch); 25/03/2016; https://cardiffdigitalnetwork.org/2016/03/25/corcencc-launch/
Swansea University research website (Welsh); 01/03/2016; http://www.swansea.ac.uk/media/Momentwm%20rhifyn%2021.pdf
Swansea University research website (English); 01/03/2016; http://www.swansea.ac.uk/media/Momentum%20issue%2021.pdf
Year(s) Of Engagement Activity 2016
 
Description Project launch event and app launch press release (16 different sources/places of publication) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact To mark twelve months since the start of the project, and to mark the completion of the crowdsourcing app and the public launch event (28/02/17), we issued a press release which sought to inform members of the public, policy makers, government officials etc. about the progress made on the project and to encourage them to 'give us their Welsh'. The following list documents the websites/newspapers that the media release was published on (and whether it was in Welsh or English); date of publication and a link to the site (or the title of the piece, as relevant):


Swansea University website (Welsh); 14/02/2017; http://www.swansea.ac.uk/cy/canolfan-y-cyfryngau/newyddion-diweddaraf/gallsiaradwyrcymraegymmhobmangyfrannuatadnoddiaithcenedlaetholdrwyddefnyddioapnewydd.php

Swansea University website (English); 14/02/17; http://www.swansea.ac.uk/media-centre/latest-news/welshspeakerseverywherecancontributetoanationallanguageresourcethoughnewapp.php

Y Cymro Welsh paper; 17/01/17; link not available

Techdragons Wales blog; 2017; http://techdragons.wales/academics-launch-app-to-promote-welsh-language/

Bangor University website (English); 15/02/2017; https://www.bangor.ac.uk/news/latest/we-need-your-welsh-31042

Bangor University website (Welsh); 15/02/2017; https://www.bangor.ac.uk/addysg/newyddion/mae-angen-eich-cymraeg-arnom-31042

Denbighshire Free Press Local Paper (Welsh Language Section); 22/02/17; link not available

BBC Wales news site (English); 28/02/2017; http://www.bbc.co.uk/news/uk-wales-39120536

BBC Wales news site (Welsh); 28/02/2017; http://www.bbc.co.uk/cymrufyw/39109825

Cardiff University homepage (English); 01/03/2017; http://www.cardiff.ac.uk/news/view/616189-national-corpus-of-contemporary-welsh

Cardiff University homepage (Welsh); 01/03/2017; http://www.cardiff.ac.uk/cy/news/view/616189-national-corpus-of-contemporary-welsh

Lancaster University webpage; 01/03/2017; http://www.lancaster.ac.uk/news/articles/2017/national-corpus-of-contemporary-welsh/

My Science Blog; 01/03/2017; https://www.myscience.org.uk/wire/national_corpus_of_contemporary_welsh-2017-cardiff

Daily Mail online; 11/03/2017; http://www.dailymail.co.uk/news/article-4304544/Mucking-playschool-goes-right-window.html
Year(s) Of Engagement Activity 2017
 
Description Project launch press release (10 different sources/places of publication) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Having obtained the funding for the CorCenCC project, we had an initial press release in 2015 which sought to inform members of the public, policy makers, government officials etc., about the aims and objectives of the project from the very start. The following list documents the websites/newspapers that the media release was published on (and whether it was in Welsh or English); date of publication and a link to the site (or the title of the piece, as relevant):

Wales Online; Online news (Welsh); 08/10/2015; http://www.walesonline.co.uk/news/wales-news/welsh-language-10-million-words-10217359

Lleol.Cymru; Online news (Welsh); 08/10/2015; http://www.lleol.cymru/blog/detail.php?blog=corpws-cyntaf-yn-y-gymraeg-yn-cael-ei-sefydlu

Y Cymro; Welsh newspaper; 09/10/2015; Corpws cyntaf o'r iaith Gymraeg

Tab Student paper; 12/10/2015; http://thetab.com/uk/cardiff/2015/10/12/1-8m-granted-cardiff-university-save-welsh-language-11686

ENCAP website; Uni website; 14/10/2015; http://www.cardiff.ac.uk/news/view/147217-1.8m-for-online-resource-of-contemporary-welsh-language

COMSC site; Uni website; 16/10/2015; http://www.cs.cf.ac.uk/newsandevents/corpus.html

Bangor Uni website; 13/10/2015; http://www.bangor.ac.uk/addysg/news/-1-8m-funding-for-large-scale-online-resource-of-contemporary-welsh-language-24635

AcSS Website; 01/10/2015; https://www.acss.org.uk/news/new-large-scale-open-source-corpus-of-contemporary-welsh-language-to-be-created/

Lancaster University website; 02/11/2015; http://www.lancaster.ac.uk/news/articles/2015/18m-for-first-ever-large-scale-online-resource-of-contemporary-welsh-language/

WISERD webpage; 04/11/2015; http://www.wiserd.ac.uk/news/latest-news/corcencc-commence-march-2016/#sthash.CpIdAo3S.dpbs
Year(s) Of Engagement Activity 2015
 
Description Radio Cymru: Post Prynhawn (Welsh) - project description with Steve Morris 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Project CI Steve Morris was involved in a radio interview on BBC Radio Cymru to discuss the aims and objectives of the project, in an effort to engage members of the public and encourage them to contribute data to the corpus (17/2/14 - 56:15 onward)
Year(s) Of Engagement Activity 2016
URL http://www.bbc.co.uk/programmes/b08d6d7q
 
Description Radio Cymru: Post Prynhawn (Welsh) interview (App/project launch - Nia Parry) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Project ambassador Nia Parry was interviewed on Radio Cymru, discussing the launch of the CorCenCC crowdsourcing app and encouraging listeners to 'Give us your Welsh' (i.e. contribute data to the project). 28/02/17 (at around 7:40am).
Year(s) Of Engagement Activity 2017
URL http://www.bbc.co.uk/programmes/b08d6d7q
 
Description Social media campaign by Lancaster University promoting Global Lancaster focussed on our multilingual semantic tagging software. 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact This was a social media campaign by Lancaster University promoting Global Lancaster. Paul Rayson, Scott Piao and Mahmoud El-Haj were interviewed and featured in the video, talking about the need for Natural Language Processing and AI research. The video focussed on our multilingual semantic tagging software and how the general public and other groups could engage in the research and benefit from it.
Year(s) Of Engagement Activity 2018
URL https://twitter.com/LancasterUni/status/1022138287035764736
 
Description Talks/activity at the National Eisteddfod of Wales festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact We deliver 1-2 presentations (in Welsh) during the week of the National Eisteddfod of Wales festival on an annual basis. The presentations outline the aims and objectives of the project to the general public and function to recruit participants to contribute data to the corpus, and to disseminate findings from the research. Attendees are also encouraged to sign up to the project newsletter to receive further information about the project as time progresses. 2-4 members of the project team are involved in this event on an annual basis.
Year(s) Of Engagement Activity 2016,2017,2018
URL https://eisteddfod.wales/
 
Description Tafwyl festival
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact We delivered presentations (in Welsh) during the weekend of the annual Tafwyl festival (Welsh arts and culture festival). The presentations outlined the aims and objectives of the project to the general public and served to recruit participants to contribute data to the corpus. Attendees were also encouraged to sign up to the project newsletter to receive further information about the project as time progresses. 2-4 members of the project team were involved in this event each year.
Year(s) Of Engagement Activity 2016,2017,2018
URL http://tafwyl.org
 
Description WordNet funding details (Government press release) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact WordNet project funding details - Government Press Release. Welsh version can be found here: http://gov.wales/newsroom/welshlanguage/2017/projects-which-get-creative-with-cymraeg-announced/?skip=1?=cy
Year(s) Of Engagement Activity 2017
URL http://gov.wales/newsroom/welshlanguage/2017/projects-which-get-creative-with-cymraeg-announced/?lan...
 
Description WordNet project press release (Cardiff University) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact WordNet project funding press release (Cardiff University). Welsh version can be found here: http://www.cardiff.ac.uk/cy/news/view/1013418-wordnet-cymraeg?utm_content=buffere2611&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Year(s) Of Engagement Activity 2017
URL http://www.cardiff.ac.uk/news/view/1013418-wordnet-cymraeg?utm_content=buffere2611&utm_medium=social...
 
Description Workshop and 'Gogglebox' type data gathering session as part of Being Human Festival 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact As part of the 2017 'Being Human' Festival, a workshop was held at Ty'r Gwrhyd (Welsh Language Centre) in Pontardawe; it was the only festival event held through the medium of Welsh. Members of the public were invited (i) to learn more about the project and (ii) to contribute data by watching and reacting to videos (which did not include any spoken language), in a similar way to the Channel 4 programme Gogglebox. The session used one of the project's straplines, "Rho dy Gymraeg i ni!" [Give us your Welsh], to attract members of the public to the event, and many hours of spoken data were collected.
Year(s) Of Engagement Activity 2017
URL https://beinghumanfestival.org/event/give-us-your-welshrho-dy-gymraeg-i-ni/