Collocaid: combining learner needs, lexicographic data and text editors to help learners write more idiomatically

Lead Research Organisation: University of Surrey
Department Name: English

Abstract

Over the past decades, the UK has produced a series of world-leading corpus-based pedagogical dictionaries that provide users not just with the definitions of words, but also with a wealth of information on how words are actually used in context. There have also been considerable advances with regard to dictionary format. Nowadays, all major English language dictionaries have digital interfaces. Yet research on dictionary use shows that the spectacular developments in terms of dictionary content and format that have taken place over the past decades have not had a dramatic influence on actual dictionary-user behaviour. Dictionaries - both paper-based and digital - remain by and large underused, and it is widely acknowledged that more needs to be done with regard to teaching people how to use dictionaries to their full potential. This proposal stems from the realization that an arguably better solution would be to develop alternative, dictionary-like tools that do not require much in the way of training or instructions.

This project aims to research how information to help writers produce more accurate and idiomatic texts can be migrated from dictionaries and corpora to digital writing environments in an optimum, minimally intrusive way, without disrupting writing processes. Rather than attempting to cover every possible aspect of writing, we will focus on supporting non-native speakers of English with information to help them deal with collocation. Violating collocation conventions can result in errors (e.g. *They trust in us) or awkward, non-idiomatic text (e.g. *a large difference). Additionally, writers who are unable to retrieve idiomatic collocates (e.g. a narrow/daring/lucky escape) often make do with bland, less interesting alternatives (e.g. a fantastic escape). Although there are dictionaries that focus precisely on collocation, writers are often unaware of them or simply cannot be bothered to use them. Moreover, the simple fact that learners have to stop writing to look up a collocate can disrupt the flow of their words. It is in this context that we propose to research how writers can retrieve information on collocation directly from within digital writing environments in an intuitive and minimally intrusive way so that (1) writers do not need to be trained to look up this information and (2) the flow of writing is not disrupted in the process.

The research will begin with a needs analysis to identify which collocation difficulties to focus on. We will then carry out lexicographic work to address those needs, using, among other resources, computerized language corpora and state-of-the-art lexicographic tools. Next, we will research how to integrate information on collocation with text editors in an easy, helpful and minimally disruptive way. Different models of human-computer interaction and data visualization will be developed and the team will carry out usability studies and test them with a sample of the target population.

The investigators responsible for this project are three well-known academics with many years of teaching and research experience in the fields of second language writing, lexicography, corpus linguistics and human-computer interaction. The team's advisory board includes Michael Rundell (editor-in-chief of Macmillan Dictionaries), Pete Whitelock (principal language engineer at the Oxford University Press dictionary division) and Milos Jakubicek (CEO of Lexical Computing Ltd).

This research will help to further the UK's reputation for world-leading developments in the field of pedagogical lexicography. The project has tangible impacts on society, culture and the economy, as its outputs include data and software that can help writers using English as a medium of communication. We will be exploiting the potential of digital technologies to enhance the creation of knowledge through writing, enabling people of different backgrounds to better express themselves in written English.

Planned Impact

In addition to the academic beneficiaries, the present project will generate tangible outputs with a potential to impact society, culture and the economy. There are a number of non-academic stakeholders at a national and international level who can benefit from this. In the first instance, these include but are not limited to the following:

a. Writers using English as a medium of communication, especially non-native writers of English (e.g. undergraduate and postgraduate students as well as researchers and lecturers in the UK and abroad, in addition to wider audiences including politicians, journalists and other professionals who need to communicate in written English), will benefit from the development of a user-friendly digital writing environment that can help them produce more grammatical and idiomatic texts.

b. Native English speakers wishing to develop further writing skills (this could include children, students and professionals less fluent in writing) could benefit in similar ways to the beneficiaries in (a).

c. English as a Foreign Language (EFL) and English for Academic Purposes (EAP) tutors in the UK and abroad will have new resources to draw on. They will be welcome to use the information collected on collocation difficulties and collocation solutions in their day-to-day teaching practice. While the primary data generated by the project will be made easily accessible to them through the project website, this group can also benefit from the edited tools and resources developed by group (d) below.

d. The collocation data generated by this project can be commercially valuable to academic publishers producing EAP materials such as Oxford University Press, Cambridge University Press and Pearson ELT, and English language testing services like Cambridge Language Assessment, IELTS and TOEFL. This data can be used to develop books, interactive online exercises and tests. The edited materials and resources they produce using our data will further benefit groups (a), (b) and (c) above.

e. Software developers will benefit from novel visualization methods that focus on personal data. Personal visualization is a fast-growing area, and as yet there are few techniques for displaying personal textual data dynamically and interactively.

f. The linguistic tools and resources created for English in this project can have an indirect impact on other languages, fostering the development of similar projects for languages other than English.

In short, the outputs of the present proposal can have a strong societal, economic and cultural impact, with benefits not only to special professional and practitioner groups but also the wider public. By using technology to foster improved writing and by enabling people of different cultural and language backgrounds to better express themselves in written language, we hope to enhance the creation of knowledge and promote greater understanding and communication among different communities.
 
Title Introducing ColloCaid 
Description A video introducing ColloCaid 
Type Of Art Film/Video/Animation 
Year Produced 2019 
Impact Over 250 views 
URL https://vimeo.com/361811792
 
Title Video/animation on Visualisation and graphical techniques to help writers write more idiomatically 
Description The animation gives an overview of the project and explains how visualisation can help authors. 
Type Of Art Film/Video/Animation 
Year Produced 2017 
Impact Has a wide reach; it is located within the IEEE VGTC community. 
URL https://vimeo.com/230838396
 
Description We found that less experienced users of academic English have a more limited collocation repertoire or awareness of how to put words together idiomatically in written academic English, whether or not their first language is English. This reinforces the idea that there are no native speakers of academic English. A tool like ColloCaid can therefore be useful to all novice users of academic English, from secondary school and undergraduate students at English-medium schools and universities, to experienced researchers using English as an additional language (Research outputs: Frankenberg-Garcia 2018; Tavares Pinto, Rees and Frankenberg-Garcia 2021).

We researched academic words that are widely and frequently used across academic disciplines to ensure our work would benefit the largest possible number of users. We identified 572 core general academic English nodes (311 nouns, 184 verbs and 77 adjectives), i.e., words that are collocationally productive because they frequently combine with other words. (Research outputs: Frankenberg-Garcia, Lew, Roberts, Rees and Sharma 2019; Frankenberg-Garcia, Rees and Lew 2021; ColloCaid lexical dataset)

We used lexical computing software to research how expert writers typically combine core academic nodes with other words, and compiled a lexical database of 32,645 academic English collocations (idiomatic word combinations). Our collocations database is over 12 times larger than the Academic Collocation List published in 2013, which contains 2469 collocations. (Research output: Frankenberg-Garcia, Rees and Lew 2021; ColloCaid lexical data)

To help academic writers further, we added to the ColloCaid database 29,028 authentic examples of collocation use curated from corpora of expert academic writing. These examples can help writers notice how collocations are used in texts. We also linked to our database hundreds of thousands of further examples from an external source (SkELL). (Research outputs: Frankenberg-Garcia, Rees and Lew 2021; ColloCaid lexical data)

Our research underlined the importance of advanced lexicography skills and human curation when using e-lexicography tools. We identified and documented a number of non-trivial issues for future corpus-based lexicographic work to acknowledge. (Research outputs: Frankenberg-Garcia, Rees and Lew 2021; Frankenberg-Garcia 2021).

We have released a sample of the ColloCaid Lexical Database to the public under a CC BY license on Figshare, and have since shared the full database with a partner European project. (Research output: Frankenberg-Garcia, Rees and Lew 2020)

We have studied existing writing assistants and developed a prototype for a text editor from which collocation suggestions can be visualized directly so as not to interrupt writing and word flow. Collocation suggestions are presented in such a way that writers can be prompted to expand their academic writing vocabulary even when they are not aware of their vocabulary limitations. The suggestions are integrated in an intuitive and unobtrusive way, so that writers can choose to use them only as and when needed. (Research outputs: Frankenberg-Garcia, Lew, Roberts, Rees and Sharma 2019; Frankenberg-Garcia, Lew, Roberts, Rees, Sharma and Butcher 2019; Frankenberg-Garcia 2020; Roberts 2020; Zomer and Frankenberg-Garcia 2021; ColloCaid prototype)

Early versions of the prototype we developed were rated between good and excellent on the widely used System Usability Scale by various groups of target users, including students, experienced researchers and English teachers. Since then, our lexical coverage has been expanded, bugs have been addressed, and the tool interface has been enhanced. The latest version of the ColloCaid prototype was found to involve less mental, physical and time demand, and to be less frustrating, than other tools and resources writers use to look up academic English collocations. (Research outputs: Frankenberg-Garcia, Rees, Lew, Roberts, Sharma, and Butcher 2019; Rees 2021)
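For readers unfamiliar with the System Usability Scale, the sketch below shows how a single respondent's questionnaire is conventionally turned into a 0-100 score. It follows the standard published scoring procedure for the ten-item SUS and is not taken from the project's own analysis; the function name and the sample answers are hypothetical.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten item answers in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribute answer - 1.
            total += answer - 1
        else:
            # Even-numbered items are negatively worded: contribute 5 - answer.
            total += 5 - answer
    # The summed contributions (0-40) are rescaled to 0-100.
    return total * 2.5


# Hypothetical respondent, for illustration only.
example_answers = [4, 2, 4, 1, 5, 2, 4, 2, 4, 2]
print(sus_score(example_answers))
```

Scores for a group of users are usually reported as the mean of the individual scores.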

The source code behind ColloCaid was released to the public through a CC BY license (Research output: Sharma, Butcher and Roberts 2021).

We compiled and published on our website and through a CC BY license a database of 370 collocation errors and other problems found frequently and widely in general academic English. (Research outputs: collocaid.uk; Frankenberg-Garcia and Rees 2021)

We have delivered numerous engagement activities to promote our research to direct beneficiaries and academic researchers. On 25 February 2022, the ColloCaid prototype had over 6000 registered users from over 100 countries across the globe.
Exploitation Route Academic beneficiaries
Our work has contributed to the advancement of knowledge in corpus linguistics, academic English vocabulary, collocation, writing and second language writing, e-lexicography, visualisation of lexical data, and human-computer interaction.

Wider impact
Our work can be used to teach and improve academic writing in English. Researchers who are not used to writing up their research in English and students who are not used to academic English can benefit from the writing assistant and resources we have created and shared. This can contribute to making academic English writing and scholarly publication for the transfer of knowledge more inclusive.
Sectors Digital/Communication/Information Technologies (including Software)

Education

Other

URL http://www.collocaid.uk
 
Description ColloCaid Website
The ColloCaid website at http://www.collocaid.uk/ had received nearly 94K page views up to 9 March 2024.

ColloCaid Prototype
The proof-of-concept prototype developed in this project was first released to the public in October 2019. Our latest figures (7 March 2024) indicate that the tool has attracted over 9000 registered users from over 100 countries. The ten countries with the greatest number of registered users are: China, United Kingdom, Brazil, Spain, United States, Germany, Australia, Italy, Saudi Arabia and Belgium. The majority of users are PhD and Master's students, but there is also a sizeable number of undergraduate students and a growing number of university lecturers and professors and English teachers using ColloCaid. The tool has been used mainly for writing and revising (88%), but has been increasingly used for teaching English too (8%). We can see from the data that a number of universities in different countries have adopted the tool for teaching. This is bound to have a trickle-down effect, generating further impact.

ColloCaid Lexical Database
The ColloCaid Sample Lexical Database, first released in October 2020 on a CC BY 4.0 licence, had received 1433 views and 178 downloads up to 11 March 2024. The full lexical database has been shared with the European Lexicographic Infrastructure (ELEXIS) project, funded by the Horizon 2020 research and innovation programme.

ColloCaid Collocation Errors database
The collocation error dataset was released in February 2021. The full dataset, available on a CC BY 4.0 licence, had received 2327 views and been downloaded 491 times up to 11 March 2024.

ColloCaid Code
The code that integrates the lexical database with the editor was released in February 2021 on a CC BY 4.0 licence. The code had received 626 views and been downloaded 144 times up to 11 March 2024.

Research papers
The research papers we have published are beginning to be cited in other publications. A quick search for "collocaid" in Google Scholar shows our work had been cited in 139 publications up to March 2024.
First Year Of Impact 2020
Sector Education
Impact Types Cultural

Societal

 
Description Internacionalização do Ensino Superior: políticas linguísticas [Internationalization of Higher Education: language policies], British Council and Brazilian Ministry of Education, Brazil, 25 February 2021.
Geographic Reach South America 
Policy Influence Type Contribution to a national consultation/review
URL https://www.britishcouncil.org.br/events/workshop-internacionalizacao-2
 
Description A Teachers' Guide to Classroom Corpus Use
Geographic Reach Multiple continents/international 
Policy Influence Type Contribution to new or improved professional practice
URL https://uq.pressbooks.pub/using-language-data/
 
Description I Semana de Internacionalização Intercampi-UEMASUL, Maranhão, Brazil
Geographic Reach South America 
Policy Influence Type Influenced training of practitioners or researchers
URL https://www.uemasul.edu.br/portal/i-jornada-de-internacionalizacao-e-ii-seminario-de-formacao-medica...
 
Description Practical influence on the development of a tool to help Brazilian researchers publish in English
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
 
Description Technology-enhanced research writing - Brazil
Geographic Reach South America 
Policy Influence Type Influenced training of practitioners or researchers
Impact 63 researchers working in a range of disciplines at Brazilian universities and 52 English teachers attended workshops in corpus linguistics tools and technologies to assist research writing and teaching academic English in order to support the internationalization of Brazilian research. This included the use of the ColloCaid prototype developed in this project. We collected delayed feedback (one year after the workshops) from the workshop participants (English teachers and researchers). All the EAP tutors said yes when asked whether the workshop had influenced the way they taught, with 92.9% stating that they used some or all of the tools and resources seen in the workshop in their own teaching, and 57.1% stating that they had used some of the ideas presented in the workshop. In open-ended responses, one teacher commented: "The tools are now part of my bag of tricks for teaching, which means that every now and then I go and use those tools whenever I need real-life examples of English to find patterns I could teach my students, or even answer their questions regarding grammar. It's easy, fast and I can draw many conclusions just from observing the patterns. I also encourage my students to use those tools for independent learning". With regard to the researchers, when asked about their experience writing up papers in English since the workshop, 36.7% had managed to publish at least one paper in an international peer-reviewed journal, 30% had submitted a paper, and 33.3% were working on a paper. There were also 13.3% who declared they had not worked on a paper because they were concentrating on their dissertation or thesis. When those who had worked on a publication were asked about the extent to which the workshop had helped them with their writing on a 5-point scale ranging from "what I learned in the workshop helped me a lot" to "what I learned in the workshop did not help me at all", the median obtained was 4 (range 2-5).
URL https://www.britishcouncil.org.br/sites/default/files/uk_collaboration_call_-_sarmentopintogarcia.pd...
 
Description Technology-enhanced research writing - Spain
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
Impact Postgraduate research students and academics received training in corpus linguistics technologies to support research writing for international, peer-reviewed journals
 
Description British Council UK Brazil Collaboration Call
Amount £10,000 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2019 
End 08/2019
 
Description Santander Staff Mobility Award
Amount £2,000 (GBP)
Organisation Santander Universities 
Sector Private
Country United Kingdom
Start 04/2018 
End 05/2018
 
Title ColloCaid Collocation Error Database 
Description We have recently shared a lexical database of 370 common collocation errors and other problems which can be useful to teachers and materials developers of English academic writing. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact Despite being released less than a month ago, the webpage where a sample of the database is shown has received 1666 page views. The full database has had 193 page views and 22 downloads. 
URL http://www.collocaid.uk/collocation-errors/
 
Title ColloCaid Editor 
Description The ColloCaid editor is the open-access prototype of a text editing tool that helps users of academic English find the words they need to express themselves fluently in writing and improve the readability of their papers. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact We have had 3890 registered users of the prototype tool under development since it was made open to the public. Registered users include undergraduate and postgraduate students, research fellows, university lecturers/professors and English teachers from more than 60 countries. 
URL https://collocaid.uk/prototype/editor/public/
 
Title ColloCaid Lexical Data 
Description We have developed what is, to our knowledge, the largest lexical database of academic English collocations. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact We have released a sample of our lexical database at https://figshare.com/articles/dataset/ColloCaid_Sample_Data/13028207 We have since received a request to share our full database with the European Lexicographic Infrastructure (ELEXIS) project. 
URL https://figshare.com/articles/dataset/ColloCaid_Sample_Data/13028207
 
Title ColloCaid Academic Collocation Errors and Other Problems 
Description The Academic Collocation Errors and Other Problems database by ColloCaid (www.collocaid.uk) comprises 361 collocation errors and other issues found widely and frequently in English academic writing. Problems are classed as: (1) errors like *knowledge in how instead of knowledge of how; (2) atypical word combinations that are grammatically acceptable but not very common, like lose control over (atypical) compared with lose control of (typical); (3) strings that are wrong in most contexts, like knowledge in the (unacceptable in most contexts, such as *knowledge in the issue), yet can be acceptable in a few contexts, like knowledge in the field of linguistics. A variety of sources were used to compile the database, including learner corpora, textbooks, dictionaries, and grammars. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Approximately 60 views and 10 downloads 
URL https://figshare.com/articles/dataset/ColloCaid_Academic_Collocation_Errors_and_Other_Problems/13640...
 
Title ColloCaid Sample Data 
Description The ColloCaid Sample Data comprises approximately 2% of the ColloCaid lexical database. The sample covers 692 strong academic English collocations (LogDice >5.0) for 16 core academic lemmas used as collocation bases (or nodes): 5 nouns, 5 verbs, and 6 adjectives. The selection aims to give an overview of the range of data included in the full dataset. This includes collocations with bases classified with more than one part-of-speech tag (e.g. DEBATE, INDIVIDUAL), polysemous collocation bases giving rise to distinct collocation patterns (e.g. CODE), as well as collocation bases that evoke a very large and a very small number of collocations. The strongest eight lexical collocations listed for each base are enriched with three different curated example sentences adapted from corpora of expert academic English writing. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact 388 views and 36 downloads 
URL https://figshare.com/articles/dataset/ColloCaid_Sample_Data/13028207
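
The ColloCaid Sample Data description above selects collocations by a LogDice threshold (LogDice >5.0). As a rough illustration of what that score measures, the sketch below computes logDice using the formula commonly given for it, 14 + log2(2 * f_xy / (f_x + f_y)), where f_xy is the frequency of the collocation and f_x and f_y are the frequencies of the node and the collocate. The function name and all frequency counts are hypothetical and are not drawn from the ColloCaid database.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Return the logDice association score for a node/collocate pair.

    f_xy: corpus frequency of the pair occurring together
    f_x:  corpus frequency of the node
    f_y:  corpus frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    if f_xy > min(f_x, f_y):
        raise ValueError("a pair cannot be more frequent than either word")
    # The Dice coefficient 2*f_xy / (f_x + f_y) lies in (0, 1]; taking log2
    # and adding 14 gives a score with a theoretical maximum of 14.
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts, for illustration only.
score = log_dice(f_xy=50, f_x=10_000, f_y=15_600)
print(score, score > 5.0)
```

Because the score depends only on the three frequencies and not on corpus size, the same threshold can be applied across corpora of different sizes.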
 
Title ColloCaid lexical database 
Description We have compiled a corpus-based lexical database to support the ColloCaid text editor. The lexicographic database underlying ColloCaid includes at this time: 557 lemmas in 702 senses; 31,927 non-discipline-specific collocations extracted from corpora of expert academic writing; 30,203 curated corpus examples of core collocations in context 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Our database serves the ColloCaid editor to help academic writers find collocation suggestions to improve the idiomaticity of their texts. We have at this time 257 registered users of the ColloCaid prototype. 
URL http://www.collocaid.uk
 
Description Collaboration with ELEXIS using ColloCaid data: World of Games 
Organisation Jožef Stefan International Postgraduate School
Country Slovenia 
Sector Academic/University 
PI Contribution We contribute headwords and their collocations from ColloCaid for use in the World of Games app.
Collaborator Contribution Our partners offer expertise in game development and the opportunity to crowdsource additional data for ColloCaid.
Impact Use of ColloCaid data in World of Games app
Start Year 2021
 
Description Leon Workshop 
Organisation University of Leon
Country Spain 
Sector Academic/University 
PI Contribution We delivered a one-week technology-enhanced writing workshop to support Spanish academics publishing in English at the University of Leon, Spain, in June 2019. A beta version of the ColloCaid prototype was trialled during the workshop.
Collaborator Contribution University of Leon partner was responsible for local organization and helped to deliver the workshop.
Impact Workshop participants received research writing support.
Start Year 2019
 
Description Teaching at the European Masters of Lexicography 
Organisation University of Santiago de Compostela
Country Spain 
Sector Academic/University 
PI Contribution Ana Frankenberg-Garcia delivered a guest talk and two workshops on corpus-based lexicography for the European Masters of Lexicography programme hosted at Universidad de Santiago de Compostela, Spain, between 3 and 5 November 2021.
Collaborator Contribution This was more of a one-sided outreach collaboration, but it was useful to get feedback from the master's students on their needs, difficulties and questions.
Impact As a direct result of this collaboration, the University of Surrey was invited to become an Associate Member of the European Masters of Lexicography consortium (see https://www.emlex.phil.fau.eu/)
Start Year 2021
 
Description UFRGS & UNESP UK-Brazil collaboration 
Organisation Federal University of Rio Grande do Sul
Country Brazil 
Sector Academic/University 
PI Contribution We delivered 4 technology-assisted English academic writing workshops to support Brazilian researchers publishing internationally. The workshops were funded by the British Council, with matched funding from Sketch Engine, Santander Universities and the Brazilian Languages without Borders programme. They were held twice at the Federal University of Rio Grande do Sul (UFRGS, Porto Alegre), and another two times at São Paulo State University (UNESP, Sao Jose do Rio Preto), in April and June 2019. A total of 125 applicants participated, although demand for the workshops was more than twice the number of places we were able to offer. The participants included 72 researchers from a wide range of areas (e.g., Astronomy, Biology, Computer Science, Engineering, Politics, etc.) and at different points in their academic careers (from postgraduate research students to full professors), and 53 English tutors with different levels of teaching experience. By pairing up researchers and tutors, we aimed to encourage them to learn from each other. Researchers would benefit from having an English tutor sitting next to them to improve language awareness and ask questions, while English teachers would gain experience with research writing in fields they were unfamiliar with. The various technology-enhanced activities covered in the workshop included trialling a beta version of the ColloCaid prototype. Feedback collected via anonymous end-of-workshop questionnaires was very encouraging. The researchers were particularly happy to be able to use the workshop materials to enhance their own writing, and to have the just-in-time support of an English tutor sitting next to them. The tutors appreciated helping the researchers solve real problems, and being able to consult corpus tools and resources when they did not know the answer.
Collaborator Contribution Brazilian partners were responsible for the local organization of the workshops and contributed to their delivery.
Impact 1. Invited Presentation at "UK-BR Internationalisation and English Language Policies in Higher Education", London, 28 January 2020.
2. Research Summary at: https://www.britishcouncil.org.br/sites/default/files/uk_collaboration_call_-_sarmentopintogarcia.pdf
3. British Council Language policy report at: https://www.britishcouncil.org.br/sites/default/files/report-uk-brazil-english-call-eng.pdf
4. Conference Paper: Frankenberg-Garcia, A., Sarmento, S., Tavares Pinto, P. & Bocorny, A. (2020) "A data-driven learning approach to supporting the dissemination of research from non-English speaking institutions". Full paper presented at the 14th Teaching and Language Corpora conference (TaLC 2020), 13-17 July, Perpignan, France.
5. Invited Presentation at "Internacionalização do Ensino Superior: políticas linguísticas" [Internationalization of Higher Education: language policies], British Council and Brazilian Ministry of Education, Brazil, 25 February 2021.
Start Year 2018
 
Description UFRGS & UNESP UK-Brazil collaboration 
Organisation Sao Paulo State University
Country Brazil 
Sector Academic/University 
PI Contribution We delivered 4 technology-assisted English academic writing workshops to support Brazilian researchers publishing internationally. The workshops were funded by the British Council, with matched funding from Sketch Engine, Santander Universities and the Brazilian Languages without Borders programme. They were held twice at the Federal University of Rio Grande do Sul (UFRGS, Porto Alegre), and another two times at São Paulo State University (UNESP, Sao Jose do Rio Preto), in April and June 2019. A total of 125 applicants participated, although demand for the workshops was more than twice the number of places we were able to offer. The participants included 72 researchers from a wide range of areas (e.g., Astronomy, Biology, Computer Science, Engineering, Politics, etc.) and at different points in their academic careers (from postgraduate research students to full professors), and 53 English tutors with different levels of teaching experience. By pairing up researchers and tutors, we aimed to encourage them to learn from each other. Researchers would benefit from having an English tutor sitting next to them to improve language awareness and ask questions, while English teachers would gain experience with research writing in fields they were unfamiliar with. The various technology-enhanced activities covered in the workshop included trialling a beta version of the ColloCaid prototype. Feedback collected via anonymous end-of-workshop questionnaires was very encouraging. The researchers were particularly happy to be able to use the workshop materials to enhance their own writing, and to have the just-in-time support of an English tutor sitting next to them. The tutors appreciated helping the researchers solve real problems, and being able to consult corpus tools and resources when they did not know the answer.
Collaborator Contribution Brazilian partners were responsible for the local organization of the workshops and contributed to their delivery.
Impact 1. Invited Presentation at "UK-BR Internationalisation and English Language Policies in Higher Education", London, 28 January 2020.
2. Research Summary at: https://www.britishcouncil.org.br/sites/default/files/uk_collaboration_call_-_sarmentopintogarcia.pdf
3. British Council Language policy report at: https://www.britishcouncil.org.br/sites/default/files/report-uk-brazil-english-call-eng.pdf
4. Conference Paper: Frankenberg-Garcia, A., Sarmento, S., Tavares Pinto, P. & Bocorny, A. (2020) "A data-driven learning approach to supporting the dissemination of research from non-English speaking institutions". Full paper presented at the 14th Teaching and Language Corpora conference (TaLC 2020), 13-17 July, Perpignan, France.
5. Invited Presentation at "Internacionalização do Ensino Superior: políticas linguísticas" [Internationalization of Higher Education: language policies], British Council and Brazilian Ministry of Education, Brazil, 25 February 2021.
Start Year 2018
 
Title ColloCaid Prototype 
Description A prototype of the ColloCaid academic writing assistant. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact 3890 users have signed up to use the tool since it became available in November 2019 
URL https://collocaid.uk/about
 
Description ColloCaid source code files and documentation 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
URL https://figshare.com/articles/software/ColloCaid_Code/14170988
 
Description Academic English Collocations: from corpus data to assisted writing. Webinar on Working with Corpora and Digital Tools in Language and Translation Studies: Practical Issues and Methodological Challenges, Universidad Complutense Madrid, Spain, 19 October 2020. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Around 50 people attended this webinar. The talk prompted a debate, several expressions of interest, and an uptick in users of the tool.
Year(s) Of Engagement Activity 2020
 
Description Ana Frankenberg-Garcia presented an overview of ColloCaid at the Formulaic Language Research Network (FLaRN) webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact On Friday 9 July 2021, Principal Investigator Ana Frankenberg-García gave a webinar on "Academic English Collocations: from corpus data to assisted writing". The event was hosted by the Formulaic Language Research Network (FLaRN) and was attended by over 50 participants. There were a number of new registrations to the ColloCaid tool after the talk. Ana was invited to give another talk at the University of Exeter as a follow-up to this activity.
Year(s) Of Engagement Activity 2021
URL https://flarn.org.uk/
 
Description Assisting Writers with Academic English Collocations: lexical data, text editor integration and user feedback. Guest lecture for the European Masters in Lexicography, University Santiago de Compostela, Spain, 19 May 2020. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Around 15 students, practitioners, and academics attended this talk. It was well received and gave rise to discussion. It sparked expressions of interest in the ColloCaid project.
Year(s) Of Engagement Activity 2020
URL https://www.emlex.phil.fau.eu/2020/05/23/emlex-summerterm-a-modules-at-usc-2/
 
Description Assisting Writers with Academic English Collocations: lexical data, text editor integration and user feedback. Webinar celebrating the 50th anniversary of the Graduate Program in Applied Linguistics (LAEL), Pontifical Catholic University of Sao Paulo, Brazil, 23 May 2020. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Around 500 students (post- and undergraduate), language teachers, lexicographers, and researchers attended this talk. The talk was followed by a discussion and questions. It led to a significant increase in interest in the project and in the number of users of the ColloCaid tool.
Year(s) Of Engagement Activity 2020
URL https://www.youtube.com/watch?v=CZ9bKD37lhw&feature=youtu.be
 
Description COLLOCATION EDITOR AND VISUALISATION TOOL HITS 6000 USERS 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact News piece announcing 6000 users. The ColloCaid tool enables people to write texts and understand word associations and collocations. The online tool has now hit 6,000 users. Users come from academic institutions and businesses from across the world. Published online on the news page of the School of Computer Science and Electronic Engineering, Bangor University.
Year(s) Of Engagement Activity 2022
URL https://www.bangor.ac.uk/scsee/news/collocation-editor-and-visualisation-tool-hits-6000-users
 
Description ColloCaid featured in the Cambridge World of Better Learning blog 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact ColloCaid was featured in the widely read blog from Cambridge University Press. The tool received a large number of users who stated that they learnt about the tool through the blog.
Year(s) Of Engagement Activity 2020
URL https://www.cambridge.org/elt/blog/2020/07/31/find-words-that-collocate-why-how/
 
Description ColloCaid in Corpus-based Translation and Teaching Pedagogy. Invited webinar at São Paulo State University, 05 February 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Around 15 staff and students at UNESP attended this talk, which provoked interest in how the project could influence the practice of teaching writing and translation. There were expressions of interest and requests for further information and collaboration.
Year(s) Of Engagement Activity 2021
 
Description ColloCaid: a text editor that helps writers with academic English Collocations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Around 50 people attended. Useful feedback was received from the hands-on demo.
Year(s) Of Engagement Activity 2019
URL http://www.clillac-arp.univ-paris-diderot.fr/_media/seminaires/labo/archives/frankenberg_re_sume_cli...
 
Description Collocaid at Wales Academic Symposium on Language Technologies 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Jonathan Roberts and Peter Butcher presented Collocaid at the Wales Academic Symposium on Language Technologies 2020, which took place on 4 November 2020. 30 participants attended. The focus of the symposium was on Language Technologies, including Speech Technology and Translation Technology; Natural Language Processing; and Artificial Intelligence and Language. The work sparked questions about the design and implementation of language technologies, and how they could be integrated into existing tools.
Year(s) Of Engagement Activity 2020
URL https://symposiwm2020.bangor.ac.uk/en
 
Description Collocaid.uk website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact At the time of writing, the Collocaid website has received around one thousand page views since its launch in June 2017. It has also resulted in numerous requests for further information and future participation.
Year(s) Of Engagement Activity 2017
URL http://www.collocaid.uk
 
Description Collocation and collaboration: Developing the ColloCaid writing assistant - presentation by Geraint Rees 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact On the 12th of April 2021 at 1400 GMT + 1, Geraint Rees gave a seminar talk entitled Collocation and collaboration: Developing the ColloCaid writing assistant. The seminar was hosted by the Infolex research group at the Institut de Lingüística Aplicada, Universitat Pompeu Fabra, Barcelona.
Year(s) Of Engagement Activity 2021
URL https://www.upf.edu/web/infolex/inici/-/asset_publisher/IAi8Upkr0GUL/content/id/244392699#.YiDr-ujP3...
 
Description Corpora for Editors. Seminar presented at the 28th Society for Editors and Proofreaders Conference, Wyboston Lakes, 16-18 September 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact As an expert in the field, Collocaid principal investigator Ana Frankenberg-Garcia was invited to present the seminar "Corpora for Editors" at the 28th Society for Editors and Proofreaders Conference, Wyboston Lakes, 16-18 September 2017. A considerable share of editing and proofreading work is devoted to polishing academic papers, dissertations and theses. Editors and proofreaders can contribute to the development of Collocaid by reporting the miscollocations they come across in their day-to-day work. Collocaid will help editors and proofreaders detect collocation problems in the texts they revise and supply better collocation solutions.
Year(s) Of Engagement Activity 2017
URL https://www.sfep.org.uk/networking/conferences/
 
Description Data-driven learning from a text editor. Incorporating Corpora in Teaching Symposium, Mid-Sweden University, Sweden, 23 October 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This symposium was attended by international experts in the use of corpora in teaching, postgraduate students, and other interested parties. There were around 75 attendees in total. The talk about ColloCaid was very well received and generated requests for further information and collaboration, as well as an uptick in users of the tool. Several of the attendees said that they would use the tool in their teaching practice.
Year(s) Of Engagement Activity 2020
 
Description Demonstration and talks at Bangor University of ColloCaid 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact Talks and demonstrations of ColloCaid work at open days and visit days to Bangor University. Dates include: 9 October, 26 November. Each visit had about 25 members of the public.
Year(s) Of Engagement Activity 2022
 
Description Demonstration and talks at Bangor University of ColloCaid 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact Talks and demonstrations of ColloCaid work at open days and visit days to Bangor University. Dates include: 10 October, 31 October, 20 November 2021. Each visit had about 50 members of the public.
Year(s) Of Engagement Activity 2021
 
Description Design Workshop (Bangor) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact 3 PhD students at Bangor University took part in a focus group and gave feedback on the academic writing process and the ColloCaid tool.
Year(s) Of Engagement Activity 2019
 
Description Design Workshop (Surrey) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact A focus group of 11 UoS staff and postgraduate students discussed the academic writing process and shared ideas about the design of academic writing assistants.
Year(s) Of Engagement Activity 2019
 
Description Developing ColloCaid, a Text Editor for Improving Vocabulary and Fluency of Academic Writing 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact The seminar was attended by around 50 people. The tool was positively received. A number of requests for further information came in.
Year(s) Of Engagement Activity 2019
URL https://events.manchester.ac.uk/event/event:odp-k03rzm2p-ynx94/ctis-seminar-developing-collocaid-a-t...
 
Description Developing a Text Editor to Help Writers with Academic English Collocations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact There were around 50 postgraduate students and staff in attendance. The talk led to questions and discussion.
Year(s) Of Engagement Activity 2019
URL http://talks.cam.ac.uk/talk/index/129694
 
Description Editing Matters 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Guest article for the Society for Editors and Proofreaders digital magazine Editing Matters: "How can corpora help editors and proofreaders?" (2018)
Year(s) Of Engagement Activity 2018
URL https://www.sfep.org.uk/resources/editing-matters/
 
Description Elex 2021 conference paper "How useful are writing assistants to researchers with English as a Second Language? A review of existing tools" by Gustavo Zomer and Ana Frankenberg-Garcia 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gustavo Zomer and Ana Frankenberg-Garcia delivered the paper "How useful are writing assistants to researchers with English as a Second Language? A review of existing tools". The review is part of a PhD project aiming at the development of a new writing assistant for academic writers. The presentation is available on the conference YouTube channel with 160 subscribers in March 2021.
Year(s) Of Engagement Activity 2021
URL https://elex.link/elex2021/
 
Description Elex 2021 conference paper "Measuring User Workload in e-Lexicography with the NASA Task Load Index" by Geraint Rees 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Geraint Rees presented an oral paper at the eLex 2021 conference on "Measuring User Workload in e-Lexicography with the NASA Task Load Index". The conference was held online and attended by around 100 delegates from higher education, software companies and publishers. The presentation is available on the conference YouTube channel with 160 subscribers in March 2021.
Year(s) Of Engagement Activity 2021
URL https://elex.link/elex2021/
 
Description Elex 2021 conference paper "Taking a broad view of post-editing lexicography" by Ana Frankenberg-Garcia 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Ana Frankenberg-Garcia presented an oral paper at the eLex 2021 conference on post-editing lexicography. The conference was held online and attended by around 100 delegates from higher education, software companies and publishers. The presentation is available on the conference YouTube channel with 160 subscribers in March 2021.
Year(s) Of Engagement Activity 2021
URL https://elex.link/elex2021/
 
Description Guest article for the ITI Bulletin: "Consulting corpora" (2018) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited to write a short introductory article on corpora and how they can help translators
Year(s) Of Engagement Activity 2018
URL https://www.iti.org.uk/more/news/1218-consulting-corpora
 
Description Institute of Translation and Interpreting webinar 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Around 70 translators attended this introductory workshop on using corpora to enhance translation practice. For many participants, this was the first time they were able to understand how corpus linguistics could be relevant to their work.
Year(s) Of Engagement Activity 2021
URL https://www.iti.org.uk/discover/events-calendar/french-network-using-corpora-to-enhance-translation-...
 
Description Invitation to present a talk at the Graduate School of Education Research Seminar at the University of Exeter 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Ana Frankenberg-Garcia gave an overview of ColloCaid at the Graduate School of Education Research Seminar, University of Exeter, 6 December 2021. Around 10 people attended, including tutors of academic English interested in using ColloCaid in their practice.
Year(s) Of Engagement Activity 2021
URL https://www.exeter.ac.uk/news/events/details/index.php?event=11561
 
Description Invited talk at the University of Exeter 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact xxx
Year(s) Of Engagement Activity 2022
 
Description Invited talk on ColloCaid at the Università degli Studi di Napoli 'L'Orientale' in Naples, Italy. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact I was invited to present the research behind the ColloCaid project and the tool itself to students at Università degli Studi di Napoli 'L'Orientale' as a featured guest speaker. The talk took place on October 25th, 2021 and was attended by students and staff both physically present in the classroom and online. The talk was 90 minutes and sparked a lot of interest.
Year(s) Of Engagement Activity 2021
 
Description OASIS summary of ReCALL 2019 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A lay summary of Frankenberg-Garcia, A. et al. (2019). Developing a writing assistant to help EAP writers with collocations in real time. ReCALL, 31(1), 23-39, written to explain our research to the general public, was published in OASIS.
Year(s) Of Engagement Activity 2019
URL https://oasis-database.org/?locale=en
 
Description Poznan Linguistic Meeting presentation: "ColloCaid: A Corpus-Based Writing Assistant for Academic English Collocation". 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Geraint Rees gave an overview of the ColloCaid project at the Poznan Linguistic Meeting in Poland. There were around 10 attendees from Poland.
Year(s) Of Engagement Activity 2021
URL http://wa.amu.edu.pl/plm/2022/PLM2021_Programme
 
Description Presentation at the University of Sao Paulo 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Around 10 professors and postgraduate students attended in person. The talk was recorded, with 112 views on 7 March 2024.
Year(s) Of Engagement Activity 2023
URL https://www.youtube.com/watch?v=EjqLQuniZVY
 
Description Talk at the second annual symposium of the Alan Turing Institute Visualization Interest Group (#VizTIG) 8 September 2020. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The second annual symposium of the Alan Turing Institute Visualization Interest Group (#VizTIG) was held virtually on 8 September 2020. The fields of data science and artificial intelligence are generating outputs that are too complex for humans to understand in traditional ways. This symposium focused on ways that visualisation with data analysis can help users gain better insights into their data. Professor Roberts gave a talk on data visualisation with examples from the ColloCaid project. He spoke on the idea of the "data visualisation iceberg" and reflected on how to build data-visualisation projects, including the ColloCaid project. 40 participants attended and heard about principles from the ColloCaid project. The work sparked questions about the metaphor and the system.
Year(s) Of Engagement Activity 2020
URL https://www.turing.ac.uk/research/interest-groups/visualization
 
Description Talk on lexicography and Collocaid project 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact 15 PhD students and academic staff attended a talk on the Collocaid project, and lexicography, which sparked discussion on academic writing and especially writing tools, resulting in participants understanding the availability of different tools and techniques.
Year(s) Of Engagement Activity 2019
URL https://www.bangor.ac.uk/computer-science-and-electronic-engineering/news/peter-butcher-gives-a-semi...
 
Description Using ColloCaid for Academic Writing. BALEAP TELSIG webinar, 29 January 2021. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Around 200 people, primarily English tutors, attended this talk. There was a discussion about how the project might influence their professional practice and several requests for further information and collaboration.
Year(s) Of Engagement Activity 2021
 
Description Workshop Academic writing in English: make your research texts more idiomatic and readable (Universidad de León) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This three-day academic writing workshop attracted around 30 participants, including undergraduates, postgraduates and staff at the Universidad de León, Spain. The feedback received was positive. Participants reported having reflected on their writing practices.
Year(s) Of Engagement Activity 2019
 
Description Workshop: Improve your translation with the help of corpora 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This hands-on workshop was aimed at practising and new translators who wished to understand how corpora and related tools such as ColloCaid can be used as an aid to translation. Several expressions of interest in the ColloCaid tool were received.
Year(s) Of Engagement Activity 2018
URL https://www.iti.org.uk/professional-development/events-calendar/icalrepeat.detail/2019/02/08/13420/-...