Collocaid: combining learner needs, lexicographic data and text editors to help learners write more idiomatically

Lead Research Organisation: University of Surrey
Department Name: English

Abstract

Over the past decades, the UK has produced a series of world-leading corpus-based pedagogical dictionaries that provide users not just with the definitions of words, but also with a wealth of information on how words are actually used in context. There have also been considerable advances with regard to dictionary format. Nowadays, all major English language dictionaries have digital interfaces. Yet research on dictionary use shows that the spectacular developments in terms of dictionary content and format that have taken place over the past decades have not had a dramatic influence on actual dictionary-user behaviour. Dictionaries - both paper-based and digital - remain by and large underused, and it is widely acknowledged that more needs to be done with regard to teaching people how to use dictionaries to their full potential. This proposal stems from the realization that an arguably better solution would be to develop alternative, dictionary-like tools that do not require much in the way of training or instructions.

This project aims to research how information to help writers produce more accurate and idiomatic texts can be migrated from dictionaries and corpora to digital writing environments in an optimum, minimally intrusive way, without disrupting writing processes. Rather than attempting to cover every possible aspect of writing, we will focus on supporting non-native speakers of English with information to help them deal with collocation. Violating collocation conventions can result in errors (e.g. *They trust in us) or awkward, non-idiomatic text (e.g. *a large difference). Additionally, writers who are unable to retrieve idiomatic collocates (e.g. a narrow/daring/lucky escape) often make do with bland, less interesting alternatives (e.g. a fantastic escape). Although there are dictionaries that focus precisely on collocation, writers are often unaware of them or simply cannot be bothered to use them. Moreover, the simple fact that learners have to stop writing to look up a collocate can disrupt the flow of their words. It is in this context that we propose to research how writers can retrieve information on collocation directly from within digital writing environments in an intuitive and minimally intrusive way so that (1) writers do not need to be trained to look up this information and (2) the flow of writing is not disrupted in the process.

The research will begin with a needs analysis to identify which collocation difficulties to focus on. We will then carry out lexicographic work to address those needs, using, among other resources, computerized language corpora and state-of-the-art lexicographic tools. Next, we will research how to integrate information on collocation with text editors in an easy, helpful and minimally disruptive way. Different models of human-computer interaction and data visualization will be developed and the team will carry out usability studies and test them with a sample of the target population.

The investigators responsible for this project are three well-known academics with many years of teaching and research experience in the fields of second language writing, lexicography, corpus linguistics and human-computer interaction. The team's advisory board includes Michael Rundell (editor-in-chief of Macmillan Dictionaries), Pete Whitelock (principal language engineer at Oxford University Press dictionary division) and Milos Jakubicek (CEO of Lexical Computing Ltd).

This research will help to further the UK's reputation for world-leading developments in the field of pedagogical lexicography. The project has tangible impacts on society, culture and the economy, as its outputs include data and software that can help writers using English as a medium of communication. We will be exploiting the potential of digital technologies to enhance the creation of knowledge through writing, enabling people of different backgrounds to better express themselves in written English.

Planned Impact

In addition to the academic beneficiaries, the present project will generate tangible outputs with a potential to impact society, culture and the economy. There are a number of non-academic stakeholders at a national and international level who can benefit from this. In the first instance, these include but are not limited to the following:

a. Writers using English as a medium of communication, especially non-native writers of English (e.g. undergraduate and postgraduate students as well as researchers and lecturers in the UK and abroad, in addition to wider audiences including politicians, journalists and other professionals who need to communicate in written English), will benefit from the development of a user-friendly digital writing environment that can help them produce more grammatical and idiomatic texts.

b. Native English speakers wishing to develop further writing skills (this could include children, students and professionals less fluent in writing) could benefit in similar ways to the beneficiaries in (a).

c. English as a Foreign Language (EFL) and English for Academic Purposes (EAP) tutors in the UK and abroad will have new resources to draw on. They will be welcome to use the information collected on collocation difficulties and collocation solutions in their day-to-day teaching practice. While the primary data generated by the project will be made easily accessible to them through the project website, this group can also benefit from the edited tools and resources developed by group (d) below.

d. The collocation data generated by this project can be commercially valuable to academic publishers producing EAP materials such as Oxford University Press, Cambridge University Press and Pearson ELT, and English language testing services like Cambridge Language Assessment, IELTS and TOEFL. This data can be used to develop books, interactive online exercises and tests. The edited materials and resources they produce using our data will further benefit groups (a), (b) and (c) above.

e. Software developers will benefit from novel visualization methods that focus on personal data. Personal visualization is a fast-growing area, and as yet there are few techniques for displaying personal textual data dynamically and interactively.

f. The linguistic tools and resources created for English in this project can have an indirect impact on other languages, fostering the development of similar projects for languages other than English.

In short, the outputs of the present proposal can have a strong societal, economic and cultural impact, with benefits not only to special professional and practitioner groups but also the wider public. By using technology to foster improved writing and by enabling people of different cultural and language backgrounds to better express themselves in written language, we hope to enhance the creation of knowledge and promote greater understanding and communication among different communities.

Publications

 
Title Introducing ColloCaid 
Description A video introducing ColloCaid 
Type Of Art Film/Video/Animation 
Year Produced 2019 
Impact Over 250 views 
URL https://vimeo.com/361811792
 
Title Video/animation on Visualisation and graphical techniques to help writers write more idiomatically 
Description The animation explains how visualisation can help authors and provides an overview of the project.
Type Of Art Film/Video/Animation 
Year Produced 2017 
Impact Has a wide reach; it is located within the IEEE VGTC community.
URL https://vimeo.com/230838396
 
Description We researched which academic words were the most important ones across academic disciplines by cross-referencing three well-known academic word lists. We identified 489 essential nouns, verbs and adjectives that overlapped in at least two lists. These included words typically used across academic disciplines like "research", "system", "contribute", "suggest", "critical", "significant", etc.
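
The cross-referencing step described above can be illustrated with a minimal sketch. The function name, the toy word lists and the case-folding step are assumptions made for illustration only; they are not the project's actual data or procedure.

```python
from collections import Counter


def core_words(word_lists, min_lists=2):
    """Return the words that occur in at least `min_lists` of the given lists."""
    counts = Counter()
    for words in word_lists:
        # Use a set so that a word repeated within one list is counted once.
        counts.update({w.lower() for w in words})
    return sorted(w for w, n in counts.items() if n >= min_lists)


# Toy data for illustration, not the real academic word lists.
list_a = ["research", "system", "suggest", "table"]
list_b = ["research", "contribute", "suggest"]
list_c = ["system", "critical", "research"]

print(core_words([list_a, list_b, list_c]))
```

With these toy lists, only the words shared by two or more lists are kept; a word such as "table", present in a single list, would be left out, which is the kind of omission discussed in this report.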

While researching the above core academic words, we have identified important omissions from academic word lists. For example, "table" is an essential academic word, but doesn't figure prominently in certain academic word lists. One reason is that it shares its academic sense (e.g., "see Table 5") with very frequent non-academic senses (e.g. "the keys are on the table"). It has also become evident that words that are used across more than one discipline are not necessarily interdisciplinary, but rather are used differently in different subjects. For example, "code" has different meanings and collocations in Computer Science, Biology and Linguistics. In response to these developments, we have expanded our list of core nouns, verbs and adjectives to 557 lemmas in 702 senses.

Using lexical computing software, we analysed millions of words in texts by expert writers of academic English to find out what collocations (word combinations) were typically used with the above core academic words. We have identified 31,927 frequent academic collocations like "significantly improve", "quantitative research", "design a system", etc. and incorporated them into a lexical database of academic collocation suggestions.
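
As a rough illustration of what finding collocates involves, the sketch below counts the words that occur within a small window around a node word. It is a deliberately simplified assumption: dedicated lexical computing software typically also relies on lemmatisation, part-of-speech tagging, grammatical relations and statistical association scores, none of which is modelled here. The function name and the sentences are hypothetical.

```python
from collections import Counter


def collocate_counts(sentences, node, window=2):
    """Count the words that occur within `window` tokens of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbour in the window except the node itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Invented sentences for illustration, not corpus data.
sentences = [
    "we design a system for annotation",
    "they design a system that scales",
    "a poorly designed system fails",
]
counts = collocate_counts(sentences, "system")
print(counts.most_common(3))
```

Raw window counts such as these also pick up function words like "a", and treat "design" and "designed" as different words, which is one reason why real collocation extraction needs the linguistic processing mentioned above.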

We also extracted authentic sentences showing how these academic collocations have been used in expert writing and used them to compile a database of 30,203 curated phrases to help writers notice how collocations are used in context. For the collocation "design + system", for instance, we provide short, easily readable examples like "the advantages of designing a system in this way"; "a poorly designed system"; "a system designed to...", and so on.

We planned the ways our lexical database of collocations and examples of collocations in use would be presented to academic writers based on previous research on writing, dictionary use, human-computer interaction and visualisation. Our analysis led us to:
1. Develop a text editor from which collocation suggestions could be consulted in such a way that writers don't have to leave their writing environment. Unlike checking collocations in dictionaries and other external references, this is less likely to interrupt the writing process and word flow.
2. Incorporate collocation suggestions in such a way that writers can be prompted to expand their academic writing vocabulary even when they are not aware of their vocabulary limitations.
3. Provide collocation suggestions in an intuitive and unobtrusive way, so that writers can use them only when and as needed.
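
A minimal sketch of how an editor might look up suggestions on demand is given below. The data structure, the function name and the `max_examples` parameter are hypothetical and chosen only for illustration; they do not describe the actual ColloCaid implementation.

```python
# Hypothetical miniature database: node word -> collocate -> example phrases.
DATABASE = {
    "system": {
        "design": [
            "the advantages of designing a system in this way",
            "a poorly designed system",
        ],
    },
}


def suggest(node, max_examples=1):
    """Return the collocates of `node`, each with at most `max_examples` examples."""
    entries = DATABASE.get(node.lower(), {})
    return {
        collocate: examples[:max_examples]
        for collocate, examples in entries.items()
    }


print(suggest("system"))                   # one example per collocate
print(suggest("system", max_examples=2))   # more examples on request
```

Returning a single example by default and more only on request is one way of keeping suggestions unobtrusive; a word that is not in the database simply yields no suggestions.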

We have tested a prototype of the ColloCaid editor with various groups of target users, including students, experienced researchers and English teachers. Version 0.3 of the prototype was rated nearly excellent (79.9% and 84.2%) in the widely used System Usability Scale questionnaire, based on an anonymous questionnaire given to users in Brazil (42 responses) and Spain (18 responses). Additionally, respondents gave the following ratings to more specific questions about ColloCaid (on a scale of 1-5):
How useful is ColloCaid to confirm that the ways I combine words in academic English is appropriate? 4.3 (Brazil & Spain)
How useful is ColloCaid to remind myself of academic English word combinations I already knew? 4.2 (Brazil) 4.6 (Spain)
How useful is ColloCaid to learn new academic English word combinations? 4.6 (Brazil) 4.7 (Spain)
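
For readers unfamiliar with the System Usability Scale, the sketch below shows the standard way a single respondent's score is computed from the ten questionnaire items, each answered on a scale of 1-5. The responses used here are invented for illustration; it is an assumption that the usability figures reported above are averages of such per-respondent scores.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if item % 2 == 1:
            # Odd-numbered items are positively worded: contribution is response - 1.
            total += response - 1
        else:
            # Even-numbered items are negatively worded: contribution is 5 - response.
            total += 5 - response
    # The summed contributions (0-40) are scaled to a 0-100 range.
    return total * 2.5


# Invented responses for illustration only.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```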

What did you like about ColloCaid? (selected representative responses from Brazil and Spain)
It's really easy and practical to use;
Easier to get access to collocations while you are writing the text, you do not need to leave the place you are writing to check in another place the collocation;
It shows a diverse of contexts in which the collocations are used; It is easy to use and gives quick information;
It's a simple and intuitive tool;
I like it when it gives only 1 example to begin with and then later you can get more, if you need more;
It reminds me with many collocations that I knew but they did not come to my mind at glance

What could be improved in ColloCaid? (selected representative responses from Brazil and Spain)
More words could be added to ColloCaid, like academic words from specific areas;
Maybe it could be compatible with text editors we use daily, such as Microsoft Word;
Install a auto saving mechanism so that the text that I am composing is not damage or lost;
The interface / appearance could be more appealing but that is not a really important issue and I guess it is the last thing to improve

The ColloCaid prototype is a proof of concept. In the final stages of our research, we aim to:
Address known bugs;
Develop alternative visualization solutions;
Conduct more usability tests;
Explore future partnerships.
Exploitation Route Our findings can be used to teach and improve academic writing in English. Researchers who are not used to writing up their research in English and students who are not used to academic English will benefit.

Our research has also contributed to the advancement of corpus-based lexicography and writing assistants.
Sectors Digital/Communication/Information Technologies (including Software),Education,Other

URL http://www.collocaid.uk
 
Description We have received several requests for ColloCaid to be used by real-world users and incorporated in academic writing programmes (e.g. in Japan, Australia, Spain, New Zealand, Brazil). The ColloCaid prototype was first opened to the public in October 2019. Because it is still under development, we have not advertised it beyond a few isolated presentations at seminars and conferences. Our latest figures (5 March 2020) indicate that the tool has nevertheless attracted 257 registered users from 44 different countries, especially the UK (57 users), Brazil (36 users), Poland (23 users) and Saudi Arabia (20 users). Users include university lecturers/professors (23.7%), Master's students (22.6%), PhD students (17.5%), research fellows (8.2%), English teachers (7.0%), undergraduate students (5.8%), secondary school students (1.2%) and other (14.0%). Our recently published research papers are beginning to be cited in other publications.
First Year Of Impact 2019
Sector Education
Impact Types Cultural,Societal

 
Description Technology-enhanced research writing - Brazil
Geographic Reach South America 
Policy Influence Type Influenced training of practitioners or researchers
Impact 53 English tutors supporting academic writing in Brazil and 72 postgraduate research students and academics received training in corpus linguistics technologies to support research writing and the internationalization of Brazilian research.
URL https://www.britishcouncil.org.br/sites/default/files/uk_collaboration_call_-_sarmentopintogarcia.pd...
 
Description Technology-enhanced research writing - Spain
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
Impact Postgraduate research students and academics received training in corpus linguistics technologies to support research writing for international, peer-reviewed journals
 
Description British Council UK Brazil Collaboration Call
Amount £10,000 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2019 
End 08/2019
 
Description Santander Staff Mobility Award
Amount £2,000 (GBP)
Organisation Santander Universities 
Sector Private
Country United Kingdom
Start 05/2018 
End 05/2018
 
Title ColloCaid lexical database 
Description We have compiled a corpus-based lexical database to support the ColloCaid text editor. The lexicographic database underlying ColloCaid includes at this time: 557 lemmas in 702 senses; 31,927 non-discipline-specific collocations extracted from corpora of expert academic writing; 30,203 curated corpus examples of core collocations in context 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Our database serves the ColloCaid editor to help academic writers find collocation suggestions to improve the idiomaticity of their texts. We have at this time 257 registered users of the ColloCaid prototype. 
URL http://www.collocaid.uk
 
Description Leon Workshop 
Organisation University of Leon
Country Spain 
Sector Academic/University 
PI Contribution We delivered a one-week technology-enhanced writing workshop to support Spanish academics publishing in English at the University of Leon, Spain, in June 2019. A beta version of the ColloCaid prototype was trialled during the workshop.
Collaborator Contribution The University of Leon partner was responsible for local organization and helped to deliver the workshop.
Impact Workshop participants received research writing support.
Start Year 2019
 
Description UFRGS & UNESP UK-Brazil collaboration 
Organisation Federal University of Rio Grande do Sul
Country Brazil 
Sector Academic/University 
PI Contribution We delivered 4 technology-assisted English academic writing workshops to support Brazilian researchers publishing internationally. The workshops were funded by the British Council, with matched funding from Sketch Engine, Santander Universities and the Brazilian Languages without Borders programme. They were held twice at the Federal University of Rio Grande do Sul (UFRGS, Porto Alegre), and another two times at São Paulo State University (UNESP, São José do Rio Preto), in April and June 2019. A total of 125 applicants participated, although demand for the workshops was more than twice the number of places we were able to offer. The participants included 72 researchers from a wide range of areas (e.g., Astronomy, Biology, Computer Science, Engineering and Politics) and at different points in their academic careers (from postgraduate research students to full professors), and 53 English tutors with different levels of teaching experience. By pairing up researchers and tutors, we aimed to encourage them to learn from each other. Researchers would benefit from having an English tutor sitting next to them to improve language awareness and ask questions, while English teachers would gain experience with research writing in fields they were unfamiliar with. The various technology-enhanced activities covered in the workshop included trialling a beta version of the ColloCaid prototype. Feedback collected via anonymous end-of-workshop questionnaires was very encouraging. The researchers were particularly happy to be able to use the workshop materials to enhance their own writing, and to have the just-in-time support of an English tutor sitting next to them. The tutors appreciated helping the researchers solve real problems, and being able to consult corpus tools and resources when they did not know the answer.
Collaborator Contribution Brazilian partners were responsible for the local organization of the workshops and contributed to their delivery.
Impact Presentation at "UK-BR Internationalisation and English Language Policies in Higher Education", London, 28 January 2020
Start Year 2018
 
Description UFRGS & UNESP UK-Brazil collaboration 
Organisation Sao Paulo State University
Country Brazil 
Sector Academic/University 
PI Contribution We delivered 4 technology-assisted English academic writing workshops to support Brazilian researchers publishing internationally. The workshops were funded by the British Council, with matched funding from Sketch Engine, Santander Universities and the Brazilian Languages without Borders programme. They were held twice at the Federal University of Rio Grande do Sul (UFRGS, Porto Alegre), and another two times at São Paulo State University (UNESP, São José do Rio Preto), in April and June 2019. A total of 125 applicants participated, although demand for the workshops was more than twice the number of places we were able to offer. The participants included 72 researchers from a wide range of areas (e.g., Astronomy, Biology, Computer Science, Engineering and Politics) and at different points in their academic careers (from postgraduate research students to full professors), and 53 English tutors with different levels of teaching experience. By pairing up researchers and tutors, we aimed to encourage them to learn from each other. Researchers would benefit from having an English tutor sitting next to them to improve language awareness and ask questions, while English teachers would gain experience with research writing in fields they were unfamiliar with. The various technology-enhanced activities covered in the workshop included trialling a beta version of the ColloCaid prototype. Feedback collected via anonymous end-of-workshop questionnaires was very encouraging. The researchers were particularly happy to be able to use the workshop materials to enhance their own writing, and to have the just-in-time support of an English tutor sitting next to them. The tutors appreciated helping the researchers solve real problems, and being able to consult corpus tools and resources when they did not know the answer.
Collaborator Contribution Brazilian partners were responsible for the local organization of the workshops and contributed to their delivery.
Impact Presentation at "UK-BR Internationalisation and English Language Policies in Higher Education", London, 28 January 2020
Start Year 2018
 
Title ColloCaid Prototype 
Description A prototype of the ColloCaid academic writing assistant. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact Over 250 users have signed up to use the tool since it became available in November 2019.
URL https://collocaid.uk/about
 
Description ColloCaid: a text editor that helps writers with academic English Collocations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Around 50 people attended. Useful feedback was received from the hands-on demo.
Year(s) Of Engagement Activity 2019
URL http://www.clillac-arp.univ-paris-diderot.fr/_media/seminaires/labo/archives/frankenberg_re_sume_cli...
 
Description Collocaid.uk website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact At the time of writing, the Collocaid website has received around one thousand page views since its launch in June 2017. It has also resulted in numerous requests for further information and future participation.
Year(s) Of Engagement Activity 2017
URL http://www.collocaid.uk
 
Description Corpora for Editors. Seminar presented at the 28th Society for Editors and Proofreaders Conference, Wyboston Lakes, 16-18 September 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact As an expert in the field, Collocaid principal investigator Ana Frankenberg-Garcia was invited to present the seminar "Corpora for Editors" at the 28th Society for Editors and Proofreaders Conference, Wyboston Lakes, 16-18 September 2017. A considerable share of editing and proofreading work is devoted to polishing academic papers, dissertations and theses. Editors and proofreaders can contribute to the development of Collocaid by reporting the miscollocations they come across in their day-to-day work. Collocaid will help editors and proofreaders detect collocation problems in the texts they revise and supply better collocation solutions.
Year(s) Of Engagement Activity 2017
URL https://www.sfep.org.uk/networking/conferences/
 
Description Design Workshop (Bangor) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact 3 PhD students at Bangor University took part in a focus group and gave feedback on the academic writing process and the ColloCaid tool.
Year(s) Of Engagement Activity 2019
 
Description Design Workshop (Surrey) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact A focus group of 11 University of Surrey staff and postgraduate students discussed the academic writing process and shared ideas about the design of academic writing assistants.
Year(s) Of Engagement Activity 2019
 
Description Developing ColloCaid, a Text Editor for Improving Vocabulary and Fluency of Academic Writing 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact The seminar was attended by around 50 people. The tool was positively received. A number of requests for further information came in.
Year(s) Of Engagement Activity 2019
URL https://events.manchester.ac.uk/event/event:odp-k03rzm2p-ynx94/ctis-seminar-developing-collocaid-a-t...
 
Description Developing a Text Editor to Help Writers with Academic English Collocations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact There were around 50 postgraduate students and staff in attendance. The talk led to questions and discussion.
Year(s) Of Engagement Activity 2019
URL http://talks.cam.ac.uk/talk/index/129694
 
Description Editing Matters 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Guest article for the Society for Editors and Proofreaders digital magazine Editing Matters: "How can corpora help editors and proofreaders?" (2018)
Year(s) Of Engagement Activity 2018
URL https://www.sfep.org.uk/resources/editing-matters/
 
Description Guest article for the ITI Bulletin: "Consulting corpora" (2018) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited to write a short introductory article on corpora and how they can help translators
Year(s) Of Engagement Activity 2018
URL https://www.iti.org.uk/more/news/1218-consulting-corpora
 
Description OASIS summary of ReCALL 2019 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A lay summary of Frankenberg-Garcia, A. et al. (2019). Developing a writing assistant to help EAP writers with collocations in real time. ReCALL, 31(1), 23-39, written to explain our research to the general public, was published in OASIS.
Year(s) Of Engagement Activity 2019
URL https://oasis-database.org/?locale=en
 
Description Talk on lexicography and Collocaid project 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact 15 PhD students and academic staff attended a talk on lexicography and the Collocaid project. The talk sparked discussion on academic writing, especially writing tools, and left participants better informed about the tools and techniques available.
Year(s) Of Engagement Activity 2019
URL https://www.bangor.ac.uk/computer-science-and-electronic-engineering/news/peter-butcher-gives-a-semi...
 
Description Workshop Academic writing in English: make your research texts more idiomatic and readable (Universidad de León) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This three-day academic writing workshop attracted around 30 participants including undergraduates, postgraduates and staff at the Universidad de León, Spain. The feedback received was positive. Participants reported having reflected on their writing practices.
Year(s) Of Engagement Activity 2019
 
Description Workshop: Improve your translation with the help of corpora
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This hands-on workshop was aimed at practising and new translators who wished to understand how corpora and related tools such as ColloCaid can be used as an aid to translation. Several expressions of interest in the ColloCaid tool were received.
Year(s) Of Engagement Activity 2018
URL https://www.iti.org.uk/professional-development/events-calendar/icalrepeat.detail/2019/02/08/13420/-...