Collocaid: combining learner needs, lexicographic data and text editors to help learners write more idiomatically

Lead Research Organisation: University of Surrey
Department Name: English

Abstract

Over the past decades, the UK has produced a series of world-leading corpus-based pedagogical dictionaries that provide users not just with the definitions of words, but also with a wealth of information on how words are actually used in context. There have also been considerable advances with regard to dictionary format. Nowadays, all major English language dictionaries have digital interfaces. Yet research on dictionary use shows that the spectacular developments in terms of dictionary content and format that have taken place over the past decades have not had a dramatic influence on actual dictionary-user behaviour. Dictionaries - both paper-based and digital - remain by and large underused, and it is widely acknowledged that more needs to be done with regard to teaching people how to use dictionaries to their full potential. This proposal stems from the realization that an arguably better solution would be to develop alternative, dictionary-like tools that do not require much in the way of training or instructions.

This project aims to research how information to help writers produce more accurate and idiomatic texts can be migrated from dictionaries and corpora to digital writing environments in an optimum, minimally intrusive way, without disrupting writing processes. Rather than attempting to cover every possible aspect of writing, we will focus on supporting non-native speakers of English with information to help them deal with collocation. Violating collocation conventions can result in errors (e.g. *They trust in us) or awkward, non-idiomatic text (e.g. *a large difference). Additionally, writers who are unable to retrieve idiomatic collocates (e.g. a narrow/daring/lucky escape) often make do with bland, less interesting alternatives (e.g. a fantastic escape). Although there are dictionaries that focus precisely on collocation, writers are often unaware of them or simply cannot be bothered to use them. Moreover, the simple fact that learners have to stop writing to look up a collocate can disrupt the flow of their words. It is in this context that we propose to research how writers can retrieve information on collocation directly from within digital writing environments in an intuitive and minimally intrusive way so that (1) writers do not need to be trained to look up this information and (2) the flow of writing is not disrupted in the process.

The research will begin with a needs analysis to identify which collocation difficulties to focus on. We will then carry out lexicographic work to address those needs, using, among other resources, computerized language corpora and state-of-the-art lexicographic tools. Next, we will research how to integrate information on collocation with text editors in an easy, helpful and minimally disruptive way. Different models of human-computer interaction and data visualization will be developed, and the team will evaluate them in usability studies with a sample of the target population.

The investigators responsible for this project are three well-known academics with many years of teaching and research experience in the fields of second language writing, lexicography, corpus linguistics and human-computer interaction. The team's advisory board includes Michael Rundell (editor-in-chief of Macmillan Dictionaries), Pete Whitelock (principal language engineer at Oxford University Press dictionary division) and Milos Jakubicek (CEO of Lexical Computing Ltd).

This research will help further the UK's reputation for world-leading developments in the field of pedagogical lexicography. The project has tangible impacts on society, culture and the economy, as its outputs include data and software that can help writers using English as a medium of communication. We will be exploiting the potential of digital technologies to enhance the creation of knowledge through writing, enabling people of different backgrounds to better express themselves in written English.

Planned Impact

In addition to the academic beneficiaries, the present project will generate tangible outputs with the potential to impact society, culture and the economy. There are a number of non-academic stakeholders at a national and international level who can benefit from this. In the first instance, these include but are not limited to the following:

a. Writers using English as a medium of communication, especially non-native writers of English (e.g. undergraduate and postgraduate students as well as researchers and lecturers in the UK and abroad, in addition to wider audiences including politicians, journalists and other professionals who need to communicate in written English), will benefit from the development of a user-friendly digital writing environment that can help them produce more grammatical and idiomatic texts.

b. Native English speakers wishing to develop further writing skills (this could include children, students and professionals less fluent in writing) could benefit in similar ways to the beneficiaries in (a).

c. English as a Foreign Language (EFL) and English for Academic Purposes (EAP) tutors in the UK and abroad will have new resources to draw on. They will be welcome to use the information collected on collocation difficulties and collocation solutions in their day-to-day teaching practice. While the primary data generated by the project will be made easily accessible to them through the project website, this group can also benefit from the edited tools and resources developed by group (d) below.

d. The collocation data generated by this project can be commercially valuable to academic publishers producing EAP materials such as Oxford University Press, Cambridge University Press and Pearson ELT, and English language testing services like Cambridge Language Assessment, IELTS and TOEFL. This data can be used to develop books, interactive online exercises and tests. The edited materials and resources they produce using our data will further benefit groups (a), (b) and (c) above.

e. Software developers will benefit by having novel visualization methods that focus on personal data. Personal visualization is a fast-growing area, and as yet there are few techniques for displaying personal textual data dynamically and interactively.

f. The linguistic tools and resources created for English in this project can have an indirect impact on other languages, fostering the development of similar projects for languages other than English.

In short, the outputs of the present proposal can have a strong societal, economic and cultural impact, with benefits not only to special professional and practitioner groups but also the wider public. By using technology to foster improved writing and by enabling people of different cultural and language backgrounds to better express themselves in written language, we hope to enhance the creation of knowledge and promote greater understanding and communication among different communities.
 
Title Video/animation on Visualisation and graphical techniques to help writers write more idiomatically 
Description The animation gives an overview of the project and explains how visualisation can help authors. 
Type Of Art Film/Video/Animation 
Year Produced 2017 
Impact Has a wide reach; it is located within the IEEE VGTC community. 
URL https://vimeo.com/230838396
 
Description We researched which academic words were the most important ones across academic disciplines by cross-referencing three well-known academic word lists. We identified 489 essential nouns, verbs and adjectives that overlapped in at least two lists. This included words typically used across academic disciplines like "research", "system", "contribute", "suggest", "critical", "significant", etc.
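
The cross-referencing step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name words_in_at_least and the three toy lists are hypothetical stand-ins, not the actual word lists or the procedure used in the project.

```python
from collections import Counter


def words_in_at_least(word_lists, min_lists=2):
    """Return the words that appear in at least `min_lists` of the given lists."""
    counts = Counter()
    for words in word_lists:
        # Use a set so a word repeated within one list is counted only once.
        counts.update({w.lower() for w in words})
    return sorted(w for w, n in counts.items() if n >= min_lists)


# Toy stand-ins for the three academic word lists (assumed data, not the real lists).
list_a = ["research", "system", "contribute", "analyse"]
list_b = ["research", "suggest", "system", "critical"]
list_c = ["significant", "critical", "contribute", "derive"]

core = words_in_at_least([list_a, list_b, list_c])
print(core)
```

With real word lists, the same overlap test would also need lemmas and parts of speech to be aligned across lists before counting, which this sketch leaves out.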

Using lexical computing software, we analysed millions of words in texts by expert writers of academic English to find out what word combinations were typically used with our selected words. We identified thousands of academic word combinations like "significantly improve", "quantitative research", "design a system", etc.
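
As a much simplified illustration of finding word combinations around a selected word, the sketch below counts which words occur immediately before a target word in a handful of sentences. It is an assumption-laden toy: the function left_collocates and the five-sentence corpus are hypothetical, and dedicated lexical computing software typically works with grammatical relations and statistical association scores over very large corpora rather than raw adjacent-word counts.

```python
import re
from collections import Counter


def left_collocates(sentences, target, min_count=2):
    """Count the words that occur immediately before `target`."""
    counts = Counter()
    for sentence in sentences:
        # Very rough tokenisation: lowercase alphabetic strings only.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for prev, word in zip(tokens, tokens[1:]):
            if word == target:
                counts[prev] += 1
    # Keep only combinations seen at least `min_count` times, most frequent first.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


# Toy corpus (assumed data, for illustration only).
corpus = [
    "Quantitative research is common in this field.",
    "We combined qualitative and quantitative research.",
    "Further research is needed.",
    "This quantitative research was funded externally.",
    "Further research should address this gap.",
]

result = left_collocates(corpus, "research")
print(result)
```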

We also extracted example sentences to help people see how these academic word combinations have been used in expert writing. For example: "the advantages of designing a system in this way"; "a poorly designed system"; "a system designed to...".

We planned ways in which information on how academic words combine would be presented in the ColloCaid text editor based on previous research on writing and dictionary use.
Our analysis led us to establish the six principles of the text editor we are developing: (a) raise awareness of conventional combinations of words writers may not remember to look up; (b) suggest solutions for problems like "a great level" instead of "a high level"; (c) provide cues in an intuitive way; (d) ensure cues are unobtrusive; (e) enable writers to retrieve cues as and when needed; (f) enable default settings to be adjusted to individual needs.
Exploitation Route Our findings can be used to teach and improve academic writing in English. Researchers who are not used to writing up their research in English and students who are not used to academic English will benefit.
Sectors Digital/Communication/Information Technologies (including Software),Education,Other

URL http://www.collocaid.uk
 
Description ColloCaid has received several requests to be used by real-world users and incorporated in academic writing programmes (e.g. in Japan, Australia, Spain, New Zealand, Brazil). At this stage in the research, ColloCaid has been released for beta testing by experts only. We envisage testing the tool with real-world users in the forthcoming year. The Slovenian dictionary has also expressed interest in collaborating with the ColloCaid team to develop a similar infrastructure for the Slovenian language.
Sector Education
Impact Types Cultural,Societal

 
Description British Council UK Brazil Collaboration Call
Amount £10,000 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2019 
End 08/2019
 
Description Santander Staff Mobility Award
Amount £2,000 (GBP)
Organisation Santander Universities 
Sector Private
Country United Kingdom
Start 05/2018 
End 05/2018
 
Description Collocaid.uk website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact At the time of writing, the Collocaid website has received around one thousand page views since its launch in June 2017. It has also resulted in numerous requests for further information and future participation.
Year(s) Of Engagement Activity 2017
URL http://www.collocaid.uk
 
Description Corpora for Editors. Seminar presented at the 28th Society for Editors and Proofreaders Conference, Wyboston Lakes, 16-18 September 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact As an expert in the field, Collocaid principal investigator Ana Frankenberg-Garcia was invited to present the seminar "Corpora for Editors" at the 28th Society for Editors and Proofreaders Conference, Wyboston Lakes, 16-18 September 2017. A considerable share of editing and proofreading work is devoted to polishing academic papers, dissertations and theses. Editors and proofreaders can contribute to the development of Collocaid by reporting the miscollocations they come across in their day-to-day work. Collocaid will help editors and proofreaders detect collocation problems in the texts they revise and supply better collocation solutions.
Year(s) Of Engagement Activity 2017
URL https://www.sfep.org.uk/networking/conferences/
 
Description Editing Matters 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Guest article for the Society for Editors and Proofreaders digital magazine Editing Matters: "How can corpora help editors and proofreaders?" (2018)
Year(s) Of Engagement Activity 2018
URL https://www.sfep.org.uk/resources/editing-matters/
 
Description Guest article for the ITI Bulletin: "Consulting corpora" (2018) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited to write a short introductory article on corpora and how they can help translators
Year(s) Of Engagement Activity 2018
URL https://www.iti.org.uk/more/news/1218-consulting-corpora
 
Description OASIS summary of ReCALL 2019 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A lay summary of Frankenberg-Garcia, A. et al. (2019), Developing a writing assistant to help EAP writers with collocations in real time, ReCALL, 31(1), 23-39, was published in OASIS to explain our research to the general public.
Year(s) Of Engagement Activity 2019
URL https://oasis-database.org/?locale=en
 
Description Workshop: Improve your translation with the help of corpora 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This hands-on workshop was aimed at practising and new translators who wished to understand how corpora and related tools such as ColloCaid can be used as an aid to translation. Several expressions of interest in the ColloCaid tool were received.
Year(s) Of Engagement Activity 2018
URL https://www.iti.org.uk/professional-development/events-calendar/icalrepeat.detail/2019/02/08/13420/-...