British National Corpus (BNC) as a sociolinguistic dataset: Exploring individual and social variation

Lead Research Organisation: Lancaster University

Department Name: Linguistics and English Language

Abstract

The project exploits an existing dataset, the British National Corpus (BNC), for the study of informal spoken British English as used by different age and social groups across the UK. In addition, new developments in British English will be investigated by comparing the BNC with BNC2014, a new dataset that is being developed at Lancaster University in collaboration with Cambridge University Press. This allows us, for the first time, to look at language change in spoken British English, on a large scale, over twenty years. By combining methodologies from the fields of corpus linguistics and sociolinguistics, and by using novel analytical methods for in-depth exploration of the data, the project will offer insights into social variation in British English that have not previously been possible. The focus of the sociolinguistic analyses will be on age, an important aspect of everyday social life that has so far received only limited attention from researchers studying language.

The main contribution of the project is not only to our knowledge of British English but also to enabling future systematic research in this area. The results of the project will be applied in the teaching of the English language at secondary schools (AS and A-level) and in ESL/EFL classes for students whose mother tongue is not English. Internationally, there is a growing demand for EFL/ESL teaching, which also represents an important part of the British economy. The results of the project will also be disseminated via a free online course (our Corpus Linguistics MOOC) run by the ESRC Centre for Corpus Approaches to Social Science, Lancaster University, as well as via different channels of the project partners (project ambassadors). The project has been endorsed by Cambridge University Press, a leading global academic publisher and part of the University of Cambridge; the English and Media Centre, an important educational charity working with secondary teachers of English Language and Media Studies in the UK and abroad; and Trinity College London, a major international testing board operating in over 40 countries worldwide.

Planned Impact

Our project will have five principal audiences for impact: established researchers in different disciplines, postgraduate researchers, language testers/material developers, non-academics in the UK and non-academics worldwide.
Who - established academics: The proposed research will have interdisciplinary impact. Apart from researchers in sociolinguistics and corpus linguistics, researchers investigating different aspects of British English and society (sociologists, social psychologists, educators etc.) will directly benefit from the findings of the research (see Academic beneficiaries).
How: Age is an important variable in social research especially in connection with the overall aging of society. To maximise impact, an online platform for searching the data and carrying out multi-variate analyses (advanced mode) will be developed. In this way, researchers will be able to test their own hypotheses about language and society using the subset of the BNC developed for the secondary data analysis (see Objectives).
Who - Master's and PhD students and junior researchers from the UK and overseas who participate every year in the free Lancaster Summer Schools in Corpus Linguistics: future cohorts from these summer schools will have an opportunity to engage with the findings of the study and use the techniques in their own research. Since 2011, the summer schools have attracted almost 400 participants from over 15 countries.
How: The Summer school workshops will include training in corpus linguistics and quantitative sociolinguistic techniques using the online platform developed in the project. The participants will be able to use the findings of the study as well as to test their own sociolinguistic hypotheses and include these in their theses/dissertations and academic publications.
Who - language testers/material developers: The project has been supported by three large UK organisations - CUP, EMC and TCL (see Pathways to Impact) which are involved in materials development/language testing.
How: Our impact partners will not only provide channels for the dissemination of the results but will also gain early access to these results. This will allow them to incorporate the findings into their own material development/language testing activities. This will give them an advantage on the competitive international market.
Who - non-academic beneficiaries in the UK: These fall into two groups: i) secondary-level teachers/students of English Language and ii) teachers/learners of English as a second/foreign language. The sociolinguistic perspective has been given a prominent position in the syllabi and assessment of English Language taught at AS and A-level by several major examination boards. In addition, for the acquisition of English as a second language, awareness of what language is appropriate/typical for different situations or groups of speakers is very important (see Pathways to Impact).
How: The project will represent a valuable educational resource in both AS and A-level English Language classes as well as ESL/EFL classes through materials and an interactive online platform (simple mode) that will allow teachers and students easy access to sociolinguistic data in a user-friendly manner. The teaching materials will be deposited at the ESRC resources archive Social Science for Schools.
Who - non-academic audiences worldwide: Finally, via our successful free online course (Corpus Linguistics MOOC) the research findings will be disseminated to a large international audience (of academics but also importantly of non-academics). The course has attracted over 30,000 participants worldwide in its three runs in 2014 and 2015.
How: The team will develop a sociolinguistics module which will summarise the findings of the research and allow the audience to actively engage with it. The module will also provide links to the online platform and the teaching materials that will be deposited at the ESRC resources archive.

Funded Value:

£156,127

Funded Period:

Jan 17 - Jul 18

Funder:

ESRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

ES/P001599/1

Principal Investigator:

Vaclav Brezina

Research Subject:

Linguistics (99%)

Research Topic:

Corpus Linguistics (33%)

Sociolinguistics (66%)

Organisations

People	ORCID iD
Vaclav Brezina (Principal Investigator)	http://orcid.org/0000-0002-1613-6100
Dana Gablasova (Co-Investigator)
Tony McEnery (Co-Investigator)	http://orcid.org/0000-0002-8425-6403
Miriam Meyerhoff (Co-Investigator)

Publications

Brezina, V. (2018) Statistics in Corpus Linguistics: A Practical Guide

Brezina, V. (2022) Corpus-based approaches to spoken L2 production: Evidence from the Trinity Lancaster Corpus, in International Journal of Learner Corpus Research

Brezina, V. (2021) The Written British National Corpus 2014 - design and comparability, in Text & Talk

Brezina, V. (2017) #LancsBox: A new-generation corpus analysis tool

Brezina, V. (2018) Corpus Approaches to Discourse: A Critical Review

Gablasova, D. (2022) The Trinity Lancaster Corpus: Development, description and application, in International Journal of Learner Corpus Research

McEnery, T. (2019) Corpus Linguistics, Learner Corpora, and SLA: Employing Technology to Analyze Language Use, in Annual Review of Applied Linguistics

McEnery, T. (2022) Fundamental Principles of Corpus Linguistics

Reichelt, S. (2017) The British National Corpus as a sociolinguistic dataset: Exploring individual and social variation

Reichelt, S. (2017) Adapting the BNC for sociolinguistic research - a case study on negative concord

Key Findings
Impact Summary
Research Databases and Models
Software and Technical Products
Engagement Activities


Description	The key findings for this project can be categorised into four core areas:
1) Significant new knowledge. The research brought significant new understanding of the evolution of spoken British English over a period of twenty years, based on a large-scale analysis of two corpora of spoken British English, the Spoken British National Corpus 1994 and the Spoken British National Corpus 2014 (see the collected volume Brezina et al. 2018). Specifically: a. It contributed to sociolinguistic theory by describing the role of age as one of the major social factors in linguistic shift. Findings of this project highlighted the role of speakers' age both in the development of language across their lifetime and in the evolution of the language as a whole. b. It contributed to a new, sociolinguistically informed and empirically based description of linguistic (lexical, grammatical and pragmatic) changes in spoken English over the period of twenty years. We focused on a wide range of linguistic variables, from low-level processes (e.g. employment of intensifiers) to higher-level social interaction (e.g. use of politeness markers). c. It enhanced the knowledge underlying corpus building, as it contributed to the understanding of the significance of social variables in language use and the need to represent them in corpus creation. We proposed a new sociolinguistically informed corpus sampling procedure with age as the crucial structuring variable.
2) Important new questions. The key findings contributed to opening new areas of research across several disciplines. First, they showed the contribution of sociolinguistic theory to corpus-based studies: here, for example, communities of practice were introduced as an explanatory principle. Second, this resulted in a call for an examination of the role of social factors in corpus-based studies of second language acquisition and use, something not previously taken into consideration. Important conceptual issues related to interdisciplinary research in second language acquisition are highlighted in the lead article in a forthcoming issue of the Annual Review of Applied Linguistics (McEnery et al. 2019).
3) Innovative methods, tools and techniques. The key theoretical findings and the new research questions drew upon methodological innovation that emerged from the project as part of a close cooperation between sociolinguistics and corpus linguistics (see the comprehensive review of statistical methods in Brezina 2018). These improved methods have been implemented in three tools, all freely accessible online: BNClab, an online tool allowing powerful sociolinguistic searches; #LancsBox v4, a software package for analysing users' own data and comparing it with existing datasets; and Lancaster Stats Tools online, an easy-to-use online environment for statistical analysis and data visualisation of linguistic data.
4) Increased research capability through a new dataset. For comparison purposes a new corpus, the Spoken British National Corpus 2014, was created, which consists of 10 million words (downloadable for free as XML files). A balanced subset of the Spoken British National Corpus 2014 and the original British National Corpus was extracted for direct comparison and made available via BNClab.
Exploitation Route	There are two major routes by which the outcomes have been and can be further taken forward by different groups of beneficiaries: 1) Use of the outcomes to generate new scholarly knowledge. The resources generated in the project (i.e. the corpora and the tools for their analysis) provide ideal research instruments for pursuing a wide variety of research questions by scholars in the fields of sociolinguistics, applied linguistics, and first and second language acquisition. The specialist training offered through a number of channels (e.g. workshops, summer schools, MOOC) has provided academics from these communities with the skills necessary to use these tools. The resources have already been used in proposing new research activities (e.g. grant proposals submitted by the ESRC Centre for Corpus Approaches to Social Science). We are also preparing further work with the British Library directly informed by the outcomes of this project. 2) Improvement of education in the UK at primary and secondary levels. The outcomes of the project (the interactive platform for language analysis and teaching resources) can be integrated by A-level English Language teachers as well as ELT teachers to enhance their lessons. In addition to using the teaching materials and lesson plans, teachers can also develop their own materials and use the platform flexibly in response to their teaching aims and the needs of their students. The outcomes of the project have a strong potential to inform education policy in the area of language teaching.
Sectors	Education; Culture, Heritage, Museums and Collections


Description	There are two major areas in which the research findings of the project have been used.
1) Development of BNClab and the Corpus for Schools project. Results of the research have been used to develop an online interactive platform, BNClab (http://corpora.lancs.ac.uk/bnclab/ ), giving users access to two key corpora of British English: the British National Corpus (ESRC dataset) and the Spoken British National Corpus 2014, a newly created counterpart to the British National Corpus. The platform has had 4,545 page visits (by 1,584 users from 81 countries worldwide) to date since being made accessible to the public in September 2018. The results of the research were further used to create teaching materials both for use in A-level English Language classes and for English language teaching (ELT). The website 'Corpus for Schools' (http://wp.lancs.ac.uk/corpusforschools/) has been created to support the uptake of the materials by secondary school teachers as well as ELT teachers. To date, we have organised seven engagement activities with secondary school teachers, ELT teachers, students and the general public, with a total of over 250 face-to-face participants. For example, we organised a training event for A-level students and A-level teachers with over 60 participants, and we also trained the participants of three Lancaster Summer Schools in corpus linguistics (110 participants in 2018) in the use of the tool and discussed the main results in the context of the summer schools. In addition, the learning environment (the BNClab platform and teaching materials) has been integrated into two units of a highly successful MOOC in Corpus Linguistics with over 6,000 participants in September - November 2018. In the MOOC, the participants are both trained in the use of the resource and made aware of the key findings of the project. Finally, the findings have been used to inform two new ESRC grant bids; one of the grants has already been awarded (£750,905, PI: E. Semino), and the decision on the second one is pending (£4.256M; PI: E. Semino).
2) Development and use of #LancsBox. The project also allowed the creation and further development of #LancsBox v. 4, a flexible desktop tool for the analysis of linguistic data. The success of #LancsBox lies in the fact that it allows users to upload and analyse their own data and compare them to the data provided, including the spoken subset of the British National Corpus. To date, #LancsBox has attracted 18,080 users from 137 countries worldwide. It is available for free download for all major operating systems from http://corpora.lancs.ac.uk/lancsbox/
First Year Of Impact	2017
Sector	Education
Impact Types	Cultural, Societal


Title	BNC2014 Baby+
Description	This is a 5-million-word sample of current British English ranging from academic writing to fiction, speech and online language. It offers a unique insight into current language use across different genres of British English.
Type Of Material	Database/Collection of data
Year Produced	2019
Provided To Others?	Yes
Impact	The dataset was presented at the ICAME40 (Switzerland) and CL2019 (Cardiff, UK) conferences. Further impact activities are planned.
URL	http://corpora.lancs.ac.uk/lancsbox/download.php


Title	Spoken BNC2014
Description	The Spoken BNC2014 is an 11 million word collection of modern British English conversations, transcribed and annotated, for linguistic analysis. It was developed by CASS in collaboration with Cambridge University Press and first released online in 2017. It is accessible at zero cost to anyone, subject to the terms of an end user licence that permits any noncommercial use in research and teaching (but, for reasons of IP, not redistribution).
Type Of Material	Database/Collection of data
Year Produced	2017
Provided To Others?	Yes
Impact	As of September 2017, when the corpus was first released, one journal special issue and one edited volume have been compiled containing research undertaken using a pre-release subset of the corpus. Both are due for publication by Q1 2018. Other impacts will follow now that the corpus has been made publicly available.
URL	http://corpora.lancs.ac.uk/bnc2014


Title	The Written British National Corpus 2014
Description	This is a major dataset for current British English including 90 million words across different genres such as fiction, academic writing, newspapers, e-language etc. It complements the Spoken British National Corpus 2014 (published in 2017).
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	The Written British National Corpus 2014 was launched in November 2021. The public engagement event attracted over 1,500 attendees, both academic and non-academic.
URL	http://corpora.lancs.ac.uk/lancsbox/download.php


Title	#LancsBox X
Description	#LancsBox has become a major research tool in the field of corpus linguistics, with version X already attracting 7,034 new users since its release in November 2021. Version X was specifically designed to handle large amounts of data such as the British National Corpus 2014. The overall number of #LancsBox users to date is 72,290.
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	We have evidence of #LancsBox being used both in published work (over 600 citations) and in student dissertations and theses.


Title	#LancsBox X
Description	This is a powerful tool for the analysis of large amounts of linguistic data - billions of words. Its new features include graphical analysis of collocations, wordlists and keyword lists.
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	#LancsBox X has a growing user base, with thousands of new licenses added since its release. It allows users to carry out research that would otherwise not be possible.
URL	https://lancsbox.lancs.ac.uk/


Title	#LancsBox v. 3
Description	A new generation corpus analysis tool and data visualisation tool.
Type Of Technology	Software
Year Produced	2017
Impact	This tool has been introduced to a large number of researchers and students via the Corpus Linguistics MOOC.
URL	http://corpora.lancs.ac.uk/lancsbox/


Title	#LancsBox v. 4.5
Description	This powerful tool can analyse and visualise large amounts of linguistic data. Version 4.7, which is freely distributed for all operating systems, includes a number of innovations that allow sophisticated data processing and statistical analysis directly within the tool.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The tool has so far been used by over 30,000 unique users with over 60,000 licenses distributed up to the present day (27/02/2020). The tool has been used in a large number of publications (over 500 citations), which shows practical impact of the tool on the field of corpus linguistics.
URL	http://corpora.lancs.ac.uk/lancsbox


Title	#LancsBox v. 5
Description	#LancsBox is a powerful language analysis tool, which visualises data and produces statistics. In version 5, a new Wizard tool was added to the package, which allows automatic generation of research reports based on raw data. This is a unique feature not available in any other tool.
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
Impact	#LancsBox has become a major research tool in the field of corpus linguistics, with version 5 already attracting 14,879 new users since its release in June 2020. The overall number of #LancsBox users to date is 49,727. We have evidence of #LancsBox being used both in published work (over 300 citations) and in student dissertations and theses. A case study of this use is available in the following blogpost: http://cass.lancs.ac.uk/lancsbox-the-emerging-historical-linguists-mo-a-brief-case-study-of-aramaic/.
URL	http://corpora.lancs.ac.uk/lancsbox


Title	BNClab
Description	The web-based software allows efficient analysis and visualisation of sociolinguistic data; it analyses data according to gender, age, social class, region as well as individual speaker performance. It also compares language development over the period of 20 years from 1994 to 2014. The software employs complex multi-variate statistical analysis to test different sociolinguistic hypotheses about the dataset.
Type Of Technology	Webtool/Application
Year Produced	2018
Open Source License?	Yes
Impact	The impact activities for this software are planned for the period of May - July 2018.
URL	http://corpora.lancs.ac.uk/bnclab/search
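
As an illustration of the kind of 1994-versus-2014 comparison described in this entry, the following is a minimal sketch in Python. It is not BNClab's implementation, which is described above as using multi-variate analysis. It assumes a much simpler approach: frequencies normalised per million words, and the log-likelihood statistic commonly used in corpus linguistics to compare one item's frequency across two corpora. The function names and all counts are hypothetical.

```python
import math


def per_million(count, corpus_size):
    """Relative frequency of an item, normalised per million words."""
    return count / corpus_size * 1_000_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood (G2) for one item's frequency in two corpora.

    Expected counts assume the item is equally frequent in both corpora.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts for one word in two equally sized samples.
older = per_million(100, 1_000_000)
newer = per_million(200, 1_000_000)
g2 = log_likelihood(100, 1_000_000, 200, 1_000_000)
print(f"{older:.1f} vs {newer:.1f} per million, G2 = {g2:.2f}")
```

A G2 value above about 3.84 corresponds to p < 0.05 with one degree of freedom. A test of this kind treats each corpus as a single sample and does not model variation between individual speakers.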


Title	Lancaster Stats Tools online
Description	Lancaster Stats Tools online offers access to powerful statistical tools through a simple 'click and analyse' user interface, into which the data can be directly copy/pasted from a spreadsheet (e.g. Excel or Calc). The statistical tools offer the power of the R package in the background combined with a user-friendly interface designed specifically for analyses of data in corpus linguistics. To search corpora and obtain frequencies for statistical analysis a range of software tools can be used.
Type Of Technology	Webtool/Application
Year Produced	2018
Open Source License?	Yes
Impact	The software tool brings innovation to corpus linguistics. It offers a comprehensive overview of methods that can be used to analyse linguistic data. It is based on extensive research that was enabled by the ESRC-funded project.


Title	TLCHub
Description	This online tool was designed on the basis of BNClab (a previous entry) to search, analyse and visualize the Trinity Lancaster Corpus, the largest spoken learner corpus of its kind.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The tool was only recently released; it has been used to search for evidence to validate new language tests by Trinity College London, a major international testing board. Further impact activities (use in teacher and examiner training) are being prepared.
URL	http://corpora.lancs.ac.uk/trinity


Description	#LancsBox X - release
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Almost 1,300 researchers (early career and senior), educators and other stakeholders attended this event, which provided information about the new software tool we released.
Year(s) Of Engagement Activity	2023
URL	https://cass.lancs.ac.uk/lancsbox-x-innovation-in-corpus-linguistics/


Description	#LancsBox: a new tool for researchers, teachers and students
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This workshop introduced a new analytical tool #LancsBox that can be used for both research and teaching purposes.
Year(s) Of Engagement Activity	2017


Description	Cambridge ELT blogpost: Stories behind pronouns: evidence from real spoken British English
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This interactive blogpost on the http://www.cambridge.org/elt/ website, together with Twitter coverage, targeted a large number of practitioners, textbook writers and policy makers (over 61,200 Twitter followers). It brought some highlights of the project to one of the target audiences (beneficiaries). As a result, there was an increased uptake in the use of the teaching materials based on the project. Further impact activities are planned with Cambridge University Press and Cambridge Language Testing.
Year(s) Of Engagement Activity	2019
URL	http://www.cambridge.org/elt/blog/2019/01/04/pronouns-spoken-english/?utm_source=twitter&utm_medium=...


Description	Corpus Linguistics Training Workshop at Chulalongkorn University in Bangkok, Thailand
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	In September 2018, a team from CASS (Andrew Hardie, Robbie Love, and Susan Reichelt) visited Chulalongkorn University in Bangkok, Thailand, to undertake a training workshop on the use of National Corpus data (linking to the host institution's "Thai National Corpus" project and the CASS "BNC2014" and "BNC as a Sociolinguistic Dataset" projects). The travel costs for this visit, but not the staff time, were supported by a grant from the Newton Fund, listed under "Further Funding". Over two days this workshop included practical sessions using corpus software (CQPweb, BNClab) as well as presentations on the outcomes of research by the CASS staff, by our Thai collaborators Raksangob Wijitsopon and Pornthip Supanfai, and by other Thai researchers working on corpus data at Chulalongkorn. The audience reached was approximately 12-15, consisting of a mixture of academics and postgraduate students. The event was successful and led to extended discussions in January 2019 between Hardie and Wijitsopon about future collaborations, including plans to seek further external funding.
Year(s) Of Engagement Activity	2018


Description	Corpus MOOC - new #LancsBox training
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Development of new training videos and teaching activities for a massive open online course (MOOC) in corpus linguistics.
Year(s) Of Engagement Activity	2017


Description	Corpus MOOC 2020
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Corpus MOOC is an eight-week online training programme in corpus linguistics, allowing the participants to benefit from the results of ESRC-funded corpus research at Lancaster University. In 2020, over 6,000 participants registered for the course; overall, the course attracted 63,482 participants over eight iterations, some of which are included in the previous submissions.
Year(s) Of Engagement Activity	2020
URL	https://www.futurelearn.com/admin/courses/corpus-linguistics


Description	Corpus MOOC 2021
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Corpus MOOC is an eight-week online training programme in corpus linguistics, allowing the participants to benefit from the results of ESRC-funded corpus research at Lancaster University. In 2021, over 6,000 participants registered for the course; overall, the course attracted 69,762 participants over nine iterations, some of which are included in the previous submissions.
Year(s) Of Engagement Activity	2021
URL	https://www.futurelearn.com/admin/courses/corpus-linguistics


Description	Corpus MOOC 2022
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Over 2,500 students were trained in corpus linguistic methods via an 8-week course with individual support.
Year(s) Of Engagement Activity	2022
URL	https://www.futurelearn.com/admin/courses/corpus-linguistics


Description	Corpus MOOC-2019
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Over 5,000 participants took part in the 8-week online training programme in corpus linguistics. They learned about the project's findings and gained transferable skills for applying the techniques used in the project in their own research and professional contexts. A number of participants reported successful application of the techniques and asked follow-up questions.
Year(s) Of Engagement Activity	2019
URL	https://www.futurelearn.com/courses/corpus-linguistics


Description	Corpus linguistics and sociolinguistics - public engagement event
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Public/other audiences
Results and Impact	This event, oriented to the general public, was organised as part of Lancaster University's Campus in the City series. The university hired a shop in Lancaster city centre where we introduced the interested public (teachers, school children and their parents, people from local businesses etc.) to the tools developed as part of the project. Participants could search any word or phrase of interest in BNClab (http://corpora.lancs.ac.uk/bnclab/search). More than fifty people attended the event, which sparked many interesting discussions about language and society in Britain. Primary and secondary school students were exposed to both the process and the product of academic research, showing them the possibilities of careers in cutting-edge computational research.
Year(s) Of Engagement Activity	2019
URL	http://cass.lancs.ac.uk/cass-in-the-city/


Description	Corpus- MOOC - new units on sociolinguistics and language learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	We created two brand new units featuring the results of the ESRC-funded project. The international audience (over 5000 participants from almost 100 countries in the 2018 run) allowed wide dissemination of the research findings. The corpus MOOC was also instrumental in helping the University of Mosul to rebuild their language studies programme.
Year(s) Of Engagement Activity	2018
URL	https://esrc.ukri.org/news-events-and-publications/news/news-items/esrc-centre-helps-mosul-universit...


Description	Lancaster Summer Schools
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This large international event took place in June 2018 at Lancaster University. During the four-day intensive training the participants learnt to use new software tools designed as part of the ESRC-funded project. There was a significant increase in the use of the new tools: Lancaster Stats Tools online (242 28-Day Active Users) and #LancsBox (505 28-Day Active Users).
Year(s) Of Engagement Activity	2018
URL	http://wp.lancs.ac.uk/corpussummerschools/


Description	Lancaster Summer Schools 2019
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This face-to-face training event was attended by almost 150 participants from over 30 different countries worldwide. It involved dissemination of the research outcomes as well as training in research skills applicable in individual research contexts. Since the primary audience of this workshop was postgraduate students (PhD candidates), the training contributed to the successful completion of their doctoral studies.
Year(s) Of Engagement Activity	2019
URL	http://wp.lancs.ac.uk/corpussummerschools/


Description	Lancaster Summer Schools in Corpus linguistics 2020
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The Lancaster summer schools in corpus linguistics are a major annual training event at which the most recent results of the project are reported. The training allows participants to apply the methods discussed in their own research contexts. In 2020 the summer schools were held online, allowing an even larger number of participants to take part.
Year(s) Of Engagement Activity	2020
URL	http://wp.lancs.ac.uk/corpussummerschools/


Description	Lancaster Summer Schools in Corpus linguistics 2021
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The Lancaster summer schools in corpus linguistics are a major annual training event at which the most recent results of the project are reported. The training allows participants to apply the methods discussed in their own research contexts. In 2021 the summer schools were held online, allowing an even larger number of participants to take part.
Year(s) Of Engagement Activity	2021
URL	http://wp.lancs.ac.uk/corpussummerschools/


Description	Lancaster Summer Schools in Corpus linguistics 2022
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This online event attracted over 100 students who were trained in corpus linguistics methods, discourse analysis and statistics.
Year(s) Of Engagement Activity	2022
URL	http://wp.lancs.ac.uk/corpussummerschools/


Description	Media interviews, articles and coms activities - Written BNC2014
Form Of Engagement Activity	A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	· 45 UK media outlets (newspaper and radio), including: Daily Telegraph, The Sunday Telegraph, The Mirror, The Times, The Guardian, The Daily Mail (online), BBC Radio 4, regional BBC stations (Coventry, Highlands and Islands, Scotland, Solent, Somerset), The Conversation. · 11 international media outlets (Australia, Canada, Czech Republic, Ireland, Philippines, Poland, South Africa, US). · Total monthly reach of these outlets (estimated visitors): 937,091,880 (based on the Lancaster University Press Office audience figures).
Year(s) Of Engagement Activity	2021,2022


Description	NATE conference: Corpus for schools: Using corpus resources in A level English Language classes
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	This event took place at the National Association of Teachers of English conference in Birmingham on 23rd June 2018. Head teachers from across the country attended this event. The event sparked a lively debate and wide interest in the newly developed tool (BNClab). In the following months, we received several dozen requests for the teaching materials available for free on the BNClab platform.
Year(s) Of Engagement Activity	2018
URL	http://wp.lancs.ac.uk/corpusforschools/2018/09/06/bnclab-at-the-nate-conference-in-birmingham/


Description	School visit - Corpus linguistics: Scientific approach to language.
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	This half-day workshop for A-level English Language students at Ripley St Thomas secondary school took place in Lancaster on 16th July 2018. The workshop was jointly led by Dr. Dana Gablasova and Dr. Vaclav Brezina. Students learnt how to use a new online tool, BNClab, that was created as part of this ESRC-funded project. The workshop was well received and stimulated discussion and follow-up conversations. Students also provided early feedback on the tool.
Year(s) Of Engagement Activity	2018


Description	Using corpora to explore the English language
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Schools
Results and Impact	Over 50 secondary-school students and 5 English Language teachers attended this workshop at the A-level conference. The event took place at Lancaster University on 3rd July 2018. The participants were introduced to corpora and corpus techniques. Separate instruction was provided for teachers (lecture led by Dr. Dana Gablasova) and students (practical session led by Dr. Vaclav Brezina). The focus of the event was to empower teachers and students to use software tools that were developed as part of the ESRC-funded project. After the event, there was an increased uptake of #LancsBox (505 28-day active users in July 2018).
Year(s) Of Engagement Activity	2018


Description	Written BNC2014 - launch event
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The main purpose of this activity was to raise awareness about the new data set (Written BNC2014), its general accessibility via #LancsBox X and early research carried out on current British English. The event was live streamed from Lancaster Castle and allowed both in-person and online participation. Over 1,500 participants joined the event.
Year(s) Of Engagement Activity	2021
URL	http://cass.lancs.ac.uk/celebrating-the-written-bnc2014-lancaster-castle-event/
