FreeTxt: supporting bilingual free-text survey and questionnaire data analysis
Lead Research Organisation:
CARDIFF UNIVERSITY
Department Name: Sch of English Communication and Philos
Abstract
In a modern consumer-led culture, obtaining and responding to qualitative feedback (often in the form of free-text comments or other written feedback) is embedded in professional practice across many walks of life.
Surveys are used, for example, in staff development, professional training, product design and testing, and in various forms of service provision across the public and private sectors. Surveys and questionnaires often produce a combination of quantitative and qualitative data. Quantitative forms, such as rating scales (e.g. Likert scale responses), multiple choice questions and rank order questions, can be quantified with ease, and their analysis can be conducted in a systematic and often automated way. By contrast, more qualitative questions, which prompt open-ended, free-text responses, pose a more difficult challenge for the analyst. The same is true, in the context of the tourism and heritage sector, of written feedback on exhibitions, events and/or historical sites posted on social media channels or websites including Trip Advisor and Trust Pilot. Tackling written, text-based feedback often requires a more labour-intensive and manual approach to analysis. This challenge is compounded when feedback is presented in both English and Welsh, as is often the case in Wales, which represents the largest bilingual community in the UK. The successful analysis of bilingual data relies on the workforce having the appropriate linguistic expertise to process it.
While a range of sophisticated digital tools for the analysis of text-based data are available, particularly for researchers working in academia, in marketing and public relations contexts etc., many of the digital resources used are not necessarily affordable, quick and easy to use, and/or accessible to non-expert users. Specifically, these tools currently do not fully support the task of systematically processing free-text responses in Welsh.
This project aims to bridge this gap by building the novel 'FreeTxt' toolkit which is designed to support the analysis and visualisation of multiple forms of open-ended, free-text data in both English and Welsh. FreeTxt will draw on existing open-source bilingual corpus-based utilities and methodologies, repackaging these and taking them in a new direction so that they are relevant to new audiences/user-groups. We will work closely with project partners Cadw and National Trust Wales to co-design, co-construct and test FreeTxt to ensure that the resource is fit-for-purpose and fairly and consistently meets the needs of Welsh and English-language responses.
Existing tools that we will draw on include those developed as part of the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh). These include CorCenCC's semantic taggers (i.e. meaning-based categorisations of individual words and phrases) and part-of-speech taggers (POS, i.e. grammar-based categorisations of individual words and phrases, e.g. nouns, verbs), their tagsets for the Welsh language, and corpus functionalities for querying language, amongst others. These tools will be integrated into a user-friendly online interface into which users can paste or upload their texts in order to search for patterns of meaning that emerge in survey responses and feedback; to see which words are most often used in relation to a given theme, place or topic; and to understand what visitors particularly enjoyed about a service or attraction, and what they think could be improved.
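As a rough illustration of the "which words are most often used" idea described above, the following is a minimal Python sketch using only the standard library. It is not FreeTxt's implementation: the function name `top_words`, the tiny stop-word list and the sample responses are all hypothetical, and a real analysis would rely on proper Welsh and English tokenisation and tagging rather than a simple regular expression.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (English and Welsh),
# for illustration only; it is not taken from FreeTxt or CorCenCC.
STOP_WORDS = {"the", "a", "and", "but", "was", "were", "is",
              "y", "yr", "yn", "ac", "oedd"}


def top_words(responses, n=5):
    """Return the n most frequent non-stop-word tokens across all responses."""
    counts = Counter()
    for text in responses:
        # \w+ matches Unicode letters, so accented Welsh characters are kept.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Invented sample feedback, mixing English and Welsh responses.
responses = [
    "The castle was wonderful and the staff were friendly",
    "Roedd y castell yn wych",
    "Friendly staff, but the castle cafe was closed",
]

print(top_words(responses, 3))
```

A frequency table of this kind is the sort of input that could then feed a table or word cloud, although how FreeTxt computes its own outputs is not specified here.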
The final version of the tool will be made freely available and will be adaptable in terms of who can use it and when. It will contain generic analysis features that enable it to be used by any public and/or professional company or institution dealing with varying datasets of qualitative survey data, and it will be of relevance to academic researchers analysing and visualising survey data. The accessibility and usability of this tool will help provide a direct route to potential impact.
Description | The FreeTxt toolkit has now been developed and launched to the general public. FreeTxt is a free bilingual online toolkit for analysing and visualising free-text data (from surveys, questionnaires etc.) in English and Welsh. FreeTxt draws on some of the corpus-based utilities and methodologies from CorCenCC and ACC (Welsh Automatic Text Summarisation), repackaging these to enable new audiences and user-groups to analyse their own feedback data. Co-designed in collaboration with National Trust Wales, Museum Wales, Cadw, WJEC, and National Centre for Learning Welsh, FreeTxt is accessible to anyone in any sector in Wales and beyond. FreeTxt: • indicates if your data is positive and/or negative (sentiment analysis) and provides downloadable visualisations of results. • allows you to explore/visualise common words, phrases and themes in your data (in tables, word clouds etc.). • enables you to summarise free-text data, and examine word use and relationships. FreeTxt is available open source with an Apache 2.0 licence (https://github.com/UCREL/freetxt/), and via a hosted web demo interface at: www.freetxt.app. It incorporates other open source tools from our previous projects such as CyTag (Welsh POS tagger), a Welsh summariser, and PyMUSAS (for English and Welsh), see https://www.freetxt.app/about for more details. |
Exploitation Route | The FreeTxt toolkit can be used directly on the website above, and/or the code can be accessed and further developed via the links provided on the following GitHub site: https://github.com/UCREL/freetxt/ FreeTxt is also linked to via the new Welsh Government funded www.digigrid.cymru initiative, which is likely to increase the number of users who engage with the tool. DigiGrid (GDC-WDG) is an online collection of freely available digital resources designed to support the exploration, analysis, learning, and referencing of the Welsh language. |
Sectors | Creative Economy; Digital/Communication/Information Technologies (including Software); Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections; Other |
URL | http://www.freetxt.app |
Description | The FreeTxt tool was launched in 2023, in the first instance at a meeting for the project partners: National Trust Wales, Cadw, Museum Wales, WJEC/CBAC and the National Centre for Learning Welsh (NCLW). All of these partners have since been using the tool to analyse their own free-text comments from online fora, surveys and so on, and have included visualisations from the tool in presentations and updates they provide internally. |
First Year Of Impact | 2023 |
Sector | Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Culture, Heritage, Museums and Collections |
Impact Types | Cultural; Societal; Policy & public services |
Description | Reference to CorCenCC in the Welsh Linguistic Infrastructure Policy document |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
Impact | Importance of the CorCenCC and FreeTxt resources mentioned in the policy. |
URL | https://www.gov.wales/welsh-linguistic-infrastructure-policy-html |
Description | Welsh Digital Grid (www.digigrid.cymru) |
Amount | £15,000 (GBP) |
Organisation | Government of Wales |
Sector | Public |
Country | United Kingdom |
Start | 03/2023 |
End | 03/2024 |
Title | FreeTxt |
Description | FreeTxt is a free bilingual online toolkit for analysing and visualising free-text data (from surveys, questionnaires etc.) in English and Welsh. FreeTxt draws on some of the corpus-based utilities and methodologies from CorCenCC and ACC (Welsh Automatic Text Summarisation), repackaging these to enable new audiences and user-groups to analyse their own feedback data. Co-designed in collaboration with National Trust Wales, Museum Wales, Cadw, WJEC, and National Centre for Learning Welsh, FreeTxt is accessible to anyone in any sector in Wales and beyond. FreeTxt: • indicates if your data is positive and/or negative (sentiment analysis) and provides downloadable visualisations of results. • allows you to explore/visualise common words, phrases and themes in your data (in tables, word clouds etc.). • enables you to summarise free-text data, and examine word use and relationships. FreeTxt is available open source with an Apache 2.0 licence (https://github.com/UCREL/freetxt/), and via a hosted web demo interface at: www.freetxt.app. It incorporates other open source tools from our previous projects such as CyTag (Welsh POS tagger), a Welsh summariser, and PyMUSAS (for English and Welsh), see https://www.freetxt.app/about for more details. |
Type Of Material | Data analysis technique |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | FreeTxt is already being used by project partners Cadw, National Trust Wales, Museum Wales, National Centre for Learning Welsh and WJEC to transform how they process/analyse qualitative data. |
URL | https://www.freetxt.app/ |
Description | WJEC|CBAC |
Organisation | Welsh Joint Education Committee |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | WJEC|CBAC have become official partners on the project. They have been helping with the user-testing of the software and are potential future users of the tool. |
Collaborator Contribution | WJEC|CBAC have become official partners on the project. They have been helping with the user-testing of the software and are potential future users of the tool. |
Impact | No outputs as yet (the collaboration has only just begun). |
Start Year | 2022 |
Title | FreeTxt - |
Description | This is the first release of the FreeTxt tool, released to enable the project partners to test the tool and provide feedback on it. The final version will be freely and publicly available. |
Type Of Technology | Webtool/Application |
Year Produced | 2022 |
Impact | The project partners have been testing their own data on this tool - Cadw and National Library Wales have already used some outputs and visualisations in reports and presentations. |
URL | https://ucrel-welsh-freetxt-app-home-6pshxm.streamlit.app/ |
Description | A press release to announce the launch of the FreeTxt online toolkit [Lancaster University] |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Media release to announce the launch of the freely available FreeTxt online toolkit |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.lancaster.ac.uk/news/welsh-speakers-to-have-a-greater-say-thanks-to-launch-of-free-onlin... |
Description | A press release, press conference or response to a media enquiry/interview - FreeTxt project launch |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Initial project press release regarding the FreeTxt project funding |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cardiff.ac.uk/news/view/2678224-language-tools-with-real-world-impact |
Description | CorCenCC and FreeTxt demonstration for teachers/educators at the National Centre for Learning Welsh |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | 15 educators attended a demonstration of the new tools in the relaunched version of CorCenCC. This included representatives from the National Centre for Learning Welsh from all around Wales. The event increased the number of users of the tool. |
Year(s) Of Engagement Activity | 2023 |
Description | FreeTxt article included in the BAAL newsletter, autumn 2023 |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | An article regarding the key functionalities of FreeTxt was included in the British Association for Applied Linguistics (BAAL) autumn newsletter. |
Year(s) Of Engagement Activity | 2023 |
URL | http://www.baal.org.uk |
Description | Invited short talk/demo of FreeTxt given at the Wales Tourism Partnership meeting |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Third sector organisations |
Results and Impact | Invited short talk/demo of FreeTxt given to the Wales Tourism Partnership meeting. There were 38 people in attendance from a range of different public sector institutions, local government and so on. |
Year(s) Of Engagement Activity | 2023 |
Description | Press release for the launch of the FreeTxt project [Cardiff University] |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Media release to announce the launch of the freely available FreeTxt online toolkit |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.cardiff.ac.uk/news/view/2782910-welsh-speakers-to-have-a-greater-say-thanks-to-launch-of... |