Strategic Priorities Fund - AI for Science, Engineering, Health and Government

Lead Research Organisation: The Alan Turing Institute
Department Name: Research

Abstract

Please see attached business case.

Planned Impact

Please see attached business case

Publications

 
Title Reproducible secure research environments: Talk from Safe Data Access Professionals Quarterly Meeting on 08 June 2021 
Description Overview of the challenges of supporting reproducible research on sensitive data and how the Turing addresses these in its Safe Haven secure research environment. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
URL https://figshare.com/articles/presentation/Reproducible_secure_research_environments_Talk_from_Safe_...
 
Title The Turing Way: Reflection on the community-led development since 2019 
Description Lightning talk presentation by Esther Plomp on The Turing Way on the 5th of April 2022 for the Collaboration Workshop (https://software.ac.uk/cw22) 
Type Of Art Film/Video/Animation 
Year Produced 2022 
URL https://ssi-cw.figshare.com/articles/presentation/The_Turing_Way_Reflection_on_the_community-led_dev...
 
Description Six core themes and three cross-theme projects are funded under the AI for Science and Government (ASG) programme; key findings for each are outlined below.
In summary, the main outputs have been: the piloting and deployment of multidisciplinary collaborative research to address challenges identified by end users; the implementation of best practices and standards for research (RRI) within target sectors and research communities; and the establishment of long-term partnerships to ensure that data science and AI solutions to complex societal challenges are addressed with multidisciplinary research.
An overarching achievement is that research activities have catalysed over £115m in additional funding.
Digital Twins: Urban Analytics: The Rapid Assistance in Modelling the Pandemic (RAMP) model, created to support the pandemic modelling community working on coronavirus, was disaggregated into modules that offer a solid basis for other projects, ranging from the study of spatial inequality and energy distributions to infrastructure investment and sustainable cities. Specifically, QUANT and SPENSER were integrated into a product (a decision-support tool for non-pharmaceutical interventions) called Dynamic Microsimulations for Epidemics (DyME). Initially built for the county of Devon, it was scaled nationally, with new activities included and its demographic coverage extended (especially in relation to health), as was necessary for analytics and scenario planning related to Covid. This technology is available through an open-source platform as the Synthetic Population Catalyst (SPC), which covers all counties in England.
Further to this, and in collaboration with researchers in the environment theme, the team created the Dynamic Microsimulations for Environments - Climate, Heat and Health (DyME-CHH) project as an instance of the dynamic modelling platform with a focus on the future impacts of climate change (urban heating) on population health. This work has engaged with local councils in Cornwall, Bristol and Bradford, and with the Met Office. The model uses synthetic populations provided by the Synthetic Population Catalyst project.
Work under this theme has also led to the development of a specific Digital Twin for Urban Transport (DT-UT) with applications in five cities; national coverage is possible, depending on DfT support and strategy.
Digital Twins: Complex Systems Engineering: Within the built environment group, a digital twin for the world's first underground farm has been developed. The farm's digital twin (dubbed 'CROP') combines physical modelling with real-time sensor data to automatically calibrate its physics-based model of conditions within the underground farm. This enables farm managers to assess and optimise conditions on the farm, and to track and predict how different conditions affect yield volumes.
The aerospace group has engaged in a digital twinning activity with important applications in manufacturing. Through a digital twin of the intake of a jet engine, in collaboration with Rolls-Royce, the group developed methods which would permit cheaper and more effective blade designs for aeroengines, using digital twins to identify improved manufacturing error margins.
Building on existing Turing-funded work, in collaboration with MX3D on the world's first 3D-printed bridge, the construction and built environment group has leveraged novel approaches to combining physical simulations and statistical data to develop a digital twin for asset monitoring. This has subsequently been applied to the monitoring of instrumented railway bridges, piloted on an instrumented railway bridge in Staffordshire in collaboration with Microsoft.
Working with several industry partners, the oil and gas and fast-moving consumer goods group has developed digital twin prototypes to support process optimisation within these sectors. Building on this success, the group has spun out Quaisr (jointly with UCL), which provides a unique digital twin platform, enabling enterprises to scale up their internal models and operationalise them as digital twins.
Health - Through a collaboration with the Francis Crick Institute, the team introduced Biomedical Science Data Awards aimed at early career researchers. The training resources have been developed under The Carpentries incubator programme and shared via the Turing online training platform.
Within a separate work package, our researchers have worked alongside the Office for National Statistics (ONS) to improve the annual England Health Index. They evaluated the steps taken in the construction of the Index, reviewing the conceptual coherence and statistical requirements on Health Index data for 2015-2018. The review resulted in the publication of a Health Index audit, 'Assessments and developments in constructing a National Health Index for policy making, in the United Kingdom'. Based on the results of the audit, the researchers have made several recommendations.
Researchers have been working with Public Health Scotland (PHS) to improve its pioneering SPARRA tool for predicting emergency hospital admissions. General Practitioners (GPs) in Scotland have been using SPARRA since 2011 to identify patients at high risk of emergency admission, intervening early to try to reduce this risk by, for example, adjusting medication or making targeted referrals. SPARRA v4 is expected to be deployed in 2023. It will provide GPs with monthly risk scores for around 80% of the Scottish population. SPARRA v4 refines the previous tool (SPARRA v3) by using cutting-edge machine learning techniques to improve the algorithm's risk prediction. For the 10,000 patients deemed most at risk, calculations with historic data show that v4 could have pre-empted around 1,000 additional emergency admissions.
The 'Development of a learning machine for supporting decision-making for clinicians' project has shown that healthcare data changes over time and across locations. These changes can affect the performance of machine learning models, because the models were not trained on the statistical properties of the new data. We have convinced clinicians that it is important to monitor these changes in the data as they arrive, instead of waiting to see whether the performance of the models deteriorates. This matters because, for use cases like early dementia prediction, model performance cannot be validated until years later, when the patient exhibits signs that the disease has entered its final stages. The infrastructure for silently monitoring data before it is fed to machine learning models has been adopted by the Memory Clinic at Addenbrooke's Hospital and at University College London Hospital.
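As an illustration only, the idea of monitoring incoming data before it reaches a model can be sketched as follows. The record does not describe the project's actual method, so this sketch assumes a single numeric feature and uses a two-sample Kolmogorov-Smirnov statistic as the drift measure; the function names, the threshold of 0.2 and the example values are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, incoming):
    """Largest gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    new = sorted(incoming)
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in ref + new:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_new = bisect_right(new, x) / len(new)
        gap = max(gap, abs(cdf_ref - cdf_new))
    return gap


def drift_detected(reference, incoming, threshold=0.2):
    """Flag a batch whose distribution has moved away from the training data.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return ks_statistic(reference, incoming) > threshold


# Hypothetical values of one feature at training time and in two later batches.
training = list(range(100))
similar_batch = [x + 0.5 for x in range(100)]
shifted_batch = [x + 50 for x in range(100)]

print(drift_detected(training, similar_batch))
print(drift_detected(training, shifted_batch))
```

A check of this kind needs no outcome labels, which is why it can run well before model performance itself can be validated.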
Researchers on the 'Building global AI guided digital solutions for brain and mental health' project have developed a predictive and prognostic modelling framework based on machine learning algorithms that are trained and validated to reliably predict individual rate of cognitive dysfunction at early stages of neurodegenerative disorders from multimodal baseline data. The Digital Neurodetection tool (NDx) reliably predicts individual progression to Alzheimer's disease from low-cost, non-invasive measurements rather than from markers of pathology identified by more invasive measures.
Criminal Justice System - The Turing Commons is an online platform to promote and develop data and digital skills and literacy among various groups (e.g., researchers, developers, policymakers, and members of the public). It comprises a set of interactive, educational resources that introduce and explain key topics of data science and AI: a) responsible research and innovation (RRI), b) public engagement of data science, and c) AI ethics and governance. The aim of this project is to promote the ethical development and application of AI, particularly in policy-making environments.
Under the Forensic Evidence project, graph-theoretic and causal approaches provide an effective framework for the evaluation and interpretation of evidence. Purely technical advances are not sufficient in forensic science: expert knowledge needs to be incorporated, and a culturally aware approach that supports human-computer interaction is essential to support the forensic science community.
Under the Conceptual review of the assessment of the Self-Harm (CRASH) project, the team has developed a causal model of self-harm and suicidal behaviour for people in secure settings such as hospitals or prisons. The team will use this to inform the development of a new measure, based on causal modelling approaches and trained and validated with real data from the project's stakeholder, which they hope will reduce self-harming across their client base.
The project Hate Speech: Measures and Counter-measures led to the creation of new methodologies and data sets for detecting and measuring hate speech and other forms of online harm. The project also led to the creation of an Online Harms Observatory, in collaboration with the Department for Digital, Culture, Media and Sport. The Observatory is a new platform that provides real-time insight into the scope, prevalence, and dynamics of harmful online content. It bridges the gap between methodological robustness and practical relevance by leveraging innovative research to help policymakers, regulators, security services, and other stakeholders better understand the landscape of online harms.
Without an increased understanding of the potentially complex interrelationships between supply and demand, optimising police responses will continue to be a challenge. Research conducted under a further project within the criminal justice theme sought to address this by producing new conceptualisations of police supply and demand which are increasingly being adopted within the academic and practice literature, and new applications of individual based simulation methodologies in quantifying and modelling police supply and demand dynamics.
AI for Science - Benchmarking makes it possible to quantify the benefits of different AI/machine learning algorithms across different challenges and on different systems and architectures. As such, the AI benchmarking project makes it possible to understand the appropriateness of a machine learning technology for a given problem. This has an important effect not only on how the software and hardware industries work, but also across consumer industries (such as identifying the most suitable AI-driven photo correction algorithm on a mobile phone).
Through further funded projects delivered by STFC, research has enabled scientists to predict damage to optical components in a laser facility, to understand how neutrons scatter on specific materials, and to improve imaging in modern microscopes.
Under the 'Molecular structure from images under physical constraints' project, research has resulted in three major new contributions to the broader scientific community. There have been significant new national and international collaborations to address the challenge of high-resolution macromolecular structure determination from cryo-electron microscopy data. This includes the development of new algorithms to address the challenge of incorporating prior scientific knowledge into molecular structure determination. In particular, this involves incorporating our prior knowledge of protein structure and dynamics. A new partnership has been established with the Rosalind Franklin Institute to continue to drive new developments of the computational methodology and ensure that the latest developments in experimental data collection are utilised.
Through the project that looks to predict Arctic sea ice loss, researchers developed IceNet, an open-source AI tool that uses deep learning algorithms to forecast sea ice conditions up to six months ahead based on satellite observations and climate data. This data-driven method is capable of outperforming traditional process-based models (e.g. climate models) and is quicker to run for real-time decision-making. Working in further collaboration with BAS and STFC, researchers have developed new AI methods that can provide robust information on where to place new surface sensors and predict wildfires with increased resolution, and have built a website with tutorials (Python notebooks, https://the-environmental-ds-book.netlify.app/welcome.html).
A computer vision tool called Scivision has been developed that can be applied to different use cases. In one application of this tool, the team developed an algorithm that identifies plankton species more accurately than the approach previously used at the Centre for Environment, Fisheries and Aquaculture Science (CEFAS). Plant phenotyping tools are being integrated into the Scivision toolkit for scientific data analysis.
Tools, Practices and Systems - The Turing Way is an open source and community-led project that aims to make data science comprehensible and useful for everyone. This project brings together its diverse community of researchers, educators, learners, administrators and stakeholders from within The Alan Turing Institute and beyond. The Turing Way provides a web-based book with five guides on reproducible research, project design, communication, collaboration and ethical research, as well as a community handbook. All are written in the open on GitHub by 300+ community members. The 200 subchapters cover topics including open science, data management, research software, communication methods, remote collaboration, ethics, and human rights.
The Data Safe Haven project created software to deploy and manage a Data Safe Haven and to classify datasets into the multiple sensitivity tiers developed by this project across its multiple phases. Several other organisations have deployed their own instance of our Data Safe Haven platform to evaluate it for their own use cases, with one of these having adapted it to scale their production secure research analysis environment into the cloud to support their Covid-19 response. Other organisations have incorporated our tiered data classification model into their own data handling approaches.
Exploitation Route There are multiple ways in which the outcomes of this funding will continue to drive future work and be put to use by others. The fundamental new developments, datasets and tools will be integrated into widely used data capture and analysis pipelines. This will enable other scientists, both nationally and internationally, to address their scientific problems, and will facilitate new discoveries. The outcomes are useful across their respective areas for furthering or advancing studies in various scientific domains. Scientists and researchers in partner organisations and beyond can understand the role of AI in science and build on those examples. Scientists and researchers can use the various software developed under this funding as blueprints for developing additional techniques. Furthermore, academics and students can make use of the training materials around machine learning for further understanding or for developing teaching materials.
Other research institutes that we have partnered with include British Antarctic Survey, Centre for Ecology and Hydrology, Centre for Environment, Fisheries and Aquaculture Science, Rosalind Franklin Institute, John Innes Centre, Earlham Institute, Rothamsted Research, MRC Laboratory of Molecular Biology, Met Office, National Oceanography Centre, Science and Technology Facilities Council, and Rutherford Appleton Laboratory.
The tools, software and methods developed have broad applicability in many sectors outside of academia and have already started to be deployed by government, industry and the third sector, as can be seen in the narrative impact section.
Some specific examples are provided below.
The DyME-CHH team now plans to use its model to compare the effects of different policy interventions on climate-related health. By identifying the groups most at risk from heat exposure, the model will support local councils in developing effective ways to adapt to extreme heat. Data from the model will also feed into the University of Exeter's Local Climate Adaptation Tool (LCAT), which has been co-designed with local government authorities including Cornwall Council and Manchester City Council. LCAT provides decision makers with a user-friendly interface for exploring the predicted health effects of climate change, and suggests evidence-based recommendations for adaptation.
For the CROP underground farm digital twin, ongoing work seeks to turn this into an open platform which growers in other indoor farming spaces can make use of, for monitoring and yield prediction.
Within government, the uses of the urban models that are part of digital twins are being discussed with the Ministry of Housing, Communities and Local Government, with a view to their being made available nationally. In addition, a novel web tool has been developed to assist local authorities in the UK with designing low-traffic neighbourhoods and predicting traffic impacts.
The Online Harms Observatory project is working in partnership with the Department for Digital, Culture, Media & Sport (DCMS) to scale up the work on online hate and extend it to other sources of harm online, particularly harassment, extremism and misinformation.
Within the health sector there is the potential for further application, continuing the work on Clinical Trials and looking at the application of the SPARRA tool to other public health agencies.
The training resources developed under multiple projects, including the Turing-Crick biomedical awards, The Turing Way and Turing Commons, are likely to be used by many UK Centres for Doctoral Training (CDTs), which have a shared need for high-quality and accessible skills and training resources. For Turing Commons in particular, many centres are happy to work with the team to co-design and develop materials. Furthermore, local government organisations (e.g. councils) have expressed an interest in data ethics and public engagement, especially work that can improve digital literacy and trust in data-driven technologies. They are finding that working with members of the public to build trust is an important step in implementing data-driven technologies for public services. Subsequent collaboration with the Crick Institute will be supported by the newly launched Practitioners Hub, led by The Turing Way team from April 2023 onwards. The workshop materials developed in this project will serve as a crucial skill-building resource for all of The Turing Way Practitioners Hub's UK partners, including the Crick Institute in its inaugural cohort.
Sectors Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Communities and Social Services/Policy,Construction,Digital/Communication/Information Technologies (including Software),Energy,Environment,Financial Services, and Management Consultancy,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Transport

URL https://www.turing.ac.uk/research/asg
 
Description The findings from the research funded through this grant are being deployed in central and local government, industry and science research institutes. The data from individual projects provides a full inventory. However, to summarise, examples of impact are included below to illustrate the range of audiences benefiting from ASG research. They indicate how the research output is shaping the practices of many stakeholders who benefit from understanding how AI can facilitate their work in very concrete ways. It also indicates the relationships that have been developed through public engagement. Government: The project Hate Speech: Measures and Counter-measures led to the creation of an Online Hate Research Hub, which collates and organises resources for research and policy-making on online harms. The project also led to the creation of an Online Harms Observatory, in collaboration with the Department for Digital, Culture, Media and Sport. The Observatory is a new platform that provides real-time insight into the scope, prevalence, and dynamics of harmful online content. It bridges the gap between methodological robustness and practical relevance by leveraging innovative research to help policymakers, regulators, security services, and other stakeholders better understand the landscape of online harms. It led to the creation of high-profile policy reports, a substantive Consultation Response to DCMS and Home Office's Online Harms White Paper as well as Evidence Submission to the House of Lords Select Committee on Democracy and Digital Technologies, and important policy papers conducted for Ofcom. In addition, the lead researcher of the project, Bertam Vidgen, acted as a specialist advisor to the Joint Committee on the Draft Online Safety Bill. The project Building an Ethical Framework in the Criminal Justice Context led to the first ever ethics guidance for the use of AI in the public sector (Leslie, 2019). 
The publication of the guidance has spurred the production of a series of workbooks to train civil servants in responsible innovation. The workbooks became part of the UK National AI Strategy and are now in their piloting stage. The project also led to the creation of a new national training platform, the Turing Commons, and an Office of Artificial Intelligence-funded collaboration with UK regulators that yielded a seminal policy report on Common Regulatory Capacity for AI (2022). The project's lead researcher, David Leslie, has been appointed Specialist Advisor to Council of Europe's Committee on AI, where he has led the writing of the zero draft of Human Rights, Democracy and the Rule of Law Impact Assessment for AI, which will accompany its forthcoming AI Convention. Explaining decisions made with AI has also led to significant change at the national and international level. In particular, this guidance has prompted the Department of Digital, Culture, Media and Sport's International Standards directorate to work with our team of researchers at the Turing to co-conceive the UK's AI Standards Hub. Now part of the UK's National AI Strategy, the concept has been brought to fruition at the Turing, in partnership with British Standards Institute and the National Physics Laboratory, and with support from the UK Government. The Department for Business, Energy and Industrial Strategy (BEIS) and the Department for International Trade (DIT) are both heavily interested in the Labour Flow Networks and the Technological Sophistication in Production Networks projects. BEIS worked along the Shocks and Resilience researchers to study the impact of policy interventions during the COVID-19 pandemic, answering questions such as how furlough schemes might affect labour mobility. DIT are interested in the models for their potential in helping the Department determine the impacts of future trade agreements. 
Following a highly successful final workshop at the Turing, where our team of researchers trained BEIS and DIT's quantitative teams on the labour flow network model, the Turing is in the process of releasing the code repository so that the two Departments can adopt this model institutionally. Researchers from the Urban Analytics theme have been particularly active in the cultivation of a relationship with Department for Transport. Working in collaboration with Turing Fellows in the Urban Observatories (UO - Newcastle, Birmingham, Manchester) we have developed the Digital Twin for Urban Transport (DT-UT) (with additional funding through the Economic data Innovation Fund). This work has also involved engagement with local authorities in all of the UO partner cities and others, and exposed widely across DfT and other government stakeholders in DCMS, ONS, Cabinet Office and the Connected Places Catapult. Following an internal appointment to the role of digital twins champion we anticipate a longer term partnership between Turing and DfT in DT development. Different researchers from the same theme have built a tool that can help reduce household energy use for heating and save carbon emissions by making detailed data on energy performance available to local government and housing providers. Using public data from the English Housing Survey, National Energy Efficiency Data-Framework and energy performance certificates, EnergyFlex generates 'synthetic data' for homes in a given area to create a representative housing population. Applying an energy model to this population allows the user to simulate energy usage and performance at the level of individual homes. Researchers created a mapping tool to aid housing association Places for People in decision-making related to energy performance of its houses in Leeds. 
Working in collaboration, researchers from the urban analytics and the environment theme have engaged with local councils in Cornwall, Bristol and Bradford on a dynamic modelling platform with a focus on the future impacts of climate change (urban heating) on population health. This project, Dynamic Microsimulations for Environments - Climate, Heat and Health (DyME-CHH) uses synthetic populations, provided by the Synthetic Population Catalyst project. The Colouring Cities Research Programme (CCRP) developed a platform for sharing data on buildings, which can help to inform decision-making in areas such as urban planning and climate change adaptation. CCRP creates online maps that collate and visualise data on individual buildings, combining automated approaches and crowdsourcing to capture and verify missing data. Researchers are collaborating with international academic partners, with the result being a worldwide network of Colouring Cities platforms that are tailored to suit local needs. Adaptations by partner cities and countries address locally relevant issues. Australia, for example, is incorporating new data categories for accessibility and safety. Scottish Patients at Risk of Readmission (SPARRA) project has now created version 4 of its model. This has been fully fitted and is currently going through required steps with the expectation of full deployment by Public Health Scotland within the coming 12 months. Full deployment will mean that the new model developed on this grant will result in a risk score being calculated for 80% of the whole Scottish population for delivery to GPs on a monthly basis. Science research institutes: Researchers working with the Centre for Environment, Fisheries & Aquaculture Science (Cefas) developed new software that automatically counts and classifies plankton at a rate of 50 organisms per second with over 90% accuracy - beyond what is humanly possible. 
It is now in use aboard the scientific research vessel Cefas Endeavour, helping scientists to study marine life in the North Sea. The classification helps scientists gain a clearer picture of ocean health, due to the key roles that plankton play in ocean food webs, and carbon and nutrient cycles. The open-source software can be adapted to classify images of other marine objects and species, helping to transform the way scientists study the oceans. Scivision, a computer vision tool, is being applied by researchers across different disciplines and institutes including Rosalind Franklin Institute, John Innes Centre, Rothamsted, British Antartic Survey, CEFAS) The team working with JIC and Rothamsted in particular have developed several tools for plant phenotyping that are being now integrated into Scivision's toolkit for scientific data analysis, and are being adapted to new use cases. The AI tool, IceNet, that enables scientists to more accurately forecast Arctic sea ice conditions months into the future will integrate with a digital twin (virtual model) of the Royal Research Ship (RRS) Sir David Attenborough, operated by BAS, to help plot fuel-efficient routes through sea ice, thus supporting BAS's goal of becoming net zero by 2040. Collaborating with the World Wide Fund for Nature (WWF) and local governments, BAS is now exploring how IceNet can be applied in conservation planning and early warning systems for dangerous sea ice conditions across the Canadian Arctic. IceNet is a case study for the use of AI tools in environmental science in the Natural Environment Research Council's first digital strategy, published in 2022. Industry: A partnership with Lloyds Register has been formed to accelerate the digitalisation of the maritime industry. Work undertaken by researchers in the complex sustems engineering team verify and ensure the rigor of this work has resulted in the LR award of Digital Twin Approved Certification' to Furuno for HermAce VDR. 
The certification awarded by LR is the first independent verification of digital twin technology in the maritime industry. Researchers under the Ecosystem of Digital Twins cross-theme project worked with Zero Carbon Farms to develop a bespoke digital twin for their hydroponic farm as part of the Growing Underground project. The farm occupies over 1,000m2 of tunnels below the streets of Clapham and supplies 100-150 tonnes per year of microsalads to shops and restaurants. The digital twin uses a physics-based model to forecast conditions in the tunnels based on outdoor conditions but is, crucially, calibrated with real data from sensors around the farm. It provides a platform for data analysis to help improve the farm's output enabling farm managers to assess and optimise conditions on the farm, track and predict how different conditions affect yield volumes . Other The Turing Way is an openly available guidebook on how to carry out reproducible, ethical, collaborative data science. It is a community-led resource authored by hundreds of global contributors and updated in response to changes in the rapidly shifting data science landscape. It is emerging as a world-leading community of practice for responsible research. It provides guidance for government best practice processes in software engineering, as recommended by the Goldacre review on health data for research and analysis. In addition it is directly informing NASA's Transform to Open Science (TOPS) initiative and is considered invaluable in the development of NASA's OpenCore curriculum targeting NASA scientists. Through provision of advice and resources, The Turing Way supported the Health Foundation, which uses data analytics to tackle real-world health and social care issues, to achieve key milestones in progress towards its goal of open analytics. 
It has also provided inspiration for other knowledge-sharing projects, including an Office for National Statistics guide for government analysts who use coding in their work and the FAIR Cookbook, a hands-on guide to reproducible research in the life sciences. The Data Safe Haven, Turing's trustworthy research environment (TRE), funded initially by ASG, became the first non-commercial, open-source, Azure-based TRE platform. The project team is now seeking additional stakeholder engagement and aims to influence the national policy conversation around TREs via DARE UK. The Data Safe Haven and related projects at the Turing are referenced in DARE UK's Phase 1 report and recommendations. The pre-print on Data Safe Haven design choices has explicitly influenced the development of commercial cloud-based solutions produced by Amazon and Microsoft. The team is also convening research software engineers across the country to drive community collaboration and consensus on the direction of the field within the DARE UK programme, using additional DARE UK funding to drive its next programme of work.
First Year Of Impact 2019
Sector Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Communities and Social Services/Policy,Construction,Digital/Communication/Information Technologies (including Software),Energy,Environment,Financial Services, and Management Consultancy,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Transport
Impact Types Societal,Economic,Policy & public services

 
Title Recalibrating classifiers for interpretable abusive content detection 
Description Dataset and code for the paper, 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020), to appear at the NLP + CSS workshop at EMNLP 2020. We provide: (1) 1,000 annotated tweets, sampled using the Davidson classifier across 20 score increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election; (2) 1,000 annotated tweets, sampled using the Perspective classifier across 20 score increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election; (3) code for recalibration in R and Stan; (4) annotation guidelines for both datasets. Paper abstract: We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use for understanding the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election.
A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration is integral to the application of abusive content classifiers. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/4075460
 
Title Recalibrating classifiers for interpretable abusive content detection 
Description Dataset and code for the paper, 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020), to appear at the NLP + CSS workshop at EMNLP 2020. We provide: (1) 1,000 annotated tweets, sampled using the Davidson classifier across 20 score increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election; (2) 1,000 annotated tweets, sampled using the Perspective classifier across 20 score increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election; (3) code for recalibration in R and Stan; (4) annotation guidelines for both datasets. Paper abstract: We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use for understanding the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election.
A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration is integral to the application of abusive content classifiers. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/4075461