Strategic Priorities Fund - AI for Science, Engineering, Health and Government

Lead Research Organisation: The Alan Turing Institute
Department Name: Research

Abstract

Please see attached business case.

Planned Impact

Please see attached business case.

Publications

 
Title Reproducible secure research environments: Talk from Safe Data Access Professionals Quarterly Meeting on 08 June 2021 
Description Overview of the challenges of supporting reproducible research on sensitive data and how the Turing addresses these in its Safe Haven secure research environment. 
Type Of Art  
Year Produced 2021 
URL https://figshare.com/articles/presentation/Reproducible_secure_research_environments_Talk_from_Safe_...
 
Description There are five principal research domains in the AI for Science and Government (ASG) programme: digital twins; health; the criminal justice system; AI for science; and tools, practices and systems. Its main achievements are (1) the development of machine learning and other data-analytic methods that bring greater efficiency to government, scientific practice and industry; (2) the creation of a diverse set of research collaborations, including with non-academic partners, examples of which are provided below; and (3) the catalysis of over £20m in additional funding through its research activities.
Digital Twins: Urban Analytics: The Rapid Assistance in Modelling the Pandemic (RAMP) model, created to support the pandemic modelling community working on coronavirus, can be disaggregated into modules that offer a solid basis for other projects, ranging from the study of spatial inequality and energy distributions to infrastructure investment and sustainable cities. It is currently being configured to examine environmental change through the Dynamic Model for Environment (DyME) Climate, Heating and Health (CHH) project.
Digital Twins: Complex Systems Engineering: The equadratures community continues to grow rapidly, with increasing adoption and engagement. The team delivered their first one-day workshop to Safran, an international high-technology group operating in the aviation sector, to demonstrate the rapid prototyping, optimisation and design methods within the open-source package. The equadratures open-source code had attracted over 42,000 downloads as of January 2022.
Also within this theme, the research team focusing on infrastructure and construction completed the implementation of the statistical finite element method (statFEM) for stationary problems and completed the first release of their statFEM library. They are also continuing to collaborate with the MX3D bridge team to begin the statFEM analysis.
Health - Through a collaboration with the Francis Crick Institute, the team have introduced Biomedical Science Data Awards aimed at early career researchers. Materials for two masterclasses are under development: (1) an introduction to data science and AI for biomedical researchers, and (2) managing open and reproducible computational projects in the biosciences. The training resources will be developed openly under The Carpentries incubator programme and shared via the Turing online training platform.
Under the Turing-MRC collaboration on statistical machine learning for randomised clinical trials, pseudonymisation of clinical datasets and their transfer from the UCL Clinical Trials Unit secure environment to the Turing Safe Haven have taken place. The team have also obtained model output datasets describing the performance of each model, transferred from the Turing Safe Haven back to the UCL Clinical Trials Unit secure environment for future use in the project.
Criminal Justice System - The Turing Commons is an online platform to promote and develop data and digital skills and literacy among various groups (e.g. researchers, developers, policymakers and members of the public). It comprises a set of interactive, educational resources that introduce and explain key topics in data science, AI and responsible research and innovation (RRI). During the first course, the project lead used data from Public Health England (PHE) to design an educational activity: a model that generates synthetic data about fictional COVID-19 patients, which helped explain how bias can be assessed during exploratory data analysis. This activity received very positive feedback and the team hope to expand this work further.
AI for Science - The AI benchmarking projects have produced important methods for the rapid analysis of, for example, 3D images from cryo-electron microscopes. The ability to process data at speed and scale is opening up new research questions and the possibility of new science.
Tools, Practices and Systems - The Turing Way is an open-source, community-led project that aims to make data science comprehensible and useful for everyone. The project brings together a diverse community of researchers, educators, learners, administrators and stakeholders from within The Alan Turing Institute and beyond. The Turing Way provides a web-based book with five guides - on reproducible research, project design, communication, collaboration and ethical research - as well as a community handbook. The project is available on GitHub and online at https://the-turing-way.netlify.app
Exploitation Route Through government
The uses of the urban models that form part of the digital twins work are being discussed with the Ministry of Housing, Communities and Local Government with a view to making them available nationally. In addition, a novel web tool has been developed to assist local authorities in the UK with designing low-traffic neighbourhoods and predicting traffic impacts.
The Online Harms Observatory project is working in partnership with the Department for Digital, Culture, Media & Sport (DCMS) to scale up the work on online hate and extend it to other sources of harm online, particularly harassment, extremism and misinformation.
A project under the health theme is engaging with Public Health Scotland to help doctors better identify patients at risk of health breakdown and to allow them to intervene early by increasing appointments, adjusting medication or making targeted referrals.
A project on the deployment of police resources is being carried out with Durham Constabulary.

Through research institutes
'AI for Science' projects are all collaborative with other research councils and their institutes. The projects are all designed by discipline-specific scientists to develop tools and advanced methods that will enable their scientific field to achieve greater outputs, and each project has a challenge through which these AI techniques are developed. Institutes we are partnering with include the British Antarctic Survey, the UK Centre for Ecology & Hydrology, the Rosalind Franklin Institute, the John Innes Centre, the Earlham Institute, Rothamsted Research, the MRC Laboratory of Molecular Biology, the Met Office, the National Oceanography Centre, the Science and Technology Facilities Council and the Rutherford Appleton Laboratory.
The team have developed a formal collaboration with the Rosalind Franklin Institute (RFI) to start a pilot project in the cryo-ET domain. This will involve the continued development of macromolecular particle detectors for complex, heterogeneous data, along with datasets and tools to enable the development of detection models.
Under health, the team are working jointly with the Francis Crick Institute to deliver biomedical data science training to early career researchers. There are also ongoing discussions with the Sanger Institute.
Across the programme we are aiming to bring together the work with the RFI, the Sanger Institute, the Francis Crick Institute and the MRC Laboratory of Molecular Biology.

Through industry
The projects under 'Digital Twins: Complex Systems Engineering' are being carried out jointly with industrial partners such as Rolls-Royce, Scania and Safran.
Sectors Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Communities and Social Services/Policy,Construction,Digital/Communication/Information Technologies (including Software),Energy,Environment,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Transport

URL https://www.turing.ac.uk/research/asg
 
Description The findings are being deployed in central and local government, industry and science research institutes. The data from individual projects provide a full inventory; to summarise, examples of impact are included below to illustrate the range of audiences benefiting from ASG research. They indicate how the research output is shaping the practices of many stakeholders, who benefit from understanding how AI can facilitate their work in very concrete ways, and they point to the ongoing relationships being developed through this public engagement.
Government: The Online Harms Observatory project has launched in partnership with the Department for Digital, Culture, Media & Sport (DCMS). The observatory scales up the work on online hate and extends it to other sources of harm online, particularly harassment, extremism and misinformation. The project's research fellow has been appointed Special Advisor to the Draft Online Safety Bill Joint Committee, which reported on the draft bill at the end of 2021. In addition, the team were commissioned by Ofcom to help better understand online hate on video-sharing platforms. The team delivered a report which has directly informed Ofcom's new regulatory guidance for the EU-wide updated Audiovisual Media Services Directive. During this period, the health theme continued to make scientific advances, seen particularly through the Scottish Patients at Risk of Readmission (SPARRA) project. SPARRA helps doctors to better identify patients at risk of health breakdown and allows them to intervene early by increasing appointments, adjusting medication or making targeted referrals. This will ultimately improve patient care while also easing pressure on the Scottish healthcare system. The fourth version of the model is being prepared for deployment within Public Health Scotland's (PHS) Azure environment.
The Public Sector Guidance on Safe and Ethical AI project has continued to work with the Information Commissioner's Office (ICO), widening the impact and uptake of the ICO/Turing explainability guidance through public engagement and community-led, co-designed content improvement. It has now become the most accessed and cited AI ethics guidance globally. The ethics theme lead is a member of the Council of Europe's Ad Hoc Committee on Artificial Intelligence (CAHAI), which has drawn up a feasibility study to explore options for an international legal response to AI based on the Council's standards. The Turing team produced a primer to accompany the study and assist the consultation. Researchers based at the University of Leeds have been working with Bradford local council on a plan for addressing health inequalities across the district following the pandemic. The plan ('From Inequalities to Opportunities') is led by the Department for Education, which is funding the team to support other areas in adopting these methods. The project has used datasets to understand the factors affecting outcomes for children and young people, and ultimately to deliver data science tools that can improve information sharing and service delivery for families. The project has been used as a catalyst to create an Integrated Data Analytics Unit (IDAU) that is bringing academics together with analysts from various services to explore how the power of data science can be used to improve safeguarding, early detection of vulnerability and more. The QUANT project has been expanding to include road and bus networks in addition to rail and employment factors, while modelling high-profile scenarios: CaMKOx, Heathrow, HS2 and East West Rail. The team constructed a new HS2 scenario using the Integrated Rail Plan (IRP) that the Government released in November. Beyond this, flooding impacts have been developed in spatial scenarios in conversation with the Environment Agency.
As a follow-up to the previous report, further impact has been seen with the project modelling police supply and demand. Following conversations with the Home Office, the team have been in direct consultation with the Metropolitan Police and Durham Constabulary to pilot the model, which has led to the creation of a dashboard that enables the forces' criminal investigation departments to visualise crime demand and inform resourcing decisions. The team envisage this model being used as a planning tool, with the police able to test different resourcing strategies by tweaking the model parameters. For instance, police forces could look at the impact of different shift patterns on their resources, or the impact of changing the number or type of officers sent to a particular crime type.
Science research institutes: A new AI tool, IceNet, enables scientists to forecast Arctic sea ice conditions more accurately months into the future, addressing the challenge of producing accurate Arctic sea ice forecasts for the season ahead - something that has eluded scientists for decades. It is almost 95% accurate in predicting whether sea ice will be present two months ahead - better than the leading physics-based model. Scivision, a toolkit for scientific image analysis, is a generalisable Python framework for applying computer vision methods to a wide range of scientific imagery. It is already enabling new breakthroughs in different areas of research and demonstrating impact. In particular, the Centre for Environment, Fisheries and Aquaculture Science (Cefas) is looking to apply it to improve plankton classification accuracy, which in a pilot improved rates to over 90%.
Equadratures has already seen impact within industry, with Rolls-Royce in particular as reported previously, but it has also been used by the United States Geological Survey to provide sensitivity metrics for underwater seagrass, to better understand the impact of coastal hydrodynamics on vegetation. With a very small simulation budget, equadratures was able to provide what the team needed.
Industry: A new framework for using transfer learning to improve fleet-wide predictions has been developed and evaluated in a pilot, enabling remaining-useful-life predictions for engines monitored by Scania. The project team is now in discussion with one of Europe's largest producers and retailers of electricity and heat, who have provided wind farm monitoring datasets, giving the team the opportunity to develop a new fleet-wide monitoring model across spatially distributed wind turbines.
First Year Of Impact 2020
Sector Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Communities and Social Services/Policy,Construction,Digital/Communication/Information Technologies (including Software),Energy,Environment,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Transport
Impact Types Societal,Policy & public services

 
Title Recalibrating classifiers for interpretable abusive content detection 
Description Dataset and code for the paper 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020) -- to appear at the NLP+CSS workshop at EMNLP 2020. We provide: 1,000 annotated tweets, sampled using the Davidson classifier across 20 increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election; 1,000 annotated tweets, sampled using the Perspective classifier across the same 20 increments of 0.05 (50 from each), from the same dataset; code for recalibration in R and Stan; and annotation guidelines for both datasets. Paper abstract: We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use for understanding the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election.
A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration are integral to the application of abusive content classifiers. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/4075460
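The recalibration idea behind this record can be sketched briefly. The snippet below is an illustrative, simplified Platt-style logistic recalibration in pure Python - it is not the project's actual R/Stan Bayesian implementation, and the raw scores and human labels are hypothetical.

```python
# Illustrative recalibration of raw classifier scores against human labels.
# A Platt-style logistic fit: p(abusive | score) = sigmoid(a * score + b),
# fitted by gradient descent. The project itself uses a Bayesian model in
# Stan; this sketch only conveys the general recalibration idea.
import math

def fit_recalibration(scores, labels, lr=0.1, epochs=2000):
    """Fit sigmoid(a * score + b) to binary human labels by gradient descent."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s / n
            grad_b += (p - y) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def recalibrate(score, a, b):
    """Map a raw classifier score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Hypothetical raw scores in [0, 1] and human annotations (1 = abusive).
raw = [0.05, 0.2, 0.4, 0.55, 0.7, 0.9]
human = [0, 0, 0, 1, 1, 1]
a, b = fit_recalibration(raw, human)

# After fitting, calibrated probabilities align with the annotations:
assert recalibrate(0.9, a, b) > 0.5 > recalibrate(0.05, a, b)
```

A full Bayesian treatment, as in the paper, would additionally place priors on the calibration parameters and yield uncertainty estimates over the recalibrated probabilities rather than point values.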
 
Title Recalibrating classifiers for interpretable abusive content detection 
Description Dataset and code for the paper 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020) -- to appear at the NLP+CSS workshop at EMNLP 2020. We provide: 1,000 annotated tweets, sampled using the Davidson classifier across 20 increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election; 1,000 annotated tweets, sampled using the Perspective classifier across the same 20 increments of 0.05 (50 from each), from the same dataset; code for recalibration in R and Stan; and annotation guidelines for both datasets. Paper abstract: We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use for understanding the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election.
A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration are integral to the application of abusive content classifiers. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/4075461