A roadmap and rich metadata catalogue for the analysis of federated sensitive data

Lead Research Organisation: University of Edinburgh
Department Name: Edinburgh Parallel Computing Centre

Abstract

Much of the data held by public organisations such as Government departments contains sensitive information about UK citizens and businesses. Since the Digital Economy Act 2017, progress has been made in enabling accredited researchers to access these data to carry out studies that are in the public benefit. Access is provided through accredited organisations called Digital Economy Act Accredited Processing Environments; twelve such organisations currently exist in the UK. Accredited researchers can submit proposals to an Accredited Processing Environment to gain access to data held by that organisation. Four of these organisations provide Trusted Research Environments, in which researchers can perform their studies using analytical tools provided within the environment, without the data ever leaving it. This approach is the most secure way to handle sensitive data.

The challenge for researchers is to answer research questions that require datasets to be combined before analysis, where those datasets are held by more than one Accredited Processing Environment. We identify two barriers. First, researchers cannot see these datasets and, from free-text metadata descriptions alone, cannot assess whether combining them is feasible or would lead to a sensible analysis. Second, the policies that govern access are specific to each agreement between a data owner and a Trusted Research Environment. To overcome these barriers we propose the following.

We will agree and deliver a roadmap that allows researchers to discover, apply for and analyse data held at any of the UK's four national Trusted Research Environments (TREs), namely the Office for National Statistics (ONS) Integrated Data Service (IDS), the Scottish National Safe Haven, the SAIL Databank and the Northern Ireland Statistics and Research Agency (NISRA), through a single front door. This work will agree the standards, policies and procedures needed to enable researchers to analyse data combined from more than one government organisation. We will publish templates of all agreements, which other data investments can adopt when taking federation forward.

We will develop software that automatically creates a rich metadata catalogue for specific datasets in Trusted Research Environments, and we will agree enhanced metadata standards that let researchers judge whether they could perform an analysis if they had access to the data. This enables researchers to decide whether the investment in combining datasets is worthwhile, because they can determine beforehand whether the necessary data are present.

We will demonstrate the use of rich metadata through a specific use case: linking Scottish Government, HMRC and Office for National Statistics business data. We will develop a secure query link between the Scottish National Safe Haven and ONS to enable analysis of the combined datasets. Our intention is to enable new policy-relevant insights while laying the path for ongoing federation across the Trusted Research Environments of the UK nations.
