Infrastructure and Services - Useable Data
Lead Research Organisation: Health Data Research UK
Department Name: UNLISTED
Abstract
The UK holds a wide variety of data, but similar data is often held in different forms, which makes bringing it together difficult. This programme will develop tools and approaches to bring that data together in standardised ways so that it is easier to use in health-related research. This will allow researchers to work at a larger scale and across a broader range of data, unlocking new opportunities to improve health and wellbeing in the UK, for example by improving how new drugs and other interventions are tested in clinical trials.
Technical Summary
This work is funded by the UKRI Medical Research Council, UKRI Engineering and Physical Sciences Research Council, UKRI Economic and Social Research Council, Department of Health and Social Care, National Institute for Health Research (England), Chief Scientist Office (Scottish Government), Health and Care Research Wales, Public Health Agency HSC (Northern Ireland), British Heart Foundation and Cancer Research UK.
The Useable Data programme will develop the tools and data engineering capability needed for seamless access to FAIR (Findable, Accessible, Interoperable, Reusable) data. This includes Phenomics and Prognostic Atlas capabilities (dataset search, classification and efficient metadata-browsing tools, described via open dataset catalogues, common data models and data dictionaries), data transformation, and resources for clinical trials. It will focus on aligning approaches to data and metadata (including phenotypes), with the aim of developing consistent standards and formats for data and metadata and driving their adoption. The work follows three principles: ‘minimal restriction’ rather than mandating a particular standard or platform; increasing the interoperability of studies and datasets; and transparency about the standards or formats used. It will achieve this by building reusable, open and extensible software infrastructure through three workstreams (‘Data Standards’, ‘Phenomics and Prognostic Atlas’ and ‘Transforming Data for Trials’), which will support research across the data-to-analysis pipeline. With the overarching aim of increased interoperability, the programme will allow research to take place at a wider scale and across a greater breadth of datasets and modalities.
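To make the idea of mapping differently shaped data onto a common data model via data dictionaries more concrete, here is a minimal sketch in Python. It is not the programme's software. The two sources, their field names, codes and date formats, and the `SOURCE_DICTIONARIES` mapping and `harmonise` function are all hypothetical. They are chosen only to show how a per-source data dictionary can translate records into one shared schema.

```python
from datetime import date, datetime

# Hypothetical common data model: every harmonised record has exactly these fields.
COMMON_FIELDS = ("patient_id", "sex", "birth_date")

# Hypothetical per-source data dictionaries:
# common field -> (field name in the source, function converting the raw value).
SOURCE_DICTIONARIES = {
    "source_a": {
        "patient_id": ("pid", str),
        "sex": ("gender", {"M": "male", "F": "female"}.get),
        "birth_date": ("dob", lambda s: datetime.strptime(s, "%d/%m/%Y").date()),
    },
    "source_b": {
        "patient_id": ("person_ref", str),
        "sex": ("sex_code", {1: "male", 2: "female"}.get),
        "birth_date": ("birth_dt", date.fromisoformat),
    },
}


def harmonise(record: dict, source: str) -> dict:
    """Map one source record onto the common data model using its data dictionary."""
    dictionary = SOURCE_DICTIONARIES[source]
    out = {}
    for field in COMMON_FIELDS:
        source_field, convert = dictionary[field]
        raw = record.get(source_field)
        # Missing source values stay None rather than being guessed;
        # unrecognised codes also map to None via dict.get.
        out[field] = convert(raw) if raw is not None else None
    return out


if __name__ == "__main__":
    a = harmonise({"pid": 101, "gender": "F", "dob": "03/07/1980"}, "source_a")
    b = harmonise({"person_ref": "B-7", "sex_code": 1, "birth_dt": "1975-12-01"}, "source_b")
    print(a)
    print(b)
```

In this sketch the mappings live in data rather than in code, so each source can keep its own format and adding a new source means adding a dictionary entry rather than changing `harmonise`.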
Publications
Abbasizanjani H
(2023)
Harmonising electronic health records for reproducible research: challenges, solutions and recommendations from a UK-wide COVID-19 research collaboration.
in BMC medical informatics and decision making
Au Yeung J
(2023)
AI chatbots not yet ready for clinical use.
in Frontiers in digital health
Au Yeung J
(2023)
Artificial intelligence (AI) for neurologists: do digital neurones dream of electric sheep?
in Practical neurology
Banerjee A
(2023)
Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study.
in The Lancet. Digital health
Barclay M
(2023)
Phenotypes and rates of cancer-relevant symptoms and tests in the year before cancer diagnosis in UK Biobank and CPRD Gold.
in PLOS digital health
Bean DM
(2023)
Hospital-wide natural language processing summarising the health data of 1 million patients.
in PLOS digital health
Carrasco-Zanini J
(2023)
Proteomic prediction of common and rare diseases
Domínguez J
(2023)
ROAD2H: Development and evaluation of an open-source explainable artificial intelligence approach for managing co-morbidity and clinical guidelines
in Learning Health Systems
Elkheder M
(2023)
Translating and evaluating historic phenotyping algorithms using SNOMED CT.
in Journal of the American Medical Informatics Association : JAMIA
Goonasekera M
(2024)
Accuracy of heart failure ascertainment using routinely collected healthcare data: a systematic review and meta-analysis
in Systematic Reviews
Jordan KP
(2023)
Determining cardiovascular risk in patients with unattributed chest pain in UK primary care: an electronic health record study.
in European journal of preventive cardiology
Kuan V
(2023)
Identifying and visualising multimorbidity and comorbidity patterns in patients in the English National Health Service: a population-based study.
in The Lancet. Digital health
LENS Collaborative Group
(2024)
Design, recruitment and baseline characteristics of the LENS trial.
in Diabetic medicine : a journal of the British Diabetic Association
MacRae C
(2023)
Age, sex, and socioeconomic differences in multimorbidity measured in four ways: UK primary care cross-sectional analysis.
in The British journal of general practice : the journal of the Royal College of General Practitioners
Mansouri-Benssassi E
(2023)
Disclosure control of machine learning models from trusted research environments (TRE): New challenges and opportunities.
in Heliyon
Mintz H
(2023)
Making administrative healthcare systems clinical data the future of clinical trials: lessons from BladderPath
in BMJ Oncology
Mizani MA
(2023)
Using national electronic health records for pandemic preparedness: validation of a parsimonious model for predicting excess deaths among those with COVID-19-a data-driven retrospective cohort study.
in Journal of the Royal Society of Medicine
Pineda-Moncusí M
(2024)
Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity.
in Scientific data
Prugger C
(2023)
Incidence of 12 common cardiovascular diseases and subsequent mortality risk in the general population.
in European journal of preventive cardiology
Searle T
(2023)
Discharge summary hospital course summarisation of in patient Electronic Health Record text with clinical concept guided deep pre-trained Transformer models.
in Journal of biomedical informatics
Toader AM
(2024)
Using healthcare systems data for outcomes in clinical trials: issues to consider at the design stage.
in Trials
Wang W
(2023)
Machine Learning for Brain Disorders
Wang Y
(2023)
Public Opinions About Palliative and End-of-Life Care During the COVID-19 Pandemic: Twitter-Based Content Analysis
in JMIR Formative Research
Whitfield E
(2023)
A taxonomy of early diagnosis research to guide study design and funding prioritisation.
in British journal of cancer
Williams ADN
(2023)
A DELPHI study priority setting the remaining challenges for the use of routinely collected data in trials: COMORANT-UK.
in Trials
Wittner R
(2024)
Toward a common standard for data and specimen provenance in life sciences.
in Learning health systems