Automated Clinical Epidemiology Studies (ACES) platform for complex epidemiology study designs and diverse databases
Lead Research Organisation:
University of Birmingham
Department Name: Institute of Applied Health Research
Abstract
Routinely collected healthcare data are derived from electronic medical records, health insurance records and administrative records in healthcare organisations. These databases are increasingly being used for research. They have been used to generate hypotheses about the causes of illness, evaluate health service policies, conduct clinical audits, carry out disease surveillance and detect adverse effects of medications. Beyond these uses, routine databases can also show whether the effects of drugs observed in randomised controlled trials (RCTs) are also seen in real-world settings, especially in groups of people whose characteristics differ from those of RCT participants.
Despite these benefits, there are numerous challenges in using routinely collected healthcare databases for research. Some arise from the difficulty of extracting data in a way that supports complex study designs. Data extraction is expensive and tedious in terms of time, cost, effort and expertise. This is partly because the databases are very large, vary in structure and contain a wide range of data. Some of the difficulty also stems from the complexity of the study designs needed to probe these databases: because the data were not collected for research purposes, they contain numerous inherent biases. Furthermore, any extraction needs clinical, epidemiological and technical expertise to interrogate these databases. These issues can lead to human-induced errors and to data that are neither accurate nor reproducible.
Working with computer scientists, clinicians and methodologists, we have developed an Automated Clinical Epidemiology Studies (ACES) platform for extracting accurate and reproducible data for epidemiological studies from one database of medical records from general practices (The Health Improvement Network (THIN) database). With the platform, data extraction that previously took weeks to months when done manually can be completed within minutes to hours. The platform has already enabled numerous studies in the last 12 months.
Now that we have developed such a platform, this research programme aims to extend it to: 1) complex epidemiological study designs; and 2) databases with different structures and coding systems.
For complex study designs, we will develop and evaluate one platform for linked mother and baby databases and another for studies of the effects of drugs (pharmaco-epidemiological studies). Pharmaco-epidemiological studies help us understand the beneficial and harmful effects of medications. While developing the automated platform for pharmaco-epidemiological studies, we will also review, and where necessary develop, methodologies to estimate the effects of medications more accurately.
We have been in conversation with institutions in other countries about extending our ACES platform to their databases, which have different structures and coding systems, and evaluating whether this works. If we achieve this, we could study one research question across multiple databases in different countries simultaneously.
Finally, we will assess the risks of such an automated data extraction system. For example, it is possible to conduct numerous studies within a day and report only those showing positive results. We will identify such issues through discussions with relevant stakeholders and produce a set of recommendations on how best to avoid such situations.
Technical Summary
Work Package (WP) 1: Development and validation of the Automated Clinical Epidemiology Studies (ACES) software architecture for Automated Infant and Mothers Studies (AIMS) and Automated Pharmaco-Epidemiology Studies (APES)
For AIMS, we will develop algorithms to link mother and baby pairs using linked primary and secondary care data. For APES, we will implement the evidence-based methodologies identified or developed in WP2. Functional and technical validation will be performed.
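The sketch below shows one way a mother-baby linkage rule could work. It assumes each record carries a practice identifier and a household identifier, and that mothers' delivery dates are recorded. Several details are hypothetical illustrations and not the AIMS algorithm: the field names (`practice_id`, `household_id`, `delivery_dates`), the 60-day matching window, and the rule of leaving ambiguous matches unlinked.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mother:
    patient_id: str
    practice_id: str
    household_id: str                 # hypothetical household/family identifier
    delivery_dates: Tuple[date, ...]  # recorded delivery dates


@dataclass(frozen=True)
class Baby:
    patient_id: str
    practice_id: str
    household_id: str
    birth_date: date


def link_mother_baby(mothers: List[Mother], babies: List[Baby],
                     window_days: int = 60) -> Dict[str, str]:
    """Return {baby_id: mother_id} for babies with exactly one plausible mother."""
    # Index mothers by (practice, household) so each baby only checks its own household
    index: Dict[Tuple[str, str], List[Mother]] = {}
    for m in mothers:
        index.setdefault((m.practice_id, m.household_id), []).append(m)

    links: Dict[str, str] = {}
    for b in babies:
        candidates = [
            m for m in index.get((b.practice_id, b.household_id), [])
            if any(abs((b.birth_date - d).days) <= window_days
                   for d in m.delivery_dates)
        ]
        # Unmatched or ambiguous (more than one candidate) babies stay unlinked
        if len(candidates) == 1:
            links[b.patient_id] = candidates[0].patient_id
    return links


mothers = [
    Mother("M1", "P1", "H1", (date(2015, 3, 10),)),
    Mother("M2", "P1", "H2", (date(2016, 7, 1),)),
]
babies = [
    Baby("B1", "P1", "H1", date(2015, 3, 12)),  # born 2 days after M1's delivery
    Baby("B2", "P1", "H2", date(2018, 1, 1)),   # no delivery within the window
    Baby("B3", "P2", "H1", date(2015, 3, 10)),  # different practice
]
print(link_mother_baby(mothers, babies))  # {'B1': 'M1'}
```

Discarding ambiguous candidates trades completeness for precision. A real linkage would likely add further evidence, such as hospital delivery records, before accepting a pair.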
WP2: Precision methodologies for pharmaco-epidemiological studies
Three key biases in using routinely collected data (RCD) for pharmaco-epidemiological studies are: confounding by indication (prescription by indication bias); immortal time bias; and failure to account for unobserved confounders. We will conduct systematic reviews to identify methodologies that reduce these biases and recommend the circumstances in which each should be applied. Where there are gaps in the evidence, we will propose new methodologies to mitigate these biases.
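The sketch below illustrates immortal time bias by comparing two analyses of the same cohort. The naive "ever-exposed" analysis counts all follow-up of eventual drug users as exposed, including the event-free time before their first prescription. The time-split analysis counts that pre-prescription time as unexposed. The `Patient` fields and the toy cohort are hypothetical. A real analysis would typically model time-varying exposure in a survival model rather than compare crude rates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    follow_up_days: float          # cohort entry to event or censoring
    event: bool                    # outcome occurred at end of follow-up
    first_rx_day: Optional[float]  # day of first prescription; None if never prescribed


def naive_rates(cohort: List[Patient]) -> Tuple[float, float]:
    """Ever-users vs never-users: all follow-up of ever-users counts as exposed."""
    exp_t = exp_e = unexp_t = unexp_e = 0.0
    for p in cohort:
        if p.first_rx_day is not None:
            exp_t += p.follow_up_days
            exp_e += p.event
        else:
            unexp_t += p.follow_up_days
            unexp_e += p.event
    return exp_e / exp_t, unexp_e / unexp_t


def time_split_rates(cohort: List[Patient]) -> Tuple[float, float]:
    """Person-time before the first prescription counts as unexposed."""
    exp_t = exp_e = unexp_t = unexp_e = 0.0
    for p in cohort:
        if p.first_rx_day is not None and p.first_rx_day < p.follow_up_days:
            unexp_t += p.first_rx_day                  # event-free pre-exposure time
            exp_t += p.follow_up_days - p.first_rx_day
            exp_e += p.event                           # event falls after exposure start
        else:
            unexp_t += p.follow_up_days
            unexp_e += p.event
    return exp_e / exp_t, unexp_e / unexp_t


cohort = [
    Patient(100, True, None), Patient(100, True, None), Patient(100, False, None),
    Patient(365, False, 200), Patient(365, True, 300), Patient(365, False, 100),
]
for name, fn in [("naive", naive_rates), ("time-split", time_split_rates)]:
    exposed, unexposed = fn(cohort)
    print(name, round(exposed / unexposed, 2))  # naive 0.14, time-split 0.91
```

In this toy cohort, the naive rate ratio of about 0.14 suggests a strong protective effect. Once the immortal pre-prescription time is reassigned, the ratio is about 0.91.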
WP3: Extend and evaluate our current ACES architecture to databases with differing nomenclature and structure
We aim to extend our ACES architecture so that studies can be conducted in diverse countries whose databases differ in nomenclature and structure. We will do this by developing algorithms to normalise structure and then applying an Extract, Transform and Load (ETL) architecture to conduct studies seamlessly across multiple databases.
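The sketch below illustrates the Transform step of ETL. Two sources with different layouts and coding systems are each converted into one common event format, after which a single study definition runs unchanged across both. Several details are illustrative assumptions, not a validated code list or the ACES schema: the column names, the date formats, and the code-to-concept mappings (Read code `C10F.` and ICPC-2 code `T90` mapped to a shared label).

```python
from datetime import date, datetime
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    source: str
    patient_id: str
    event_date: date
    concept: str


# Illustrative local-code -> shared-concept mappings (not validated code lists)
READ_TO_CONCEPT = {"C10F.": "type2_diabetes"}
ICPC_TO_CONCEPT = {"T90": "type2_diabetes"}


def transform_uk(rows: Iterable[dict]) -> List[Event]:
    # Assumed layout: patid, eventdate (YYYY-MM-DD), medcode (Read code)
    return [Event("uk", r["patid"], date.fromisoformat(r["eventdate"]),
                  READ_TO_CONCEPT[r["medcode"]])
            for r in rows if r["medcode"] in READ_TO_CONCEPT]


def transform_nl(rows: Iterable[dict]) -> List[Event]:
    # Assumed layout: pat, datum (DD-MM-YYYY), icpc (ICPC-2 code)
    return [Event("nl", r["pat"], datetime.strptime(r["datum"], "%d-%m-%Y").date(),
                  ICPC_TO_CONCEPT[r["icpc"]])
            for r in rows if r["icpc"] in ICPC_TO_CONCEPT]


def first_diagnosis(events: Iterable[Event], concept: str) -> Dict[Tuple[str, str], date]:
    """One study definition applied unchanged to every loaded source."""
    first: Dict[Tuple[str, str], date] = {}
    for e in events:
        if e.concept == concept:
            key = (e.source, e.patient_id)
            if key not in first or e.event_date < first[key]:
                first[key] = e.event_date
    return first


uk_rows = [
    {"patid": "1", "eventdate": "2019-05-02", "medcode": "C10F."},
    {"patid": "1", "eventdate": "2017-01-10", "medcode": "C10F."},
    {"patid": "2", "eventdate": "2018-03-03", "medcode": "H33.."},  # unmapped, dropped
]
nl_rows = [{"pat": "A", "datum": "15-08-2020", "icpc": "T90"}]

events = transform_uk(uk_rows) + transform_nl(nl_rows)
print(first_diagnosis(events, "type2_diabetes"))
```

Keeping the study logic (`first_diagnosis`) independent of any source layout means that adding a new database only requires a new transform and mapping table.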
WP4: Identify and manage ethical issues of ACES
ACES tools raise ethical challenges that need to be identified, with systems put in place to mitigate them. For example, multiple associations can be tested rapidly and a researcher could pursue only those that are positive. We will identify such risks and potential solutions through a literature review, semi-structured interviews with data providers and focus groups with researchers. We will then develop solutions with key stakeholders.
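The risk described for WP4 can be quantified. Among k independent tests of true null hypotheses at significance level α, the chance of at least one false positive is 1 - (1 - α)^k, which is about 0.64 for 20 tests at α = 0.05. The sketch below computes this probability and applies the Benjamini-Hochberg false discovery rate procedure. Benjamini-Hochberg is one standard correction shown here for illustration and is not necessarily among the solutions the project will recommend. The p-values are made up.

```python
from typing import List


def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) among k independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** k


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> List[bool]:
    """Flag which hypotheses to reject while controlling the false discovery rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Find the largest rank r (1-based) with p_(r) <= (r / m) * q
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below that cutoff
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= cutoff
    return reject


print(round(prob_any_false_positive(20), 2))  # 0.64

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(p < 0.05 for p in pvals))    # 5 unadjusted "significant" results
print(sum(benjamini_hochberg(pvals)))  # 2 survive FDR control
```

A statistical correction only helps if every test run is counted. Recording all analyses performed, as well as those reported, addresses the selective-reporting problem more directly.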
Publications

Adderley NJ
(2022)
Development and external validation of prognostic models for COVID-19 to support risk stratification in secondary care.
in BMJ Open

Adderley NJ
(2018)
Risk of stroke and transient ischaemic attack in patients with a diagnosis of resolved atrial fibrillation: retrospective cohort studies.
in BMJ (Clinical Research Ed.)

Braithwaite T
(2021)
Epidemiology of Scleritis in the United Kingdom From 1997 to 2018: Population-Based Analysis of 11 Million Patients and Association Between Scleritis and Infectious and Immune-Mediated Inflammatory Disease.
in Arthritis & Rheumatology (Hoboken, N.J.)

Chandan JS
(2019)
Intimate partner violence and temporomandibular joint disorder.
in Journal of Dentistry

Chandan JS
(2019)
The burden of mental ill health associated with childhood maltreatment in the UK, using The Health Improvement Network database: a population-based retrospective cohort study.
in The Lancet Psychiatry

Chandan JS
(2021)
Nonsteroidal Antiinflammatory Drugs and Susceptibility to COVID-19.
in Arthritis & Rheumatology (Hoboken, N.J.)

Crowe FL
(2019)
Non-linear associations of 25-hydroxyvitamin D concentrations with risk of cardiovascular disease and all-cause mortality: Results from The Health Improvement Network (THIN) database.
in The Journal of Steroid Biochemistry and Molecular Biology

Fox J
(2021)
Rapid translation of clinical guidelines into executable knowledge: A case study of COVID-19 and online demonstration.
in Learning Health Systems
Description | Public Health England utilises DExtER |
Geographic Reach | National |
Policy Influence Type | Implementation circular/rapid advice/letter to e.g. Ministry of Health |
Description | INSIGHT Hub |
Amount | £3,400,000 (GBP) |
Organisation | Health Data Research UK |
Sector | Private |
Country | United Kingdom |
Start | 11/2019 |
End | 08/2022 |
Description | Improving testing for cardiometabolic diseases in women with previous gestational diabetes mellitus: an exemplar study on implementation and evaluation of a novel data driven randomised clinical trial platform in primary care |
Amount | £385,000 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 05/2022 |
End | 01/2025 |
Description | Marshalling health system experience of 'patients like me' to guide treatment decisions: a UK demonstrator of the informatics consult |
Amount | £199,000 (GBP) |
Organisation | Health Data Research UK |
Sector | Private |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2021 |
Description | Multimorbidity and Pregnancy: Determinants, Clusters, Consequences and Trajectories (MuM-PreDiCT) |
Amount | £2,948,688 (GBP) |
Funding ID | MR/W014432/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2021 |
End | 08/2024 |
Description | OPTIMising therapies, discovering therapeutic targets and AI assisted clinical management for patients Living with complex multimorbidity (OPTIMAL study) |
Amount | £2,450,000 (GBP) |
Funding ID | NIHR202632_O |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 07/2021 |
End | 08/2024 |
Description | Therapies for Long COVID in non-hospitalised individuals: From symptoms, patient-reported outcomes and immunology to targeted therapies (The TLC Study) |
Amount | £2,200,000 (GBP) |
Funding ID | MC_PC_20050 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 02/2023 |
Title | DExtER updated version |
Description | DExtER is software that was created before the fellowship. It forms the basis on which the whole new Automated Clinical Epidemiology Studies (ACES) platform will work. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | No |
Impact | The tool enables data extraction according to study designs within a couple of hours, compared with weeks to months previously |
URL | https://www.birmingham.ac.uk/research/activity/applied-health/research/health-informatics/Automated-... |
Description | ACES Global: Maastricht University |
Organisation | Maastricht University (UM) |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | We have established a collaboration with the Maastricht University primary care team to implement and evaluate the tool. We successfully checked the feasibility of implementing the tool during our visit to Maastricht. We are now finalising the contract. Once the contract is complete, we will be able to perform research jointly |
Collaborator Contribution | They have made their RNFM database available for the evaluation. |
Impact | The key output has been the testing of the feasibility. |
Start Year | 2018 |
Description | ACES for CALIBER |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are implementing the DExtER software on the CALIBER platform. This will enable seamless data extraction and produce analysable datasets within hours. |
Collaborator Contribution | They have provided access to the CALIBER platform. |
Impact | We are currently working jointly towards a grant application on the Informatics Consult |
Start Year | 2019 |
Description | Cegedim |
Organisation | Cegedim |
Country | France |
Sector | Private |
PI Contribution | Cegedim is a company that provides THIN data and is also the provider of the VISION EHR. We have provided them with a licence to use DExtER, our innovative tool for automated clinical epidemiology studies. We have also provided expertise on processing the THIN database |
Collaborator Contribution | Cegedim provided the THIN data for COVID-19-related research. |
Impact | Four publications on COVID-19 research |
Start Year | 2020 |
Description | Helen Dolk, Professor of Epidemiology and Health Services Research |
Organisation | Ulster University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Providing a wider team to assess and identify the gaps, and providing clinical support |
Collaborator Contribution | Contributing to WP4 on polypharmacy: identifying suitable European data sources, helping to build research protocols for the future, and contributing to the design and conduct of an exemplar study using the EUROmediCAT database. |
Impact | NA |
Start Year | 2020 |
Description | Ulster University |
Organisation | Ulster University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We have analysed data to identify combinations of medications that need to be studied in multimorbid pregnancies. |
Collaborator Contribution | Key study designs for the EUROmediCAT database |
Impact | We have added Ulster University as a collaborator |
Start Year | 2020 |
Description | University College London |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Project 1: LHS4NHS. We have developed a prototype of a data-driven learning health system for hospital care in the National Health Service (LHS4NHS). The project builds on the success of the data extraction for epidemiological research (DExtER) tool developed by our team at the University of Birmingham. By adding functionality such as automated study analytics and computable guidelines, we demonstrated the flow from data to knowledge and from knowledge to practice in secondary care (Figure 1). Project 2: OPEHRRA. We are currently developing OPEHRRA (OPen Electronic Health Record Research and Analytics platform), a novel platform designed for researchers and clinicians, by researchers and clinicians, to support them in managing their local, regional and national patient populations. The platform will enable researchers and clinicians to engage in all aspects of the 'open science' cycle, including the open release of study design, data extraction and publication. Users will be able to review the work of others in the network and request reviews of their own work. The ultimate goal is to make science transparent, reliable and accurate. The key elements of OPEHRRA include: 1. open-access protocol submission; 2. open protocol peer review comments and ethical approval; 3. open clinical code list generation and storage; 4. open data extraction; 5. open analytics; 6. open manuscript deposition and peer review |
Collaborator Contribution | They developed a framework for informatics consult |
Impact | The main output of LHS4NHS is a demonstrator of a data-driven learning health system with an inbuilt computable clinical guideline and an informatics consult tool for secondary care data. The example we demonstrated at the better care event was based on a case study of managing diabetes in hospitalised patients with COVID-19. We are currently writing this up as a manuscript. The main output of OPEHRRA is an online demonstration of a portal for the research community to conduct 'open science' using electronic health record data. |
Start Year | 2020 |