Ensuring test evaluation research is applicable in practice: investigating the effects of routine data on the validity of test accuracy meta-analyses

Lead Research Organisation: University of Birmingham
Department Name: Health and Population Sciences


Diagnosis is a difficult process. A patient who presents to their doctor ill will often undergo a process which involves being asked questions, observed, examined and perhaps even having blood or imaging 'tests'. Each question asked or observation made is either a diagnostic test in its own right or part of one and is a necessary part of arriving at a diagnosis.

However, some tests are better than others and, importantly, probably no test is 100% accurate. Sometimes a test result may suggest that a patient is healthy when they actually have disease, or that they have disease when they are healthy. This can happen with any test, and diagnostic test accuracy research aims to evaluate how often it happens; in other words, to determine how accurate tests are.

Essentially, when a clinician decides upon a diagnosis they are, consciously or otherwise, invoking a probabilistic process in which multiple tests are combined, and the patient's diagnosis should be the one that is most probable given the combination of all the test results. However, for this process to be truly beneficial to the patient, the clinician needs to know the accuracy of each of these tests and how likely it is that the patient has disease before the diagnostic process has even started.
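The probabilistic step described above can be illustrated with Bayes' theorem. The following is a minimal sketch rather than part of the proposed methods: the function name and the numbers are hypothetical, and chaining two results as shown assumes the tests are conditionally independent given disease status, which may not hold in practice.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of disease after a single test result (Bayes' theorem)."""
    if positive:
        true_pos = pre_test * sensitivity
        false_pos = (1 - pre_test) * (1 - specificity)
        return true_pos / (true_pos + false_pos)
    false_neg = pre_test * (1 - sensitivity)
    true_neg = (1 - pre_test) * specificity
    return false_neg / (false_neg + true_neg)


# Hypothetical values: 10% pre-test probability, sensitivity 0.90, specificity 0.80.
after_first = post_test_probability(0.10, 0.90, 0.80, positive=True)

# A second positive test (assumed conditionally independent of the first)
# uses the first post-test probability as its pre-test probability.
after_second = post_test_probability(after_first, 0.85, 0.90, positive=True)

print(round(after_first, 3), round(after_second, 3))
```

With these illustrative numbers a single positive result raises the probability of disease from 10% to about one in three, which shows how strongly the answer depends on both the assumed accuracy and the starting probability.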

This is where the difficulty lies for those who practise evidence-based medicine. Although the accuracy of many tests has been estimated by research studies, for individual tests the accuracy may vary significantly between studies. This variation may depend on who is applying the test, how it is being applied, which patient it is being applied to and, most significantly of all, how the accuracy was measured in the study. When there are several studies there are methods which allow us to combine their results. These methods may also help determine the real reasons why the test's accuracy varies. However, in general, the studies do not report enough data of adequate quality for such analyses to be possible, let alone comprehensive.

Furthermore, from previous work, we have been able to demonstrate that in some cases the test accuracy reported by a study may be virtually impossible in some patient settings. This creates a problem for the doctor. How do they know which estimate of a test's accuracy to use if it varies greatly between studies and risks being nearly impossible for their own practice?

We have already begun to develop methods which make it possible to determine whether results from a test study are likely to accurately represent a doctor's practice in general. This would mean that a doctor could confidently apply the research to their own practice without reservation. However, sometimes the research does not reflect the different clinical settings seen in practice and a more specific solution is required. This may be done by collecting routine data from the doctor's own setting and using it to determine a feasible range of values for the test's accuracy. This method, in its current form, is used to exclude the studies 'least likely' to provide a plausible estimate of a test's accuracy for the doctor in their own practice.
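One way such a feasible range might be constructed is sketched below. This is an assumption for illustration and is not stated in this summary: it relies only on the identity that the proportion of patients testing positive equals prevalence x sensitivity + (1 - prevalence) x (1 - specificity), and it assumes routine data supply plausible intervals for the prevalence and the test positive rate in the practice. All names and numbers are hypothetical.

```python
def expected_positive_rate(prevalence, sensitivity, specificity):
    """Proportion of patients expected to test positive (law of total probability)."""
    return prevalence * sensitivity + (1 - prevalence) * (1 - specificity)


def is_plausible(sensitivity, specificity, prevalence_range, positive_rate_range):
    """True if some prevalence in the range gives a positive rate in the observed range."""
    p_lo, p_hi = prevalence_range
    r_lo, r_hi = positive_rate_range
    # The expected positive rate is linear in prevalence, so its extremes
    # over the prevalence interval occur at the interval's end points.
    r_a = expected_positive_rate(p_lo, sensitivity, specificity)
    r_b = expected_positive_rate(p_hi, sensitivity, specificity)
    implied_lo, implied_hi = min(r_a, r_b), max(r_a, r_b)
    # Plausible when the implied and observed intervals overlap.
    return implied_lo <= r_hi and implied_hi >= r_lo


# Hypothetical routine data: prevalence 5-10%, test positive rate 8-15%.
practice = dict(prevalence_range=(0.05, 0.10), positive_rate_range=(0.08, 0.15))

study_a = is_plausible(0.90, 0.95, **practice)  # compatible with the practice data
study_b = is_plausible(0.95, 0.70, **practice)  # implies far too many positive results
print(study_a, study_b)
```

In this sketch a study whose reported sensitivity and specificity would imply a positive rate outside what the practice actually observes is treated as implausible for that setting, which mirrors the idea of excluding the 'least likely' studies.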

At the moment both methods are in development, but they could potentially be implemented in the real world and used to improve diagnosis. There are clear patient benefits to improving diagnostic performance, including reducing the number of patients treated unnecessarily and increasing the number treated appropriately. One of the aims of this research is to pilot integrating this method into General Practice to help diagnose infection. This could also help reduce the potential for antibiotic resistance by reducing the number of antibiotics prescribed inappropriately.

However, before this is done the methods need to be fully investigated to determine their utility and limitations. It may be that other approaches afford greater patient benefit, and an evaluation of these alongside the methods already described will be the focus of the proposed research.

Technical Summary

Diagnostic test accuracy (DTA) research is not always informative in practice. Primary DTA studies and meta-analyses may produce results that do not transfer to practice. Recently we developed methods that evaluate the validity of DTA meta-analyses and incorporate routine data to generate a tailored meta-analysis (TMA) estimate that is specific to the practice. Both of these need further development and evaluation.

1. To synthesise a database for use in other work streams and review constrained models
2. To develop models which incorporate constrained data
3. To investigate methods which determine the validity of DTA reviews' results
4. To explore integrating the TMA model into UK General Practice

There will be 4 overlapping work streams (WS)

WS 1
We will construct a database of different cases of DTA meta-analyses with associated routine practice data collected on the test to provide input data to WS 2-4. We will also review methods that explore modelling constrained data

WS 2
This will develop other models that include all studies but weight the less probable studies accordingly. We will investigate both a constrained maximum likelihood and a Bayesian approach. In addition, we will use covariate modelling in meta-regression analyses to explore causes of heterogeneity

WS 3
The validity of meta-analysis results may be determined by combining a cross validation procedure with an appropriate method for comparing primary and secondary study estimates. We have developed one method which predicts where a new study is likely to lie, but other approaches are possible and these will be investigated
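The general shape of a cross validation procedure of this kind can be sketched as follows. This is a deliberately simplified stand-in and not the method referred to above: it pools sensitivity only, on the logit scale, with a fixed-effect inverse-variance model, whereas DTA meta-analyses more commonly use a bivariate random effects model. The function names, the 0.5 continuity correction and the example counts are all assumptions made for illustration.

```python
import math


def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance (0.5 continuity correction)."""
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1 / a + 1 / b


def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean of (estimate, variance) pairs and its variance."""
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


def leave_one_out_z_scores(studies):
    """For each study, compare it with the pooled estimate from all the other studies."""
    estimates = [logit_with_variance(tp, n) for tp, n in studies]
    z_scores = []
    for i, (est, var) in enumerate(estimates):
        rest = estimates[:i] + estimates[i + 1:]
        pooled, pooled_var = pool_fixed_effect(rest)
        # Standardised difference between the left-out study and the pooled estimate.
        z_scores.append((est - pooled) / math.sqrt(var + pooled_var))
    return z_scores


# Hypothetical (true positives, number with disease) for four studies.
studies = [(45, 50), (88, 100), (27, 30), (30, 60)]
z_scores = leave_one_out_z_scores(studies)
print([round(z, 2) for z in z_scores])
```

A large absolute z-score indicates that the left-out study lies far from what the remaining studies predict. Note that a single discordant study also distorts the pooled estimate used to judge the others, which is one reason a random effects formulation would be preferred in practice.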

WS 4
The TMA model for test evaluations will be integrated into general practice so GPs may apply it to their patients. Templates will be designed within the electronic records system (ERS) to collect routine data. The TMA model will be developed as a web application accessible from the ERS and will be tested in 6 practices

Planned Impact

This proposed research will produce a range of outputs that will be of interest to several different parties. In the early stages of the project, part of the communication plan will be raising awareness of the difficulties of implementing diagnostic test accuracy (DTA) research. This will extend our previous research into methods aimed at determining the validity and applicability of DTA studies in practice. It will also involve challenging the current orthodoxy of evidence synthesis methods in diagnostic research. As a result the outputs will inform both methodologists and clinicians.

Towards the end of the project new methods will have been developed that enable reviewers involved in the process of constructing a DTA meta-analysis to evaluate the statistical validity of the estimates produced. Moreover, the research will produce models that may synthesise estimates when the accuracy is known to be constrained in the values it may take. In this instance, the combining of evidence from the research literature with data from the practice of interest has implications for the type of statistical modelling that may be used. Consequently, the models per se will be of interest to the statisticians and methodologists in the diagnostic research community. Furthermore, constrained models are not widely known to medical statisticians so it is anticipated that the dissemination of this work is likely to yield further applications outside of diagnostic research in the future.

Whilst a large element of the research is focussed on methodological and model development, the overarching theme is to enhance decision-making on the applicability of test accuracy research in practice. Accordingly, the methods will be applied to a number of diagnostic and screening tests used in clinical practice. In particular, tests used in the NHS national screening programmes, such as the nucleic acid amplification tests (NAATs) used to screen for Chlamydia, will be evaluated and the results fed back to the respective screening committees. The outputs will be pertinent to the NHS and could potentially play a role in the future organisation of screening services.

Furthermore, one of the work streams will integrate a working model into the electronic records systems of 6 general practice surgeries so it may be used as a diagnostic decision support tool that is tailored to the patients in each practice. Overall the research will be relevant to both clinicians working at the sharp end of health care delivery and policy decision-makers who plan and implement service provision.

Patients will be the ultimate beneficiary of this research. In achieving the aim of improving decisions on when to apply test evaluation research in practice there is the potential to improve decision making on the treatment and management of patients on both the small and large scale.

The early outputs are expected to emerge in the first two years of work. Many of the outputs are likely to have an impact within 5 years of the project commencing. Thus the main model and methodological development in test accuracy research is expected to be completed within the term of the project (4 years) and disseminated within five. However, it is expected that the models will find application in other fields and although this will widen the impact it is also likely to take much longer.



Description Involved in producing a guideline for the meta-analysis of test accuracy studies
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
URL https://jamanetwork.com/journals/jama/fullarticle/2670259
Description MRC Clinician Scientist fellowship
Amount £864,337 (GBP)
Funding ID MR/N007999/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 09/2016 
End 08/2020
Description NIHR Evaluation, Trials and Studies Coordinating Centre (NETSCC)
Amount £405,646 (GBP)
Funding ID 16/150/01 
Organisation NIHR Evaluation, Trials and Studies Coordinating Centre (NETSCC) 
Sector Public
Country United Kingdom
Start 02/2018 
End 01/2020
Title NAAT 2012 reported_Data analysis technique 
Description The research material consists of data from primary studies used in meta-analysis. Occasionally patient audit data from my own practice is used 
Type Of Material Model of mechanisms or symptoms - human 
Provided To Others? No  
Impact A paper has been accepted for publication and others should follow. The work is on-going.
Title Optimisation algorithm for meta-analysis model 
Description Development of a new optimisation algorithm for conducting meta-analyses of test accuracy studies
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Too early to say 
URL https://journals.sagepub.com/doi/10.1177/0962280219853602
Title The effects of pre-test probability on the performance of clinical tests 
Description From statistical modelling and using data collected from practice, I have been able to demonstrate the effects of knowing when patients have a high probability of disease on the performance of clinical tests applied by doctors. The example used was for x-rays but it is likely that this extends to other clinical tests and has implications about the transferability of study results into practice. The research has been submitted for publication 
Type Of Material Model of mechanisms or symptoms - human 
Provided To Others? No  
Impact The research is yet to be published but it is likely that future evaluations of clinical tests and their implementation in practice will need to take into account the results of this research. The work is on-going.
Title Validation statistic 
Description This is a statistic which ascertains whether the results of meta-analyses are likely to be valid in a new setting
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact This has just been presented at an international conference and the work is currently under peer review. The work is on-going.
URL http://2015.colloquium.cochrane.org/abstracts/are-predictions-test-accuracy-meta-analyses-valid-prac...
Title Optimisation algorithm for bivariate random effects model 
Description This is a novel algorithm developed for handling a model commonly used in meta-analysis
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact None yet 
URL https://journals.sagepub.com/doi/10.1177/0962280219853602
Title Tailored meta-analysis model 
Description The results of diagnostic studies may be wholly unrepresentative of particular practice settings. This is a method which allows us to decide which studies are representative 
Type Of Material Data analysis technique 
Provided To Others? No  
Impact None yet 
URL http://www.ncbi.nlm.nih.gov/pubmed/24447592
Title Validation statistic for meta-analyses 
Description This is a novel statistic designed to test whether meta-analysis estimates are likely to be valid in a new setting
Type Of Material Data analysis technique 
Year Produced 2017 
Provided To Others? Yes  
Impact Appeared on Wikipedia, cited 19 times
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5575530/pdf/SIM-36-3283.pdf
Description Collaboration on use of the THIN database to produce pharmaco-epidemiology papers
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution Provided statistical support for the research
Collaborator Contribution Have extracted the data for analysis
Impact 3 papers
Start Year 2016
Description Collaboration with associate professor in dentistry 
Organisation Charité - University of Medicine Berlin
Country Germany 
Sector Academic/University 
PI Contribution Based on previous methodology, I was approached by a colleague in Germany to apply the methods to diagnosis in dentistry. I provided the methods and data analysis
Collaborator Contribution The partner provided the data
Impact There is a potential paper but it is still under review.
Start Year 2016
Description Long term impact of pre-incision antibiotics on babies born by caesarean section 
Organisation University of Birmingham
Department Institute of Applied Health Research
Country United Kingdom 
Sector Academic/University 
PI Contribution Offered statistical and GP expertise to the application for a grant
Collaborator Contribution The research was led by a colleague
Impact Funding has been achieved with the NIHR
Start Year 2017
Description Modelling survival data in large primary care databases
Organisation Brown University
Country United States 
Sector Academic/University 
PI Contribution This was work on the analysis of routine data
Collaborator Contribution They provided the computer clusters and expertise regarding analysis
Impact A paper is expected
Start Year 2017
Description QRISK2 scores and prescription of statins
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution An investigation into the prescribing behaviour of GPs when prescribing statins. SF has gathered data, TM has supervised and I have analysed the data.
Collaborator Contribution SF has gathered data, TM has supervised and I have analysed the data.
Impact None so far
Start Year 2019
Title Optimisation algorithm for the bivariate random effects model used in meta-analysis of test accuracy studies 
Description The optimisation algorithm combines the Newton-Raphson method with the profile likelihood and observed Fisher information to fit the bivariate random effects model used in the meta-analysis of test accuracy studies
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact None yet 
URL https://journals.sagepub.com/doi/10.1177/0962280219853602
Description A lecture series of 4 lectures given at Brown University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact These were four lectures on the translation of test research evidence into practice, based on the work that I have carried out over the last 8 years
Year(s) Of Engagement Activity 2017
URL https://www.brown.edu/academics/public-health/research/evidence-synthesis-in-health/news/2017-03/vis...
Description A talk on a paper I wrote on 'philosophy of science and the diagnostic process' given to the Test Research Group at Exeter University in June 2018
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A talk on a previously published paper on the 'philosophy of science and the diagnostic process' given to the Test Research Group at Exeter University in June 2018
Year(s) Of Engagement Activity 2018
Description Diagnostic decision making 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Patients, carers and/or patient groups
Results and Impact This was a talk to the patient participation group of a general practice surgery to give an indication of my research and of how doctors make diagnostic decisions in general
Year(s) Of Engagement Activity 2017