HCD: Synthesis of networks of evidence on test accuracy, with and without a 'gold standard'

Lead Research Organisation: University of Bristol
Department Name: Bristol Medical School

Abstract

A diagnostic test is any kind of medical test or assessment used to determine whether an individual does or does not have a disease or clinical condition. For most diseases there are multiple possible tests that could be used, each with different characteristics (e.g. accuracy, invasiveness to the patient, ease and speed of use, cost). Healthcare providers, laboratories and policy makers are faced with decisions about which test - or combination of tests - to use in practice for each disease.

Although there are many factors to consider in making these decisions, one key consideration is the accuracy of each test. Most diagnostic tests do not have perfect accuracy: there is almost always a chance of some false positive and/or false negative results. Clearly, other factors being equal, tests that make fewer such errors are preferred. Information on the accuracy of any given test is very often available from multiple studies, and this information is statistically combined across studies ('meta-analysed'). The resulting 'pooled' estimates are used for decision making. For example, they are a key component of 'decision models', used by bodies such as the National Institute for Health and Care Excellence (NICE) in the UK to estimate and compare the effectiveness and cost-effectiveness of different testing strategies.

Methods for combining information from multiple studies on the accuracy of a single test are now well established. But these are inadequate for answering clinically important questions about how the accuracy of two or more tests compares, and about the accuracy of tests used in combination. One of the difficulties is that different studies tend to report data of very different types: for example, Study 1 reports data on the accuracy of Tests A and B and also reports the overlap between results on the two tests; Study 2 reports data on the accuracy of A and B but not the amount of overlap; Study 3 reports on the accuracy of Test A only; while Study 4 reports on Tests B and C; and so on. A general modelling framework is needed that can analyse all such data, i.e. 'networks of evidence', together.

An additional problem is that standard methods are based on a key assumption: that accuracy can be (and, in all studies - e.g. Studies 1-4 in the example above - has been) estimated directly by comparing test results with results from a 'gold standard' test. This is a test that is assumed to be error-free, i.e. perfectly accurate, but that is not fit to be used routinely on all patients (for example, it may be highly invasive or very expensive). In practice, often either no such test exists for a given disease, or it has not been applied in all studies. As a result of this unrealistic assumption, many estimates of test accuracy - and subsequent estimates that rely on them, e.g. of effectiveness and cost-effectiveness - could be completely wrong. However, careful modelling of networks of evidence will offer a route to relaxing this assumption, through a more advanced type of statistical modelling called 'latent class models'. By modelling the overlap between results on multiple tests applied to the same individuals, latent class models can provide the required estimates of test accuracy without any individual being directly classified as diseased or disease-free.
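
To make the idea concrete, below is a minimal sketch (illustrative only, and much simpler than the methods proposed here): a maximum-likelihood fit of a three-test latent class model to hypothetical cross-classified counts from a single study, assuming the tests are conditionally independent given true disease status. With three tests this model is just identified: seven parameters (prevalence plus a sensitivity and specificity for each test) are estimated from the seven free cell proportions.

    # Minimal sketch (not the proposed methodology): three-test latent class
    # model fitted by maximum likelihood to hypothetical counts, with no
    # gold standard observed for any individual.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit  # inverse logit, keeps parameters in (0, 1)

    # Hypothetical counts for the 8 result patterns (T1, T2, T3),
    # ordered (0,0,0), (0,0,1), ..., (1,1,1)
    counts = np.array([397, 31, 37, 22, 20, 23, 29, 241])
    patterns = np.array([[(i >> 2) & 1, (i >> 1) & 1, i & 1] for i in range(8)])

    def neg_log_lik(theta):
        # theta holds logits of (prevalence, se1..se3, sp1..sp3)
        prev = expit(theta[0])
        se, sp = expit(theta[1:4]), expit(theta[4:7])
        # P(pattern | diseased) and P(pattern | disease-free), assuming the
        # tests are conditionally independent given true disease status
        p_dis = np.prod(se ** patterns * (1 - se) ** (1 - patterns), axis=1)
        p_free = np.prod((1 - sp) ** patterns * sp ** (1 - patterns), axis=1)
        cell = prev * p_dis + (1 - prev) * p_free  # marginalise latent status
        return -np.sum(counts * np.log(cell))

    # Non-symmetric start breaks the diseased/disease-free labelling symmetry
    fit = minimize(neg_log_lik, x0=np.r_[0.0, np.ones(6)], method="BFGS")
    prev_hat = expit(fit.x[0])
    se_hat, sp_hat = expit(fit.x[1:4]), expit(fit.x[4:7])
    print(f"prevalence: {prev_hat:.2f}")
    print("sensitivities:", np.round(se_hat, 2), "specificities:", np.round(sp_hat, 2))

The research proposed here goes well beyond this sketch: it is Bayesian, synthesises many studies reporting different subsets of tests, and accounts for dependencies between tests.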

In this programme of research we will develop a general statistical modelling framework for networks of evidence on test accuracy that will be applicable across wide-ranging clinical areas. The approach will deliver more reliable estimates of the accuracy and comparative accuracy of tests or combinations of tests - ultimately leading to improved decisions about the use of tests in practice. We will provide training and resources to support use of the methods developed.

Technical Summary

Overall objective: We will develop general methodology to synthesise networks of evidence on the sensitivity and specificity of tests for any given disease. The approach will be applicable across clinical areas and will simultaneously produce pooled estimates of singular, joint and comparative accuracy, allowing for imperfections in 'reference standard' tests where appropriate. The framework will accommodate data of varied types, e.g. studies reporting results on (i) Test A vs a Gold Standard (GS) test; (ii) A vs B vs GS; (iii) A vs B without a GS; (iv) A vs GS and B vs GS reported separately; (v) B vs C vs GS etc. We will demonstrate how the assumption that any one of the tests in the network is a GS can be relaxed.

Why are these estimates needed? Estimates of sensitivity and specificity, informed by systematic reviews and meta-analyses, are crucial for decision-making about which test and/or combination of tests to use for any given disease. For example, these are key parameters in decision models used by NICE to compare the effectiveness and cost-effectiveness of testing strategies.

Why are new methods needed? Standard methods (i) cannot produce estimates of the accuracy of tests used in combination that account for the likely dependencies between them; (ii) produce unnecessarily imprecise estimates of comparative accuracy; and (iii) are critically reliant on the assumption that a GS test exists and has been applied in all studies. More advanced methods that have been proposed are not yet fit for general use: for example, some require estimation of an infeasible number of parameters.

Methods: We will work within a Bayesian multi-parameter evidence synthesis framework, allowing data of varied types to be synthesised together through specification of the relationships between parameters. We will draw on - and extend - existing methods for meta-analysis of comparative test accuracy, and latent class models that have been proposed for estimation of accuracy in the absence of a GS.
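
As a toy illustration of this principle (the data below are invented, and a simple grid approximation stands in for the MCMC methods that would be used in practice), the following sketch computes a joint Bayesian posterior for the sensitivities of two tests and their covariance, combining one study that reports the full cross-classification of Tests A and B among GS-verified diseased individuals with two studies that each report results for a single test against the GS.

    # Toy multi-parameter evidence synthesis: three studies of different
    # types contribute likelihood terms for shared parameters (se_A, se_B, c).
    # Flat priors are implicit in the uniform grid weighting.
    import numpy as np

    # Hypothetical data (for illustration only)
    joint1 = np.array([60, 15, 10, 15])  # Study 1: (A+B+, A+B-, A-B+, A-B-) among GS-diseased
    x2, n2 = 70, 90                      # Study 2: positives on Test A among GS-diseased
    x3, n3 = 55, 80                      # Study 3: positives on Test B among GS-diseased

    g = np.linspace(0.005, 0.995, 100)
    cg = np.linspace(-0.2, 0.2, 81)
    seA, seB, c = np.meshgrid(g, g, cg, indexing="ij")

    # Joint cell probabilities among the diseased, with covariance term c
    # capturing dependence between the two tests
    p = np.stack([seA * seB + c, seA * (1 - seB) - c,
                  (1 - seA) * seB - c, (1 - seA) * (1 - seB) + c])
    valid = np.all(p > 0, axis=0)  # mask parameter values giving invalid cells

    loglik = np.tensordot(joint1, np.log(np.clip(p, 1e-12, None)), axes=1)
    # Studies 2 and 3 contribute simple binomial terms for the margins
    loglik += x2 * np.log(seA) + (n2 - x2) * np.log(1 - seA)
    loglik += x3 * np.log(seB) + (n3 - x3) * np.log(1 - seB)
    loglik[~valid] = -np.inf

    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    print(f"posterior mean se_A: {np.sum(post * seA):.3f}")
    print(f"posterior mean se_B: {np.sum(post * seB):.3f}")
    print(f"posterior mean covariance c: {np.sum(post * c):.3f}")

Here the cross-classified study is what informs the covariance term - exactly the quantity needed to estimate the accuracy of the two tests used in combination - while the single-test studies sharpen the marginal sensitivities. The full framework would additionally model specificities, between-study heterogeneity and imperfect reference standards.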

Planned Impact

We will produce a general modelling framework that can be used by analysts worldwide to produce more reliable (less biased and often more precise) estimates of singular, comparative and joint test accuracy.

Guideline developers: A major benefit will be to the National Institute for Health and Care Excellence (NICE) in the UK and to other guideline developers worldwide. Test accuracy parameters are key inputs to the decision models used, for example, by the NICE Diagnostic Assessment Programme and to inform NICE Guidelines. The models developed will be applicable across a wide range of clinical areas. Their use will facilitate decision making about testing strategies, and ultimately lead to more robust decisions.

Laboratories: Laboratories, such as Public Health England laboratories, and healthcare providers are also directly faced with choices between multiple tests. Although they rarely perform systematic reviews of test accuracy themselves, they will benefit from improved evidence on comparative test accuracy produced by other teams using our methods.

Teams undertaking systematic reviews: Any team undertaking a systematic review of test accuracy (with or without an accompanying decision model) will benefit from the improved methods. This includes teams undertaking Cochrane reviews, reviews undertaken as part of the NIHR Health Technology Assessment (HTA) programme, and other reviews conducted by academic teams. Manufacturers of tests (e.g. Roche) also sometimes undertake their own systematic reviews of test accuracy. In addition to improved statistical methodology for evidence synthesis (such that results from reviews will be more robust and more clinically relevant), a key objective of our proposal is to produce guidance for reviewers on (i) what types of evidence are worth finding, and (ii) what types of data are worth extracting. This will make the systematic review process more efficient.

Healthcare providers and patients: Ultimately, the benefit will be to the NHS (and other healthcare providers), clinicians and patients. Improved evidence on testing strategies will lead to improvements in diagnostic pathways, including more cost-effective use of resources. Improved estimates of test accuracy lead directly (through application of Bayes' rule) to improved estimates of the more clinically relevant positive and negative predictive values, so that clinicians will be better informed when deciding the next step for any given individual. The benefit to patients of improved diagnostic testing strategies is substantial, including potentially faster diagnosis and subsequent treatment, and a reduced chance of 'false alarms' and the anxiety they inevitably induce.
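
As a concrete illustration of this application of Bayes' rule (the numbers below are invented, not estimates from this project):

    # Converting sensitivity and specificity into predictive values via
    # Bayes' rule; the inputs are illustrative only.
    def predictive_values(se, sp, prev):
        """Return (PPV, NPV) for a test with sensitivity `se` and specificity
        `sp` applied in a population with disease prevalence `prev`."""
        ppv = se * prev / (se * prev + (1 - sp) * (1 - prev))
        npv = sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)
        return ppv, npv

    # Even a fairly accurate test yields many false alarms at low prevalence
    ppv, npv = predictive_values(se=0.90, sp=0.95, prev=0.02)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.3f}")  # PPV ~ 0.27, NPV ~ 0.998

Note how, at low prevalence, most positive results are false alarms even for a fairly accurate test; small biases in estimated specificity therefore translate into large errors in the PPV, which is one reason reliable pooled accuracy estimates matter.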

 
Further Funding

Description The benefits, harms and costs of surveillance for hepatocellular carcinoma in people with cirrhosis: synthesis of observational and diagnostic test accuracy data and cost-utility analysis
Amount £337,568 (GBP)
Funding ID NIHR134670 
Organisation National Institute for Health Research 
Sector Public
Country United Kingdom
Start 07/2022 
End 03/2024
 
Engagement Activities

Description 'Introduction to Diagnostic Test Accuracy reviews' training course for PenTAG, University of Exeter
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact ~15 academic researchers from PenTAG, the Technology Assessment Group at the University of Exeter, attended our online training course (delivered over two half-days) on how to conduct systematic reviews of diagnostic test accuracy.
Year(s) Of Engagement Activity 2022
 
Description Invited seminar at University of Edinburgh 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact I was invited to deliver a one-hour seminar to the mathematics department at the University of Edinburgh, titled "Bayesian evidence synthesis models for prevalence estimation and diagnostic test evaluation".
Year(s) Of Engagement Activity 2023
 
Description Royal Institution maths masterclass - for secondary school pupils in Bristol
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact ~100 secondary school pupils attended our ~3-hour maths masterclass, "Have you ever had a swab test for COVID-19? Do you know what the results meant?". We explained Bayes' theorem through the interpretation of diagnostic test results.
Year(s) Of Engagement Activity 2022,2023