HCD: Synthesis of networks of evidence on test accuracy, with and without a 'gold standard'
Lead Research Organisation: University of Bristol
Department Name: Bristol Medical School
Abstract
A diagnostic test is any kind of medical test or assessment used to determine whether an individual does or does not have a disease or clinical condition. For most diseases there are multiple possible tests that could be used, each with different characteristics (e.g. accuracy, invasiveness to the patient, ease and speed of use, cost). Healthcare providers, laboratories and policy makers are faced with decisions about which test - or combination of tests - to use in practice for each disease.
Although there are many factors to consider in making these decisions, one key consideration is the accuracy of each test. Most diagnostic tests do not have perfect accuracy: there is almost always a chance of some false positive and/or false negative results. Clearly, other factors being equal, tests that make fewer such errors are preferred. Information on the accuracy of any given test is very often available from multiple studies, and this information is statistically combined. These 'pooled' estimates are used for decision making. For example, they are a key component of 'decision models', used by bodies such as the National Institute for Health and Care Excellence (NICE) in the UK to estimate and compare the effectiveness and cost-effectiveness of different testing strategies.
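As a simple illustration of how accuracy information from multiple studies is statistically combined, the sketch below pools sensitivity estimates from three hypothetical studies by inverse-variance weighting on the logit scale (one common fixed-effect approach; the study counts are illustrative only, not data from this project):

```python
import math

# Hypothetical per-study sensitivity data: (true positives, total diseased).
studies = [(45, 50), (80, 100), (27, 30)]

num, den = 0.0, 0.0
for tp, n in studies:
    p = tp / n
    logit = math.log(p / (1 - p))
    var = 1 / tp + 1 / (n - tp)   # approximate variance of logit(p)
    w = 1 / var                   # inverse-variance weight
    num += w * logit
    den += w

# Back-transform the weighted mean logit to the probability scale.
pooled = 1 / (1 + math.exp(-(num / den)))
print(f"Pooled sensitivity: {pooled:.3f}")
```

Studies with more informative counts receive larger weights, so the pooled estimate is pulled towards the most precise studies.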
Methods for combining information from multiple studies on the accuracy of a single test are now well established. But these are inadequate for answering clinically important questions about how the accuracy of two or more tests compares and about the accuracy of tests used in combination. One of the difficulties is that different studies tend to report data of very different types: for example, Study 1 reports data on the accuracy of Tests A and B and also reports the overlap between test results on A and B; Study 2 reports data on the accuracy of A and B but doesn't report the amount of overlap; Study 3 reports on the accuracy of test A only; while Study 4 reports on tests B and C etc. A general modelling framework is needed that can analyse all such data, i.e. 'networks of evidence', together.
An additional problem is that standard methods are based on a key assumption that accuracy can be (and has been in all studies, e.g. 1-4 in the example above) estimated directly by comparing test results with results from a 'gold standard' test. This is a test that is assumed to be error-free, i.e. perfectly accurate, but not fit to be used routinely on all patients (for example, it may be highly invasive or very expensive). In practice, often either no such test for a given disease exists, or it has not been applied in all studies. As a result of this unrealistic assumption, many estimates of test accuracy - and subsequent estimates that are reliant on these, e.g. of effectiveness and cost-effectiveness - could be completely wrong. However, careful modelling of networks of evidence will offer a route to relaxing this assumption, through a type of more advanced statistical modelling called 'latent class models'. Through modelling of the overlap between results on multiple tests applied to the same individuals, latent class models are able to provide the required estimates of test accuracy without any direct classification of each individual as diseased/disease-free.
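A minimal sketch of the latent class idea follows, using made-up cross-classified counts for three hypothetical tests applied to the same individuals, fitted by an EM algorithm under the usual assumption that tests are conditionally independent given disease status. Disease status is never observed, yet prevalence, sensitivities and specificities can all be estimated:

```python
# Hypothetical counts for each result pattern (a, b, c) of three tests,
# with 1 = positive, on 1000 individuals. No gold standard is available.
counts = {(1, 1, 1): 180, (1, 1, 0): 30, (1, 0, 1): 25, (1, 0, 0): 40,
          (0, 1, 1): 20, (0, 1, 0): 55, (0, 0, 1): 35, (0, 0, 0): 615}

prev = 0.3                  # starting values for the EM algorithm
sens = [0.8, 0.8, 0.8]
spec = [0.9, 0.9, 0.9]

for _ in range(500):
    # E-step: posterior probability of disease for each result pattern,
    # assuming conditional independence of tests given disease status.
    post = {}
    for pattern in counts:
        pd, pnd = prev, 1 - prev
        for j, r in enumerate(pattern):
            pd *= sens[j] if r else 1 - sens[j]
            pnd *= (1 - spec[j]) if r else spec[j]
        post[pattern] = pd / (pd + pnd)
    # M-step: update prevalence, sensitivities and specificities.
    total = sum(counts.values())
    nd = sum(n * post[p] for p, n in counts.items())
    prev = nd / total
    for j in range(3):
        sens[j] = sum(n * post[p] for p, n in counts.items()
                      if p[j] == 1) / nd
        spec[j] = sum(n * (1 - post[p]) for p, n in counts.items()
                      if p[j] == 0) / (total - nd)

print(f"prevalence {prev:.3f}, sensitivities {[round(s, 3) for s in sens]}")
```

Three tests are used because with only two conditionally independent tests in a single population the model is not identifiable; starting values with sensitivity above 0.5 keep the latent classes aligned with diseased/disease-free.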
In this programme of research we will develop a general statistical modelling framework for networks of evidence on test accuracy that will be applicable across a wide range of clinical areas. The approach will deliver more reliable estimates of the accuracy and comparative accuracy of tests or combinations of tests - ultimately leading to improved decisions about the use of tests in practice. We will provide training and resources to support use of the methods developed.
Technical Summary
Overall objective: We will develop general methodology to synthesise networks of evidence on the sensitivity and specificity of tests for any given disease. The approach will be applicable across clinical areas and will simultaneously produce pooled estimates of singular, joint and comparative accuracy, allowing for imperfections in 'reference standard' tests where appropriate. The framework will accommodate data of varied types, e.g. studies reporting results on (i) Test A vs a Gold Standard (GS) test; (ii) A vs B vs GS; (iii) A vs B without a GS; (iv) A vs GS and B vs GS reported separately; (v) B vs C vs GS etc. We will demonstrate how the assumption that any one of the tests in the network is a GS can be relaxed.
Why are these estimates needed? Estimates of sensitivity and specificity, informed by systematic reviews and meta-analyses, are crucial for decision-making about which test and/or combination of tests to use for any given disease. For example, these are key parameters in decision models used by NICE to compare the effectiveness and cost-effectiveness of testing strategies.
Why are new methods needed? Standard methods (i) cannot produce estimates of the accuracy of tests used in combination, accounting for likely dependencies between them; (ii) produce unnecessarily imprecise estimates of comparative accuracy; and (iii) are critically reliant on the assumption that a GS test exists and has been applied in all studies. Advanced methods that have been proposed are not yet fit for general use: for example, they can require estimation of infeasibly large numbers of parameters.
Methods: We will work within a Bayesian multi-parameter evidence synthesis framework, allowing data of varied types to be synthesised together through specification of the relationships between parameters. We will draw on - and extend - existing methods for meta-analysis of comparative test accuracy, and latent class models that have been proposed for estimation of accuracy in the absence of a GS.
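The mechanics of Bayesian synthesis can be sketched in a deliberately simplified form: a grid approximation to the posterior for a single sensitivity assumed common to three hypothetical studies, under a flat Beta(1, 1) prior. The proposed framework is far richer (hierarchical and multi-parameter), but the core step of combining study likelihoods with a prior is the same:

```python
import math

# Hypothetical (true positives, total diseased) counts from three studies
# assumed to share one common sensitivity (a simple fixed-effect model).
studies = [(45, 50), (80, 100), (27, 30)]

# Grid approximation to the posterior under a flat Beta(1, 1) prior,
# which contributes only a constant to the log posterior.
grid = [i / 1000 for i in range(1, 1000)]
log_post = []
for s in grid:
    lp = 0.0
    for tp, n in studies:
        lp += tp * math.log(s) + (n - tp) * math.log(1 - s)
    log_post.append(lp)

# Normalise on the probability scale (subtracting the max for stability).
m = max(log_post)
weights = [math.exp(lp - m) for lp in log_post]
total = sum(weights)
post_mean = sum(s * w for s, w in zip(grid, weights)) / total
print(f"Posterior mean sensitivity: {post_mean:.3f}")
```

With this conjugate setup the exact posterior is Beta(1 + 152, 1 + 28), so the grid result can be checked against the closed-form mean 153/182.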
Planned Impact
We will produce a general modelling framework that can be used by analysts worldwide to produce more reliable (unbiased and often more precise) estimates of singular, comparative and joint test accuracy.
Guideline developers: A major benefit will be to the National Institute for Health and Care Excellence (NICE) in the UK and to other guideline developers worldwide. Test accuracy parameters are key parameters in decision models used, for example, by the NICE Diagnostic Assessment Program and to inform NICE Guidelines. Models developed will be applicable across a wide range of clinical areas. Their use will facilitate the decision-making process about testing strategies, and ultimately lead to more robust decisions being made.
Laboratories: Laboratories, such as Public Health England laboratories, and healthcare providers are also directly faced with choices between multiple tests. Although they rarely perform systematic reviews of test accuracy themselves, they will benefit from improved evidence on comparative test accuracy produced by other teams using our methods.
Teams undertaking systematic reviews: Any team undertaking a systematic review of test accuracy (with or without an accompanying decision model) will benefit from the improved methods. This includes teams undertaking Cochrane reviews and reviews undertaken as part of the NIHR Health Technology Assessment (HTA) programme, and other reviews conducted by teams of academics. Manufacturers of tests (e.g. Roche) also sometimes undertake their own systematic reviews of test accuracy. In addition to improved statistical methodology for evidence synthesis (such that results from reviews will be more robust and more clinically relevant), a key objective of our proposal is to produce guidance for reviewers on (i) what types of evidence it is worthwhile 'finding', (ii) what types of data it is worthwhile extracting. This will make the systematic review process more efficient.
Healthcare providers and patients: Ultimately, the benefit will be to the NHS (and other healthcare providers), clinicians and patients. Improved evidence on testing strategies will lead to improvements in diagnostic pathways, including more cost-effective use of resources. Improved estimates of test accuracy directly lead to (through application of Bayes' rule) improved estimates of the more clinically relevant positive and negative predictive values, such that clinicians will be better informed when making their decisions about the next step for any given individual. The benefit to patients of improved diagnostic testing strategies is of course high, including potentially faster diagnosis and subsequent treatment, and a reduced chance of 'false alarms' and the anxiety that this inevitably induces.
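The step from accuracy estimates to predictive values via Bayes' rule can be written out directly; the sensitivity, specificity and prevalence figures below are illustrative only:

```python
# Illustrative figures: a test with sensitivity 0.90 and specificity 0.95,
# used in a population with 2% disease prevalence.
sens, spec, prev = 0.90, 0.95, 0.02

# Bayes' rule: P(disease | positive) and P(no disease | negative).
ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Even with an apparently accurate test, at low prevalence most positives here are false alarms (PPV well under 0.5), which is why reliable accuracy estimates matter for interpreting results.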
People
Hayley Jones (Principal Investigator)
Publications
Andrews LJ (2022) Prevalence of BRAFV600 in glioma and use of BRAF inhibitors in patients with BRAFV600 mutation-positive glioma: systematic review. Neuro-Oncology.
Brandner S (2022) Diagnostic accuracy of 1p/19q codeletion tests in oligodendroglioma: a comprehensive meta-analysis based on a Cochrane systematic review. Neuropathology and Applied Neurobiology.
Cerullo E (2022) Meta-analysis of dichotomous and ordinal tests with an imperfect gold standard. Research Synthesis Methods.
Cerullo E (2023) MetaBayesDTA: codeless Bayesian meta-analysis of test accuracy, with or without a gold standard. BMC Medical Research Methodology.
Elwenspoek MMC (2021) The accuracy of diagnostic indicators for coeliac disease: a systematic review and meta-analysis. PLoS ONE.
McAleenan A (2022) Diagnostic test accuracy and cost-effectiveness of tests for codeletion of chromosomal arms 1p and 19q in people with glioma. The Cochrane Database of Systematic Reviews.
Sheppard AL (2022) Systematic review with meta-analysis: the accuracy of serological tests to support the diagnosis of coeliac disease. Alimentary Pharmacology & Therapeutics.
Takwoingi Y (2023) Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy.
Description | The benefits, harms and costs of surveillance for hepatocellular carcinoma in people with cirrhosis: synthesis of observational and diagnostic test accuracy data and cost-utility analysis |
Amount | £337,568 (GBP) |
Funding ID | NIHR134670 |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 07/2022 |
End | 03/2024 |
Description | 'Introduction to Diagnostic Test Accuracy reviews' training course for PenTAG, University of Exeter |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | ~15 academic researchers from PenTAG, the Technology Assessment Group at the University of Exeter, attended our online training course (delivered over 2 half-days) on how to complete systematic reviews of diagnostic test accuracy
Year(s) Of Engagement Activity | 2022 |
Description | Invited seminar at University of Edinburgh |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | I was invited to deliver a 1-hour seminar at the maths department of the University of Edinburgh, titled "Bayesian evidence synthesis models for prevalence estimation and diagnostic test evaluation"
Year(s) Of Engagement Activity | 2023 |
Description | Royal Institute of Mathematics maths masterclass - for secondary school pupils in Bristol |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | ~100 secondary school pupils attended our ~3-hour maths masterclass "Have you ever had a swab test for COVID-19? Do you know what the results meant?". We explained Bayes' theorem through interpretation of diagnostic test results.
Year(s) Of Engagement Activity | 2022,2023 |