Identifying relevant studies for systematic reviews and health technology assessments using text mining
Lead Research Organisation:
University College London
Department Name: Childhood, Families and Health
Abstract
Systematic reviews are a widely used method to bring together the findings from multiple studies in a reliable way, and are often used to inform policy and practice (such as guideline development). A critical feature of a systematic review is the application of scientific method to uncover and minimise bias and error in the selection and treatment of studies. However, the large and growing number of published studies, and their increasing rate of publication, make the task of identifying relevant studies in an unbiased way both complex and time consuming.
Unfortunately, the specificity of sensitive electronic searches of bibliographic databases is low. Reviewers often need to look manually through many thousands of irrelevant titles and abstracts in order to identify the much smaller number of relevant ones, a process known as 'screening'. Given that an experienced reviewer can take between 30 seconds and several minutes to evaluate a citation, the work involved in screening 10,000 citations is considerable (and the burden of screening is sometimes considerably higher than this).
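To give a rough sense of scale, the figures above can be turned into reviewer hours. This is only an illustrative calculation: the 30-second figure comes from the text, while the 3-minute upper value is an assumption standing in for "several minutes", and the function name is hypothetical.

```python
def screening_hours(n_citations, seconds_per_citation):
    """Total screening time in hours for a single reviewer."""
    return n_citations * seconds_per_citation / 3600


# 10,000 citations at 30 seconds each (lower figure from the text)
low = screening_hours(10_000, 30)
# 10,000 citations at an ASSUMED 3 minutes each ("several minutes")
high = screening_hours(10_000, 3 * 60)

print(f"{low:.0f} to {high:.0f} hours")
```

Under these assumptions a single pass through 10,000 citations takes somewhere between roughly two and twelve working weeks for one reviewer, before any double screening.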
The obvious way to save time in reviews is simply to screen fewer studies. Currently, this is usually accomplished by reducing the number of citations retrieved through electronic searches by developing more specific search strategies, thereby reducing the number of irrelevant citations found. However, limiting the sensitivity of a search may undermine one of the most important principles of a systematic review: that its results are based on an unbiased set of studies.
We therefore propose to develop and evaluate an alternative approach which addresses both of these issues: it is important to have as sensitive a search as is possible, as this is necessary to obtain reliable review findings; but it is also sometimes impossible to screen the number of citations that these sensitive searches will generate. Thus, some form of automation is needed to identify the citations that do, and do not, need to be screened manually. As the data upon which the automation must work are in the form of text, we are looking to the relatively new science of text mining to provide solutions to these problems.
There are two ways of using text mining that are particularly promising for assisting with screening in systematic reviews: one aims to prioritise the list of items for manual screening so that the studies at the top of the list are those that are most likely to be relevant ('screening prioritisation'); the second method uses the manually assigned include/exclude categories of studies in order to 'learn' to apply such categorisations automatically ('automatic classification').
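The two uses described above can be illustrated with a deliberately small sketch. It is not the project's method: the scoring is a simple naive Bayes-style log-odds weight per term, and all function names, the threshold and the example citations are hypothetical. The same learned weights serve both purposes: sorting unscreened citations gives screening prioritisation, and applying a cut-off gives automatic classification.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def train_term_weights(labelled):
    """Learn a log-odds weight per term from (text, is_relevant) pairs.

    Uses document frequencies with add-one smoothing. Positive weights
    indicate terms seen more often in included citations.
    """
    inc, exc = Counter(), Counter()
    for text, relevant in labelled:
        (inc if relevant else exc).update(set(tokenize(text)))
    n_inc = sum(1 for _, relevant in labelled if relevant)
    n_exc = len(labelled) - n_inc
    vocab = set(inc) | set(exc)
    return {
        t: math.log((inc[t] + 1) / (n_inc + 2))
        - math.log((exc[t] + 1) / (n_exc + 2))
        for t in vocab
    }


def score(text, weights):
    """Sum the weights of the distinct known terms in a citation."""
    return sum(weights.get(t, 0.0) for t in set(tokenize(text)))


def prioritise(unscreened, weights):
    """Screening prioritisation: most likely relevant citations first."""
    return sorted(unscreened, key=lambda t: score(t, weights), reverse=True)


def classify(text, weights, threshold=0.0):
    """Automatic classification: include if the score exceeds the cut-off."""
    return score(text, weights) > threshold


# Hypothetical manually screened citations (True = include).
labelled = [
    ("randomised trial of smoking cessation in adolescents", True),
    ("school based smoking prevention trial", True),
    ("crystal structure of a zinc enzyme", False),
    ("enzyme kinetics in yeast", False),
]
weights = train_term_weights(labelled)

unscreened = ["yeast enzyme assay", "smoking cessation trial in schools"]
ranked = prioritise(unscreened, weights)
```

In practice the manual decisions would be fed back in as screening proceeds, so that the ranking or the classifier is retrained on a growing set of labelled citations.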
We know of no existing evaluations of screening prioritisation. There are a small number of other groups developing tools for automatic classification, but this project adds value by: implementing the technology in ongoing reviews; developing metrics for their use in such reviews; and engaging with systematic reviewers and computer scientists with a view to building capacity for further implementation and development.
As the use of these technologies and the development of validated methods for their use are in their infancy, an important part of the project is outreach: to build interest, capacity and enthusiasm for their use in the future.
By reducing the burden of screening in reviews, new methodologies using text mining may enable systematic reviews both to be completed more quickly (thus meeting exacting policy and practice timescales and increasing their cost efficiency) and to minimise the impact of publication bias and reduce the chances that relevant research will be missed (by enabling them to increase the sensitivity of their searches). In turn, by facilitating more timely and reliable reviews, this methodology has the potential to improve decision-making across the health sector and beyond.
Technical Summary
There are two main components to this study: 1) a retrospective analysis of data from existing reviews; and 2) a prospective analysis involving the use of text mining in ongoing reviews. These analyses will be used to evaluate two new methods: screening prioritisation and automatic classification. As the use of these technologies and the development of validated methods for their use are in their infancy, an important part of the project is outreach: to build interest, capacity and enthusiasm for their use in the future.
The retrospective analyses will simulate the conduct of screening with text mining, using data from six completed EPPI-Centre reviews and subsequently from between five and eight ongoing reviews. Learning from the retrospective simulation studies will inform the parameters selected for the prospective studies and will also be used to develop tools and metrics to evaluate their performance.
Both text mining techniques will be available for prospective evaluation in EPPI-Centre reviews over the period of this project and by arrangement with reviews conducted by external organisations too. In selected reviews, searches will be more extensive than usual and we will also maintain a record of the studies that would have been identified using standard search techniques for comparison.
In a systematic review we are less interested in predictive performance (the standard way of evaluating a classifier) than in the ability of the system, including human interaction, to identify all relevant studies as efficiently as possible. Wallace and colleagues have suggested two additional parameters, which we propose to use in addition to standard metrics: yield (the proportion of relevant studies identified) and burden (the total number manually screened, expressed as a proportion of all citations). In reviews that screen everything manually, yield and burden are both 100%. Successful automated approaches will reduce the burden of manual screening whilst retaining a yield of 100%.
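The yield and burden measures can be computed directly in a retrospective simulation, where the true include/exclude decision for every citation is already known. The sketch below is illustrative only: it assumes citations are presented in a ranked order, that reviewers screen the first n of them, and that anything left unscreened is treated as excluded. The function names and the example ranking are hypothetical.

```python
def yield_and_burden(ranked_labels, n_screened):
    """Return (yield, burden) as proportions between 0 and 1.

    ranked_labels: list of bools (True = relevant) in the order in which
                   citations are presented for manual screening.
    n_screened:    number of citations from the top that were screened.
    ASSUMPTION: unscreened citations are treated as excluded.
    """
    total = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    found = sum(ranked_labels[:n_screened])
    yield_ = found / total_relevant if total_relevant else 1.0
    burden = n_screened / total if total else 0.0
    return yield_, burden


def burden_at_full_yield(ranked_labels):
    """Smallest burden that still identifies every relevant citation."""
    if not ranked_labels:
        return 0.0
    last_relevant = max(
        (i for i, relevant in enumerate(ranked_labels) if relevant),
        default=-1,
    )
    return (last_relevant + 1) / len(ranked_labels)


# Hypothetical ranking of 10 citations, 3 of which are relevant.
ranked = [True, True, False, True, False,
          False, False, False, False, False]

print(yield_and_burden(ranked, 10))  # screen everything manually
print(yield_and_burden(ranked, 4))   # stop after the fourth citation
print(burden_at_full_yield(ranked))
```

With this ranking, screening everything gives a yield and burden of 100%, while stopping after the fourth citation keeps the yield at 100% with a burden of 40%. In a live review the true labels of unscreened citations are unknown, so this calculation is only possible retrospectively or against a fully screened comparison set.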
Planned Impact
SOCIAL AND ECONOMIC IMPACT
By reducing the burden of screening in reviews, new methodologies using text mining may enable systematic reviews both to be completed more quickly (thus meeting exacting policy and practice timescales) and to minimise the impact of publication bias and reduce the chances that relevant research will be missed (by enabling them to increase the sensitivity of their searches). In turn, by facilitating more timely and reliable reviews, this methodology has the potential to improve decision-making across the health sector and beyond.
Thus, while the immediately apparent direct impact will be evident for researchers, as outlined in 'Academic beneficiaries', the dual benefits of increased methodological rigour and increased time- and cost-efficiency will also have direct effects for policymakers, those who commission systematic reviews and those who are affected by their decisions.
The proposed research is likely to generate commercially exploitable results through the integration of the text mining tools in the software applications and the supply of support in using these technologies.
ENSURING IMPACT THROUGH DISSEMINATION
Impact will be achieved through the 'Communications plan' and the strategy outlined in detail in the 'Pathways to impact' document. The project goes beyond mere 'dissemination' and aims to engage key potential user communities in its research and evaluation process.
Publications
Elliott J (2014) #CochraneTech: technology and the future of systematic reviews. In: The Cochrane database of systematic reviews
O'Connor AM (2020) A focus on cross-purpose tools, automated recognition of study design in multiple disciplines, and evaluation of automation tools: a summary of significant discussions at the fourth meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). In: Systematic reviews
Singh G (2017) A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation. In: Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management
O'Connor A (2019) A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? In: Systematic Reviews
Thomas J (2011) Applications of text mining within systematic reviews. In: Research synthesis methods
Wyatt J (2016) Automated support for systematic reviews: dream or reality?
Thomas J (2016) Automation in systematic reviews
Thomas J (2018) Citation analysis may well have a role to play in study identification, but more evaluation and system development are required. In: Journal of clinical epidemiology
Noel-Storr AH (2020) Cochrane Centralised Search Service showed high sensitivity identifying randomized controlled trials: A retrospective analysis. In: Journal of clinical epidemiology
Schmidt L (2020) Data extraction methods for systematic review (semi)automation: A living review protocol. In: F1000Research
Description | Accelerating Cochrane's Child and Maternal Health 'Next Generation' Evidence System |
Amount | $1,156,829 (USD) |
Funding ID | OPP1158795 |
Organisation | Bill and Melinda Gates Foundation |
Sector | Charity/Non Profit |
Country | United States |
Start | 08/2016 |
End | 03/2017 |
Description | Cochrane Evidence Crowds & Machine Reading 2017 |
Amount | $400,000 (USD) |
Organisation | Robert Wood Johnson Foundation |
Sector | Academic/University |
Country | United States |
Start | 03/2017 |
End | 05/2018 |
Description | Methodology Research Programme |
Amount | £351,857 (GBP) |
Funding ID | MR/N015665/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2016 |
End | 03/2018 |
Description | Partnership Project grant |
Amount | $936,515 (AUD) |
Funding ID | APP1114605 |
Organisation | National Health and Medical Research Council |
Sector | Public |
Country | Australia |
Start |
Description | The Human Behaviour-Change Project: Building the science of behaviour change for complex intervention development |
Amount | £3,736,071 (GBP) |
Funding ID | 201524/Z/16/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2016 |
End | 08/2020 |
Description | Transform - Transforming Cochrane Content Production |
Amount | £533,100 (GBP) |
Organisation | The Cochrane Collaboration |
Sector | Charity/Non Profit |
Country | Global |
Start | 04/2014 |
End | 03/2015 |
Title | Priority screening |
Description | The development of a machine learning tool for prioritising the screening of records to include in several systematic reviews |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | By improving efficiency and speed of screening, the priority screening tool has meant that the reviews were able to be completed to tight timescales that were determined by Dept of Health policy needs. |
Title | Active learning in EPPI-Reviewer |
Description | EPPI-Reviewer 4 is software for all types of literature review, including systematic reviews, meta-analyses, 'narrative' reviews and meta-ethnographies. It was developed prior to this MRC grant; however, the grant has enabled new features to be added: priority screening and active learning. |
Type Of Technology | Software |
Year Produced | 2013 |
Impact | These tools have been adopted by many researchers from different organisations: used in Cochrane; machine learning now being implemented in Cochrane 'pipeline' project; NICE now evaluating and using machine learning / active learning through EPPI-Reviewer; active learning used in Cochrane review; lots of other groups using active learning in their reviews. |
URL | https://eppi.ioe.ac.uk/cms/er4/ |
Description | Cochrane Webinar 2016 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Webinar on "Getting to know EPPI-Reviewer". The webinars are open to anyone wanting to learn in the Cochrane environment, be they complete beginners or seasoned experts. |
Year(s) Of Engagement Activity | 2016 |
Description | EPPI-Centre seminar 2015 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | Presentation on "Can we rely on text mining to reduce screening workload in systematic reviews?" at the EPPI-Centre seminar, IoE, London |
Year(s) Of Engagement Activity | 2015 |
Description | Farr Institute 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | Presentation on "Methodological evolution (revolution?): automation in systematic reviews". Farr Institute (http://www.farrinstitute.org/), London |
Year(s) Of Engagement Activity | 2016 |
Description | IQWiG 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Presented on "Text mining for Screening" at IQWiG, Cologne, Germany https://www.iqwig.de/en/home.2724.html |
Year(s) Of Engagement Activity | 2016 |
Description | NICE 2015 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | Presentation to National Institute for Health and Care Excellence on "EPPI-Reviewer: software for research synthesis" |
Year(s) Of Engagement Activity | 2015 |
Description | Presentation: Can we rely on text mining to reduce screening workload in systematic reviews? |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | Presentation at the London Seminar series, 22 September 2015, London, England, UK. |
Year(s) Of Engagement Activity | 2015 |
URL | http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3317 |
Description | Seminar at Warwick Medical School |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Seminar at Warwick Medical School. "Automation in systematic reviews: what we can do now, and what we may be able to do in the future." |
Year(s) Of Engagement Activity | 2017 |
Description | Symposium on automation and systematic reviews. University of Bristol |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Presentation on "An overview of automation in systematic reviews" at Symposium on automation and systematic reviews, University of Bristol |
Year(s) Of Engagement Activity | 2015 |
Description | The potential for using technology in systematic reviews to manage the information deluge |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | This was an invited talk at an evening discussion. The following are excerpts from the email invitation from the event organisers: "[the co-hosts, Long Now Foundation] ...bring in a great mix of official Long Now foundation members, speculative designers like Superflux, tech enthusiasts & industry. These are all group that offer an eloquent and creative debate about long term changes to society. Nesta [co-hosts of the event] can hopefully add value to their discussions by constructing contexts where this foresight community meets policy, practice and analytic approaches. I would like to keep the event focused on long term changes to medicine and human health... ...Hosting this event at Nesta, we expect a crowd of up to 100. We'll likely have three or four speakers covering different themes" |
Year(s) Of Engagement Activity | 2015 |
URL | https://www.meetup.com/longnowlondon/events/226315356/ |
Description | University of Manchester 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | Presentation at University of Manchester on "Living Systematic Review" |
Year(s) Of Engagement Activity | 2015 |
Description | YHEC 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | Presentation on "EPPI-Reviewer: an overview". At York Health Economics Consortium (YHEC), York |
Year(s) Of Engagement Activity | 2016 |