RobotReviewer: development and evaluation of a machine learning tool to speed up evidence synthesis in cardiovascular diseases
Lead Research Organisation:
King's College London
Department Name: Health and Social Care Research
Abstract
I am an academic GP, whose research background is in systematic reviewing. I previously worked as a Clinical Editor at the systematic reviews journal BMJ Clinical Evidence, and am about to submit my PhD thesis which looks at how best to communicate with patients about medical research about cardiovascular diseases (heart attack, stroke, and diseases of the large blood vessels).
I am applying for the Skills Development fellowship in Informatics due to a strong interest and aptitude in computer programming and statistics. This fellowship aims to further develop a computer system called RobotReviewer, which I have developed in collaboration with researchers in the US and the Netherlands.
RobotReviewer supports researchers producing a type of research called a systematic review. Systematic reviews are articles which aim to summarise all relevant research on a particular topic. Systematic reviews are particularly helpful to clinicians, since it is not practical to keep on top of the vast amount of research published. Systematic reviews give a balanced view on the research, giving prominence to large and high quality studies, and examining whether research is at risk of bias.
Producing systematic reviews is laborious, taking a team 2 years on average. A large proportion of existing reviews are out of date, and reviews do not yet exist for many urgently needed topics. This is particularly the case in cardiovascular diseases, where reviews are typically out of date within 2 years. Technologies to speed up the production of systematic reviews are therefore urgently needed.
Data extraction is a key (and time-consuming) task in producing systematic reviews, and the one on which this proposal focuses. Here, human reviewers start with all the research they intend to summarise (typically a large pile of paper documents), and identify detailed information, including how the trial was done and its statistical results. These data are then entered into a standard template designed for the review. This task is done in duplicate to ensure high accuracy.
RobotReviewer works by taking in a large library of existing systematic reviews, together with articles describing clinical trials (in PDF format). RobotReviewer is able to take this data, and learn how to identify the key pieces of information in a clinical trial report which are needed to produce a systematic review. So far, RobotReviewer is able to help with the task of assessing whether clinical trials are at risk of bias. We compared the accuracy of RobotReviewer on this task against human researchers, and found that RobotReviewer was equally accurate at finding the text which discussed bias. Overall RobotReviewer was 70% accurate at judging whether a trial was biased, compared with humans who were 77% accurate.
This fellowship aims to develop the technology further, so that RobotReviewer is able to extract many other important pieces of information, including descriptions of the participants in clinical trials, the types of treatments they used, and data on what the benefits and harms of the treatments were.
After the technology has been developed, it will be tested, comparing its accuracy against the accuracy of human researchers doing the same task.
In order to get the technology as widely used as possible, the computer software will be released freely. Additionally, I am working with key people in the Cochrane Collaboration (an international charity who are the world's leading producer of systematic reviews), who are interested in piloting the use of the automation technology in their work.
As part of the fellowship I intend to undertake specialist training in statistics and computer science required for the project. Additionally, I will continue to collaborate with computer scientists in the US and the Netherlands on the project to develop my skills further.
Technical Summary
Background
Systematic reviews (SRs) are the bedrock of evidence-based practice. However, due to the exponential increase in available research, it is becoming increasingly difficult to keep SRs up-to-date, and keep pace with the primary literature. This problem is compounded in the rapidly moving area of cardiovascular diseases (CVDs), where most reviews are out of date within 2 years of publication.
Aims and objectives
To extend and evaluate RobotReviewer, a machine learning (ML) system to semi-automate the extraction of data from PDFs reporting clinical trials in CVDs. The system should extract information on the population, interventions, outcomes, statistics, and risks of bias. The accuracy of the system for each variable will be evaluated via a comparison with published systematic reviews. This will allow the user to determine when the system is accurate enough for use, and to what extent.
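The per-variable evaluation described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's evaluation code: the function names `accuracy_by_variable` and `ready_for_use`, the exact-match comparison, the example records, and the 0.75 threshold are all hypothetical.

```python
def accuracy_by_variable(records):
    """Return the fraction of exact matches for each variable.

    `records` is a list of (variable, predicted, reference) tuples, where
    `reference` stands in for the value taken from a published review.
    Matching is case-insensitive exact match (an assumption made for brevity).
    """
    totals, correct = {}, {}
    for variable, predicted, reference in records:
        totals[variable] = totals.get(variable, 0) + 1
        if predicted.strip().lower() == reference.strip().lower():
            correct[variable] = correct.get(variable, 0) + 1
    return {v: correct.get(v, 0) / n for v, n in totals.items()}


def ready_for_use(accuracies, minimum):
    """Flag each variable whose accuracy meets a user-chosen minimum."""
    return {v: acc >= minimum for v, acc in accuracies.items()}


# Hypothetical example data, invented purely for illustration.
records = [
    ("population", "adults with hypertension", "Adults with hypertension"),
    ("population", "stroke survivors", "stroke survivors"),
    ("outcomes", "all-cause mortality", "all-cause mortality"),
    ("outcomes", "blood pressure", "myocardial infarction"),
]
accuracies = accuracy_by_variable(records)
flags = ready_for_use(accuracies, minimum=0.75)  # 0.75 is an arbitrary threshold
```

Reporting accuracy separately for each variable lets a user adopt the system for the variables where it performs well while continuing to extract the others by hand.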
Methodology
The system will use an emerging method, distant supervision, in which a text-mining algorithm extracts pertinent text having learned to do so from existing published systematic reviews. A corpus will be developed which links the full text of clinical trials in PDF format with structured data from the Cochrane Database of Systematic Reviews (CDSR), and ClinicalTrials.gov. Methods will be developed to label the target data elements in the PDFs using data from the CDSR and clinicaltrials.gov. ML algorithms will be developed and trained using this data. The system performance will be evaluated against accuracy standards to enable a judgment about whether the system is ready for use, and in what capacity.
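As a rough illustration of the labelling step in distant supervision, the sketch below marks a trial sentence as relevant when it shares enough words with a structured snippet taken from a review. It is a minimal sketch under stated assumptions: the token-overlap heuristic, the function names `tokenize` and `label_sentences`, the 0.5 threshold, and the example text are hypothetical and are not taken from RobotReviewer.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def label_sentences(sentences, review_snippet, threshold=0.5):
    """Assign noisy 0/1 labels to trial sentences.

    A sentence is labelled 1 when at least `threshold` of the tokens in the
    review's structured snippet also occur in that sentence. The labels are
    noisy by design: they come from existing reviews, not manual annotation.
    """
    target = tokenize(review_snippet)
    labels = []
    for sentence in sentences:
        overlap = len(target & tokenize(sentence)) / len(target) if target else 0.0
        labels.append(1 if overlap >= threshold else 0)
    return labels


# Hypothetical sentences from a trial report and a hypothetical review entry.
sentences = [
    "Participants were adults aged 40 to 75 with a previous myocardial infarction.",
    "The primary outcome was all-cause mortality at 12 months.",
    "Funding was provided by a national research council.",
]
snippet = "Adults aged 40 to 75 with previous myocardial infarction"
labels = label_sentences(sentences, snippet)
```

The resulting sentence and label pairs could then serve as training data for a supervised classifier, which is what allows existing reviews to stand in for hand-annotated examples.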
Scientific and medical opportunities
For those conducting systematic reviews, an accurate automation system could reduce the huge time and cost burden in conducting SRs. For clinicians and patients, increased coverage of SRs and quicker production time would ensure that health decision-making is based on the highest quality and up-to-date evidence.
Planned Impact
In the short-term, the major beneficiaries of the research are groups who produce evidence syntheses, who can use the tool to semi-automate systematic review production. Economically, data-extraction is a laborious task which requires highly trained (and therefore expensive) researchers to conduct. Technologies to make this process more efficient could allow UK funders to produce the same reviews for less money, or expand the scope of review topic coverage.
The Cochrane Collaboration produce close to 500 new systematic reviews per year, with a similar number of existing reviews updated. Their current protocols rely on two researchers independently extracting data for accuracy. They have expressed enthusiasm for incorporating the technology in RobotReviewer into their workflow. We have also been approached by Dr Evidence, a US evidence synthesis firm who extract data from 7-8,000 RCTs annually. Groups such as NICE and the WHO, and other policy makers and guideline developers, also extract data from large numbers of clinical trials in order to produce their guidelines. Numerous UK-based businesses such as the BMJ Group, Bazian, and Matrix expend a large amount of resources on extracting data from trial reports, and might stand to benefit from using automation technology. Use of the technology to speed up manual data-extraction in the short term is realistic; indeed the accuracy of RobotReviewer for extracting information about risks of bias is already sufficient for this use.
In the longer term, once accuracy becomes sufficient, RobotReviewer could be used as part of a live system for clinicians and patients. If humans are no longer needed to check the data-extractions, then they could be done instantaneously. Connecting RobotReviewer with other existing and developing technologies aimed at study identification would enable 'living' systematic reviews, and the production of evidence-based decision support systems using up-to-the-minute research.
For patients, increased availability and coverage of systematic reviews, and shortened time from protocol to publication, would provide more up-to-date and comprehensive evidence to support decision making. Importantly, by use of structured data and standard ontologies, the output of the system could be used equally for multiple purposes. Such data could provide up-to-date patient information and decision aids, and ultimately help patients achieve the health outcomes which matter most to them.
Finally, the technologies this proposal aims to develop are likely to have wider applications. This proposal focuses on cardiovascular disease trials for reasons of importance and practicality. Once applied to cardiovascular trials, the technical infrastructure would already be in place to apply to other clinical areas. In particular, applying automation to drug trials in other areas would be a comparatively small task. In the longer term, this research could provide an important step towards extracting data from other research designs.
People |
ORCID iD |
Iain Marshall (Principal Investigator / Fellow) |
Publications
Jain S
(2018)
Learning Disentangled Representations of Texts with Application to Biomedical Abstracts.
in Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing
Jain S.
(2018)
Learning disentangled representations of texts with application to biomedical abstracts
in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Kell G
(2021)
What Would it Take to get Biomedical QA Systems into Practice?
in Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing
Marshall I
(2020)
State of the evidence: a survey of global disparities in clinical trials
Marshall I
(2021)
State of the evidence: a survey of global disparities in clinical trials
in BMJ Global Health
Description | Citation in Cochrane Handbook |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | https://training.cochrane.org/handbook |
Description | Use of machine learning classification of studies included in Cochrane Reviews |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Citation in clinical reviews |
Impact | I collaborated on the creation of a machine learning classifier for identifying RCTs from research databases. This has started to be used in the Cochrane Collaboration, and there are now multiple published protocols indicating the use of this tool for identifying trials to include (e.g. Agarwal S et al. Decision-support tools via mobile devices to improve quality of care in primary healthcare settings (Protocol). Cochrane Database of Systematic Reviews 2018, Issue 2. Art. No.: CD012944. DOI: 10.1002/14651858.CD012944). The impact will be more efficient systematic review conduct and publication. |
Title | EBM-NLP dataset |
Description | We created an annotated dataset of clinical trial abstracts |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | This dataset was associated with a scientific article published at a prestigious computer science conference (ACL). |
URL | https://ebm-nlp.herokuapp.com/ |
Title | Neural network approach to text classification |
Description | An implementation of the Convolutional Neural Networks for text classification described by Yoon Kim 2016, with the addition of Bayesian hyperparameter searching. |
Type Of Material | Computer model/algorithm |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | This algorithm for text classification was key to a journal article (under review - preprint attached) for classifying research abstracts as being RCTs or not. This uses the current state of the art method for automatic text classification (Convolutional Neural Networks) |
URL | https://github.com/ijmarshall/kerastext |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4133702 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4066403 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4264156 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4285963 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3977988 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3987903 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3894317 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4183826 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3842664 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3941527 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3767068 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3826499 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3885001 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3931372 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4008988 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3831893 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4040640 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3826510 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4106104 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3997084 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4275393 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4028413 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3871414 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3826515 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3903149 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3961421 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3921774 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4055029 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3767069 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3970525 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3952100 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4081015 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3767073 |
Title | Trialstreamer data |
Description | Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3826514 |
Description | Collaboration with Byron Wallace, Northeastern |
Organisation | Northeastern University - Boston |
Country | United States |
Sector | Academic/University |
PI Contribution | I am joint PI on a project, and provide intellectual input, data analysis, and access to data. The research funding pays for a research associate. |
Collaborator Contribution | Dr Byron Wallace is the other joint PI on the project, and provides intellectual input and data analysis. |
Impact | Dr Byron Wallace is a key collaborator on many parts of this research. Outputs include multiple scientific articles (listed in the Publications section) and software (RobotReviewer). |
Start Year | 2015 |
Description | Collaboration with NICE |
Organisation | National Institute for Health and Care Excellence (NICE) |
Department | NICE International |
Country | United Kingdom |
Sector | Public |
PI Contribution | I have been working with NICE to incorporate the technologies from this fellowship into their guideline production process. Specifically, we have been using the RCT classification system for their evidence surveillance activities. |
Collaborator Contribution | Producers of clinical guidelines |
Impact | Use of RCT classification system in clinical guideline updating. |
Start Year | 2019 |
Description | Collaboration with TRIP |
Organisation | TRIP Database |
Sector | Private |
PI Contribution | As part of getting our research used in practice, I collaborated with the TRIP database to help them get RobotReviewer running. I provided the machine learning code. |
Collaborator Contribution | Carried out the technical and design work on the TRIP site to incorporate RobotReviewer model predictions, and ran our code across their large corpus of research abstracts. |
Impact | The TRIP database is a widely used search engine for clinicians seeking evidence-based information. RobotReviewer predictions are now incorporated into TRIP database search results. |
Start Year | 2016 |
Title | RobotReviewer |
Description | Web-based software which will implement the machine learning models developed over the course of the fellowship. The software is currently able to judge risks of bias, and to retrieve text from full-text articles describing the Population, Interventions, and Outcomes of a trial. It also uses the deep-learning/neural network approach described in our paper (under review; pre-print also attached here) to identify study design. |
Type Of Technology | Webtool/Application |
Year Produced | 2016 |
Impact | RobotReviewer has been used by the TRIP database to 1. identify RCTs, and 2. judge risks of bias for these trials. These predictions are presented in their search results. We are engaged with a small number of systematic review authorship groups within the Cochrane Collaboration who are piloting use of the tool. |
URL | https://github.com/ijmarshall/robotreviewer3 |
Title | RobotSearch |
Description | This software implements a machine learning approach to identifying RCTs from search results. It is aimed at information specialists/those conducting searches for systematic reviews, to be used as an alternative to a text-based filter. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | This software is linked to a journal publication which validates the approach used. Once the paper is accepted I aim to get the algorithm used more widely. |
URL | https://github.com/ijmarshall/robotsearch |
Title | RobotSearch RCT classification models |
Description | Models as described in: Marshall I, Storr AN, Kuiper J, Thomas J, Wallace BC. Machine Learning for Identifying Randomized Controlled Trials: an evaluation and practitioner's guide. Res Syn Meth. 2018. https://doi.org/10.1002/jrsm.1287 For use in RobotSearch software (https://github.com/ijmarshall/robotsearch) |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
URL | https://zenodo.org/record/1146798 |
Title | RobotSearch RCT webtool |
Description | We produced a website which allows users to easily filter studies by design. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | NICE is considering the use of this technology for their evidence syntheses. |
Description | Invited talk at NICE Information Day 2018 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I was invited to give a talk about using machine learning for identifying RCTs (the subject of a recent publication) at the annual NICE Information Day. The event was attended by information specialists, and included participants from NICE, Cochrane, and various national and international institutions. |
Year(s) Of Engagement Activity | 2018 |