RobotReviewer: development and evaluation of a machine learning tool to speed up evidence synthesis in cardiovascular diseases

Abstract

I am an academic GP whose research background is in systematic reviewing. I previously worked as a Clinical Editor at the systematic reviews journal BMJ Clinical Evidence, and am about to submit my PhD thesis, which looks at how best to communicate with patients about medical research on cardiovascular diseases (heart attack, stroke, and diseases of the large blood vessels).
I am applying for the Skills Development fellowship in Informatics due to a strong interest and aptitude in computer programming and statistics. This fellowship aims to further develop a computer system called RobotReviewer, which I have developed in collaboration with researchers in the US and the Netherlands.

RobotReviewer supports researchers producing a type of research called a systematic review. Systematic reviews are articles which aim to summarise all relevant research on a particular topic. They are particularly helpful to clinicians, since it is not practical to keep on top of the vast amount of research published. Systematic reviews give a balanced view of the research, giving prominence to large and high quality studies, and examining whether research is at risk of bias.

Producing systematic reviews is laborious, taking a team 2 years on average. A large proportion of existing reviews are out of date, and reviews do not yet exist for many urgently needed topics. This is particularly the case in cardiovascular diseases, where reviews are typically out of date within 2 years. Technologies to speed up the production of systematic reviews are therefore urgently needed.

Data extraction is a key (and time-consuming) task in producing a systematic review, and the one on which this proposal focuses. Here, human reviewers start with all the research they intend to summarise (typically a large pile of paper documents), and identify detailed information, including how each trial was done and its statistical results. These data are then entered into a standard template designed for the review. This task is done in duplicate to ensure high accuracy.

RobotReviewer works by taking in a large library of existing systematic reviews, together with articles describing clinical trials (in PDF format). RobotReviewer is able to take this data, and learn how to identify the key pieces of information in a clinical trial report which are needed to produce a systematic review. So far, RobotReviewer is able to help with the task of assessing whether clinical trials are at risk of bias. We compared the accuracy of RobotReviewer on this task against human researchers, and found that RobotReviewer was equally accurate at finding the text which discussed bias. Overall RobotReviewer was 70% accurate at judging whether a trial was biased, compared with humans who were 77% accurate.

This fellowship aims to develop the technology further, so that RobotReviewer is able to extract many other important pieces of information, including descriptions of the participants in clinical trials, the types of treatments they used, and data on what the benefits and harms of the treatments were.
After the technology has been developed, it will be tested, comparing its accuracy against the accuracy of human researchers doing the same task.

In order to get the technology as widely used as possible, the computer software will be released freely. Additionally, I am working with key people in the Cochrane Collaboration (an international charity who are the world's leading producer of systematic reviews), who are interested in piloting the use of the automation technology in their work.

As part of the fellowship I intend to undertake specialist training in statistics and computer science required for the project. Additionally, I will continue to collaborate with computer scientists in the US and the Netherlands on the project to develop my skills further.

Technical Summary

Background
Systematic reviews (SRs) are the bedrock of evidence-based practice. However, due to the exponential increase in available research, it is becoming increasingly difficult to keep SRs up-to-date, and keep pace with the primary literature. This problem is compounded in the rapidly moving area of cardiovascular diseases (CVDs), where most reviews are out of date within 2 years of publication.

Aims and objectives
To extend and evaluate RobotReviewer, a machine learning (ML) system to semi-automate the extraction of data from PDFs reporting clinical trials in CVDs. The system should extract information on the population, interventions, outcomes, statistics, and risks of bias. The accuracy of the system for each variable will be evaluated via a comparison with published systematic reviews. This will allow the user to determine when the system is accurate enough for use, and to what extent.
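The per-variable evaluation described above can be illustrated with a minimal sketch. It assumes each system output has already been paired with the corresponding value from a published review; the function name, the variable names, and the exact-match comparison are hypothetical choices for illustration, not the evaluation method used in the project.

```python
from collections import defaultdict


def accuracy_by_variable(records):
    """Return the fraction of correct extractions for each variable.

    `records` is an iterable of (variable, predicted, reference) tuples,
    where `reference` is the value taken from a published review.
    Exact string equality is an assumed, simplified matching rule.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for variable, predicted, reference in records:
        total[variable] += 1
        correct[variable] += int(predicted == reference)
    return {variable: correct[variable] / total[variable] for variable in total}


# Hypothetical toy data: two risk-of-bias judgements and one population field.
example_records = [
    ("risk_of_bias", "low", "low"),
    ("risk_of_bias", "high", "low"),
    ("population", "adults with stroke", "adults with stroke"),
]
example_scores = accuracy_by_variable(example_records)
```

Reporting one score per variable, rather than a single pooled figure, is what would let a user decide that the system is usable for some fields before it is usable for others.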

Methodology
The system will use an emerging method, distant supervision, in which a text-mining algorithm learns to extract pertinent text from existing published systematic reviews. A corpus will be developed which links the full text of clinical trials in PDF format with structured data from the Cochrane Database of Systematic Reviews (CDSR) and ClinicalTrials.gov. Methods will be developed to label the target data elements in the PDFs using data from the CDSR and ClinicalTrials.gov. ML algorithms will be developed and trained using these data. System performance will be evaluated against accuracy standards to enable a judgment about whether the system is ready for use, and in what capacity.
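The labelling step in distant supervision can be sketched as follows. This is a toy illustration only: it assumes a simple word-overlap heuristic between a snippet recorded in a review and each sentence of the trial report, and the function names and the 0.5 threshold are hypothetical rather than taken from RobotReviewer.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def distant_labels(trial_text, review_snippet, threshold=0.5):
    """Label each trial sentence 1 if it covers enough of the review snippet.

    The label is noisy by design: no human marks the sentence; the existing
    review data stands in for manual annotation.
    """
    target = tokenize(review_snippet)
    labelled = []
    for sentence in split_sentences(trial_text):
        tokens = tokenize(sentence)
        overlap = len(tokens & target) / len(target) if target else 0.0
        labelled.append((sentence, int(overlap >= threshold)))
    return labelled


# Hypothetical example text, not drawn from any real trial or review.
trial = (
    "Participants were randomised using a computer-generated sequence. "
    "Outcomes were measured at 12 months."
)
snippet = "Computer-generated randomisation sequence"
labels = distant_labels(trial, snippet)
```

The resulting (sentence, label) pairs could then serve as training data for a classifier, which is the sense in which supervision comes "at a distance" from the structured review data.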

Scientific and medical opportunities
For those conducting systematic reviews, an accurate automation system could reduce the huge time and cost burden in conducting SRs. For clinicians and patients, increased coverage of SRs and quicker production time would ensure that health decision-making is based on the highest quality and up-to-date evidence.

Planned Impact

In the short-term, the major beneficiaries of the research are groups who produce evidence syntheses, who can use the tool to semi-automate systematic review production. Economically, data-extraction is a laborious task which requires highly trained (and therefore expensive) researchers to conduct. Technologies to make this process more efficient could allow UK funders to produce the same reviews for less money, or expand the scope of review topic coverage.

The Cochrane Collaboration produce close to 500 new systematic reviews per year, with a similar number of existing reviews updated. Their current protocols rely on two researchers independently extracting data for accuracy. They have expressed enthusiasm for incorporating the technology in RobotReviewer into their workflow. We have also been approached by Dr Evidence, a US evidence synthesis firm who extract data from 7-8,000 RCTs annually. Groups such as NICE and the WHO, and other policy makers and guideline developers, also extract data from large numbers of clinical trials in order to produce their guidelines. Numerous UK-based businesses such as the BMJ Group, Bazian, and Matrix expend a large amount of resources on extracting data from trial reports, and might stand to benefit from using automation technology. Use of the technology to speed up manual data extraction in the short term is realistic; indeed, the accuracy of RobotReviewer for extracting information about risks of bias is already sufficient for this use.

In the longer term, once accuracy becomes sufficient, RobotReviewer could be used as part of a live system for clinicians and patients. If humans are no longer needed to check the data extractions, then they could be done instantaneously. Connecting RobotReviewer with other existing and developing technologies aimed at study identification would enable 'living' systematic reviews, and the production of evidence-based decision support systems using up-to-the-minute research.

For patients, increased availability and coverage of systematic reviews, and a shortened time from protocol to publication, would provide more up-to-date and comprehensive evidence to support decision making. Importantly, by use of structured data and standard ontologies, the output of the system could be used equally for multiple purposes. Such data could provide up-to-date patient information and decision aids, and ultimately help patients achieve the health outcomes which matter most to them.

Finally, the technologies this proposal aims to develop are likely to have wider applications. This proposal focuses on cardiovascular disease trials for reasons of importance and practicality. Once applied to cardiovascular trials, the technical infrastructure would already be in place to apply to other clinical areas. In particular, applying automation to drug trials in other areas would be a comparatively small task. In the longer term, this research could provide an important step towards extracting data from other research designs.

Funded Value:

£330,011

Funded Period:

Jun 16 - Jun 21

Funder:

MRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

MR/N015185/1

Principal Investigator:

Iain Marshall

Health Category:

Unclassified

Organisations

People	ORCID iD
Iain Marshall (Principal Investigator / Fellow)

Publications

Emmett ES (2019) A comparison of trends in stroke care and outcomes between in-hospital and community-onset stroke - The South London Stroke Register. in PloS one

Harries T (2020) MOESM1 of Blood eosinophil count, a marker of inhaled corticosteroid effectiveness in preventing COPD exacerbations in post-hoc RCT and observational studies: systematic review and meta-analysis

Harries TH (2020) Blood eosinophil count, a marker of inhaled corticosteroid effectiveness in preventing COPD exacerbations in post-hoc RCT and observational studies: systematic review and meta-analysis. in Respiratory research

Jain S (2018) Learning Disentangled Representations of Texts with Application to Biomedical Abstracts. in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Jain V (2017) Trends in the prevalence and management of pre-stroke atrial fibrillation, the South London Stroke Register, 1995-2014. in PloS one

Kell G (2021) What Would it Take to get Biomedical QA Systems into Practice? in Proceedings of the Conference on Empirical Methods in Natural Language Processing

Lim E (2017) A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation

Marshall I (2019) Rapid reviews may produce different results to systematic reviews: a meta-epidemiological study in Journal of Clinical Epidemiology

Marshall I (2020) Trialstreamer: a living, automatically updated database of clinical trial reports

Policy Influence
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	Citation in Cochrane Handbook
Geographic Reach	Multiple continents/international
Policy Influence Type	Influenced training of practitioners or researchers
URL	https://training.cochrane.org/handbook


Description	Use of machine learning classification of studies included in Cochrane Reviews
Geographic Reach	Multiple continents/international
Policy Influence Type	Citation in clinical reviews
Impact	I collaborated on the creation of a machine learning classifier for identifying RCTs from research databases. This has started to be used in the Cochrane Collaboration, and there are now multiple published protocols indicating the use of this tool for identifying trials to include (e.g. Agarwal S et al. Decision-support tools via mobile devices to improve quality of care in primary healthcare settings (Protocol). Cochrane Database of Systematic Reviews 2018, Issue 2. Art. No.: CD012944. DOI: 10.1002/14651858.CD012944). The impact will be more efficient systematic review conduct and publication.


Description	SOLACE-AI: Synthesis of Online Literature for Adaptation to Climate-Change Emergencies
Amount	£4,100,000 (GBP)
Funding ID	SOLACE-AI: Synthesis of Online Literature for Adaptation to Climate-Change Emergencies
Organisation	Wellcome Trust
Sector	Charity/Non Profit
Country	United Kingdom
Start	03/2025
End	02/2030


Title	EBM-NLP dataset
Description	We created an annotated dataset of clinical trial abstracts
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	This dataset was associated with a scientific article published at a prestigious computer science conference (ACL).
URL	https://ebm-nlp.herokuapp.com/


Title	Neural network approach to text classification
Description	An implementation of the Convolutional Neural Networks for text classification described by Yoon Kim (2014), with the addition of Bayesian hyperparameter searching.
Type Of Material	Computer model/algorithm
Year Produced	2017
Provided To Others?	Yes
Impact	This algorithm for text classification was key to a journal article (under review - preprint attached) for classifying research abstracts as being RCTs or not. This uses the current state-of-the-art method for automatic text classification (Convolutional Neural Networks).
URL	https://github.com/ijmarshall/kerastext


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4604584


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4881619


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4767112


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5336305


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4309221


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3931372


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4320522


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3894317


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6200218


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5137142


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5205493


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6302956


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5560973


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3767069


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4680355


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5718019


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5235214


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5517061


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6669532


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6531196


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4588422


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5081941


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6637538


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5036223


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5760897


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4040640


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4008988


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5502943


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4626756


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4431749


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6351600


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5112911


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3970525


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4415341


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6411260


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5005239


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3826510


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5171849


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5702451


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4700212


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5547233


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5734208


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3842664


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6552813


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6592618


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4395554


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5775937


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3961421


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4555095


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4943580


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4518716


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3977988


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4106104


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6616275


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4662535


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4541112


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4446614


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5575108


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4081015


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3952100


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4745455


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3767073


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4285963


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5653236


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6444941


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4569479


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6389668


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5069691


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5595898


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6482567


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3997084


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3826515


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4905581


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5464673


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4264156


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6511035


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3941527


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6372493


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4066403


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4275393


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6572583


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3831893


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5636037


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/5153084


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3885001


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4028413


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4719378


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3767068


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4297497


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3871414


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4055029


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3826499


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3921774


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3903149


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/6334123


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4133702


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/4183826


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3987903


Title	Trialstreamer data
Description	Trialstreamer annotated collection of RCTs. This repository contains baseline files (large), and subsequent updates (daily for PubMed, weekly for ICTRP).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://zenodo.org/record/3826514


Description	Collaboration with Byron Wallace, Northeastern
Organisation	Northeastern University - Boston
Country	United States
Sector	Academic/University
PI Contribution	I am joint PI on a project, and provide intellectual input, data analysis, and access to data. With the research funding we employ a research associate.
Collaborator Contribution	Dr Byron Wallace is the other joint PI on the project, and provides intellectual input and data analysis.
Impact	Dr Byron Wallace is a key collaborator on many parts of this research. Outputs include multiple scientific articles (listed in Publications section) and software (RobotReviewer)
Start Year	2015


Description	Collaboration with NICE
Organisation	National Institute for Health and Care Excellence (NICE)
Department	NICE International
Country	United Kingdom
Sector	Public
PI Contribution	I have been working with NICE to incorporate the technologies from this fellowship into their guideline production process. Specifically we have been using the RCT classification system for their evidence surveillance activities.
Collaborator Contribution	Producers of clinical guidelines
Impact	Use of RCT classification system in clinical guideline updating.
Start Year	2019


Description	Collaboration with TRIP
Organisation	TRIP Database
Sector	Private
PI Contribution	As part of getting our research used in practice, I collaborated with the TRIP database to help them get RobotReviewer running. I provided the machine learning code.
Collaborator Contribution	Doing the technical work on the TRIP site and design to incorporate RobotReviewer model predictions; running our code across their large corpus of research abstracts.
Impact	The TRIP database is a widely used search engine for clinicians seeking evidence-based information. RobotReviewer predictions are now incorporated into TRIP database search results.
Start Year	2016


Title	RobotReviewer
Description	Web-based software which will implement the machine learning models which are developed over the course of the fellowship. The software is currently able to judge risks of bias, and to retrieve the text in full-text articles that describes the Population, Interventions, and Outcomes of a trial. It also uses the deep-learning/neural network approach described in our paper (under review; pre-print also attached here) to identify study design.
Type Of Technology	Webtool/Application
Year Produced	2016
Impact	RobotReviewer has been used by the TRIP database to 1. identify RCTs, and 2. judge risks of bias for these trials. This is presented on their search results. We are engaged with a small number of systematic review authorship groups within the Cochrane Collaboration who are piloting use of the tool.
URL	https://github.com/ijmarshall/robotreviewer3


Title	RobotSearch
Description	This software implements a machine learning approach to identifying RCTs from search results. It is aimed at information specialists/those conducting searches for systematic reviews, to be used as an alternative to a text-based filter.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	This software is linked to a journal publication which validates the approach used. Once the paper is accepted I aim to get the algorithm used more widely.
URL	https://github.com/ijmarshall/robotsearch


Title	RobotSearch RCT classification models
Description	Models as described in: Marshall I, Storr AN, Kuiper J, Thomas J, Wallace BC. Machine Learning for Identifying Randomized Controlled Trials: an evaluation and practitioner's guide. Res Syn Meth. 2018. https://doi.org/10.1002/jrsm.1287 For use in RobotSearch software (https://github.com/ijmarshall/robotsearch)
Type Of Technology	Software
Year Produced	2018
Open Source License?	Yes
URL	https://zenodo.org/record/1146798


Title	RobotSearch RCT classification models
Description	Models as described in: Marshall I, Storr AN, Kuiper J, Thomas J, Wallace BC. Machine Learning for Identifying Randomized Controlled Trials: an evaluation and practitioner's guide. Res Syn Meth. 2018. https://doi.org/10.1002/jrsm.1287 For use in RobotSearch software (https://github.com/ijmarshall/robotsearch)
Type Of Technology	Software
Year Produced	2018
Open Source License?	Yes
URL	https://zenodo.org/record/1146799


Title	RobotSearch RCT webtool
Description	We produced a website which allows users to easily filter studies by design
Type Of Technology	Webtool/Application
Year Produced	2018
Open Source License?	Yes
Impact	NICE is considering the use of this technology for their evidence syntheses.


Description	Invited talk at NICE Information Day 2018
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I was invited to give a talk about using machine learning for identifying RCTs (the subject of a recent publication) at the annual NICE Information Day. The event was attended by information specialists, and included participants from NICE, Cochrane, and various national and international institutions.
Year(s) Of Engagement Activity	2018