Living figures: reproducing published experiments in situ

Lead Research Organisation: University of East Anglia
Department Name: Graduate Office

Abstract

Speaking about the issues thrown up by 'Big Data', The Economist reported in the article "Welcome to the Yotta World" that, by 2018, there would be a talent gap of ~150,000 data-science professionals globally. The problem is particularly acute in high-throughput biology, where complex environment and software set-ups are required to support downstream data analyses. As a result, scientists often find it impossible to repeat published experiments faithfully. New approaches are therefore required to forge better links between published articles and their underlying data, to help researchers visualise and reproduce the results described in the papers they read.

To make progress in this area, an exciting collaboration has recently been established between The Genome Analysis Centre (TGAC, Norwich) - a leading genome and bioinformatics research facility in Europe - the eLife journal (Cambridge) and the University of Manchester. The project will exploit the principal technologies developed by partners: the BioJavaScript open-source library for visualisation of biological components; the Utopia Documents 'smart PDF reader'; and the publishing platform of the innovative, open-access journal eLife.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/M017176/1                                   01/10/2015  17/09/2021
1653815            Studentship   BB/M017176/1  01/10/2015  06/12/2021  Evanthia Samota
 
Description Scientific reproducibility is a very complex issue. It involves constraints on the availability of technical tools, as well as cultural issues. Cultural issues, such as attitudes towards reproducibility, can be the deciding factor in enabling it: no matter how many tools are developed to enable and facilitate reproducibility, if researchers are not incentivised, are not adequately trained, or simply do not care to perform reproducible science, the issue will never be fully resolved.

Developed a reproducible figure with interactive elements, access to the underlying data and code, and import-export facilities.

Have been pivotal in eLife's decisions, as well as providing them with my survey findings, which shaped the production of the first computationally reproducible document.

Have tested different platforms and frameworks when making my interactive figure, and improved my computational skills tremendously.

Identified a lack of standardisation in ontology reporting, both in papers and in databases. Demonstrated, using wheat, barley and rice transcriptomic papers, that the data and metadata published in papers and databases are not shared in a robust and reproducible way that allows readers to understand, re-use and reproduce the findings and data.

In the project, we propose the use of software that automates finding plant and crop ontology descriptions and their standardised ontology terms as well as fetching the data automatically from the databases. It also scores with reproducibility metrics how well the authors described and presented the data, metadata and software and whether they are easily available and correctly described in databases and public repositories.
Exploitation Route Publishers can learn from my findings, as I have also identified many of the limitations that can impede the correct implementation, acceptance and proper use of interactive documents and figures by authors and readers.

With regards to our findings with the survey:

This is a broad survey that details the opinions and insights of researchers and paper authors on the state of reproducibility of experiments. The most pivotal factor identified as affecting successful research reproducibility is access to a detailed description of the methodology in papers. Our research demonstrates that interactive elements within journals that can reproduce computational experiments, such as running workflows and statistical analyses, can be a desirable solution to reproducibility issues; to our current knowledge, this has not been explored in any other survey study. Our study can drive research into sorely needed reproducibility tools, where developers must focus on the rationale of their tools for the communities they target, especially those not computationally trained. Our manuscript suggests a paradigm for future studies on tools addressing reproducibility via interactive figures, and on policies, including hiring practices, that would endorse and encourage reproducible science. Our results corroborate the need to address the social barriers to reproducibility, such as a lack of incentives and rewards and inadequate training in reproducible research practices. Our findings have direct impact and wide interest, as we discuss barriers and solutions to reproducibility issues in the life sciences that will be of interest to paper readers and to reproducibility and open-data policymakers.


With regards to the ontology project:
In the project, we propose the use of software that automates finding plant and crop ontology descriptions and their standardised ontology terms as well as fetching the data automatically from the databases. It also scores with reproducibility metrics how well the authors described and presented the data, metadata and software and whether they are easily available and correctly described in databases and public repositories. This allows others to be able to read, understand and reproduce published experiments in the field of wheat, barley and rice transcriptomics more reliably.
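As a rough illustration only, the scoring idea described above could be sketched as a simple checklist. The record fields, the four criteria and the equal weighting are assumptions made for this example; they are not the metrics or the software used in the project.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PaperRecord:
    """Hypothetical summary of what a paper reports (field names are illustrative)."""
    data_accession: Optional[str] = None                      # repository accession for the data, if stated
    ontology_terms: List[str] = field(default_factory=list)   # standardised ontology term IDs found
    software_versions_stated: bool = False                    # are software names and versions given?
    code_url: Optional[str] = None                            # link to analysis code, if any


def reproducibility_score(paper: PaperRecord) -> float:
    """Return the fraction of checklist criteria the paper satisfies (0.0 to 1.0).

    Assumption: every criterion carries equal weight.
    """
    checks = [
        bool(paper.data_accession),
        len(paper.ontology_terms) > 0,
        paper.software_versions_stated,
        bool(paper.code_url),
    ]
    return sum(checks) / len(checks)


# Placeholder values, not real accessions or ontology identifiers.
example = PaperRecord(
    data_accession="EXAMPLE-ACCESSION",
    ontology_terms=["TERM:0000001"],
)
score = reproducibility_score(example)
```

In this example the record meets two of the four assumed criteria, so the score is 0.5. A real tool would also need to check that the accession resolves in the database and that the ontology terms are valid.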
Sectors Digital/Communication/Information Technologies (including Software),Other

 
Description Helped eLife shape their interactive, computationally reproducible document and other related products aimed at enabling reproducibility within publications.
First Year Of Impact 2015
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Cultural,Societal,Economic

 
Title Distributed a survey to canvass the opinions of researchers/article authors on their understanding of irreproducibility, and to assess the perceived benefit of reproducibility enabled by interactive elements within publications.
Description Distributed a survey to canvass the opinions of researchers/article authors on their understanding of irreproducibility, and to assess the perceived benefit of reproducibility enabled by interactive elements within publications. We are very close to publishing these outcomes, which include very interesting findings. Our survey has been constructed with very interesting questions, which other researchers working on reproducibility, especially in relation to publications, can benefit from and adapt to gain further insights or collect their own information.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Helped eLife significantly in shaping their first computationally reproducible document. We will also be submitting our paper to bioRxiv and will soon submit it to an appropriate journal.
 
Description BOSC conference 2017 poster presentation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Presented my project's initial outcomes with the interactive figure.

Engaged in eLife's reproducibility focus group/birds-of-a-feather discussions at the conference and discussed reproducibility with other researchers in the field (from various institutions).
Year(s) Of Engagement Activity 2017