BBSRC-NSF/BIO RiboViz for reliable, reproducible and rigorous quantification of protein synthesis from ribosome profiling data

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

This project is a UK-USA collaboration to develop a software suite, called RiboViz, to analyse and understand protein synthesis from "ribosome profiling" data.

All cells make proteins using molecular machines called ribosomes, which read a messenger RNA template and "translate" the RNA code into the protein code. Cells need to make the right proteins, at the right time, in the right quantities, so this process is carefully controlled by signals that are also encoded in the RNA. These signals are complex and only beginning to be understood, because there are thousands of different RNA sequences in a cell and each is hundreds to thousands of nucleotides ("letters") long. Recent advances in DNA and RNA sequencing technology mean that, using a technique called ribosome profiling, we can now measure all the parts of RNA that are translated into protein, and how much. Although this technique is amazing, it is not perfect, and statistical tools are needed to separate the interesting biological signals in the data from unwanted biases of the experimental measurement. These tools need to be implemented in usable and reliable software so that all scientists studying protein synthesis can get the maximum possible information from ribosome profiling data, which are expensive and time-consuming to collect.

The RiboViz software suite, which is open source and free for anyone in the world to use, already takes raw data from sequencing machines and puts it through a series of processing steps. RiboViz estimates how much each part of RNA is translated, and how the amount of translation is controlled by the code of that RNA. RiboViz produces tables, figures and graphs that are accessible online, making it useful for both experts and non-experts. This kind of data sharing makes science more reproducible and more accessible.

In the course of this project, we will work with a software engineer to make the RiboViz code more reliable, easier to use and future-proof, and to add features that quantify protein synthesis more accurately. We will develop statistical models that take account of both biological signals and unwanted biases. We will apply these models to understand two interesting features of how protein synthesis is regulated. The first is how production of a short ("upstream") protein from an RNA can control production of another protein later ("downstream") on the same RNA. The second is how synonymous parts of the RNA code affect how ribosomes move and how much protein they produce.

This work will help build fundamental knowledge about how cells work, and has several applications. Companies who genetically engineer cells to express proteins, for example to make therapeutic drugs or artificial silk, will have better tools to engineer those cells to produce the right amount of protein at the right time. Scientists studying evolution will have better tools to understand how coding sequences evolve, allowing deeper understanding of the tree of life. Lastly, we will be able to better understand human genetic diseases caused by defects in protein synthesis, which in the long run could lead to better treatments.

Technical Summary

Protein synthesis, the translation of protein from messenger RNA templates by ribosomes, is a fundamental biological process that is highly regulated in all cells and consumes a large proportion of their energy. Advances in biological informatics are needed to drive biological discovery from data measuring protein synthesis. Recently, ribosome profiling and other sequencing-based methods have revolutionised our understanding of protein synthesis and genome architecture across the tree of life. However, generation of new biological knowledge from ribosome profiling data is hindered by inconsistent and underdeveloped quantitative data analysis and visualization.

We will:
1. Develop a world-leading reliable, reproducible and rigorous open-source software package for ribosome profiling quantification, by refactoring our existing RiboViz pipeline following best practices for sustainable software and adding key features, in collaboration with the UK's leading research computing facility.
2. Develop statistical tools to quantify translation of open reading frames (ORFs) on RNA, including short ORFs, by explicit regression modeling of technical biases and sequence-based features such as codon contexts.
3. Understand the effect of cis-regulatory features on translation regulation by applying RiboViz to quantify the effect of upstream ORFs on translation in three landmark species, and to quantify per-codon elongation rates and their evolution across a phylogeny of fungal species, shared with online interactive data visualizations.
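The idea in aim 2 of explicitly modelling sequence features can be illustrated with a deliberately simplified example. The Python sketch below assumes a hypothetical model in which the expected footprint count at each codon position is the product of a per-gene abundance factor and a per-codon effect, with Poisson noise, and fits it by alternating maximum-likelihood updates (iterative proportional fitting). The function name, input format and model are illustrative assumptions only; this is not the model used in RiboViz, and it ignores the technical biases that aim 2 also proposes to model.

```python
import math
from collections import defaultdict


def fit_gene_codon_model(positions, n_iter=200):
    """Fit a toy multiplicative Poisson model of ribosome footprint counts.

    Hypothetical model, for illustration only:
        E[count at a position] = gene_scale[gene] * codon_effect[codon]
    `positions` is a list of (gene, codon, count) tuples, one per codon
    position. Assumes every gene and every codon has at least one footprint.
    Codon effects are rescaled to a geometric mean of 1 so that the gene
    and codon factors are identifiable.
    """
    y = defaultdict(float)  # summed counts per (gene, codon)
    n = defaultdict(int)    # number of positions per (gene, codon)
    for gene, codon, count in positions:
        y[gene, codon] += count
        n[gene, codon] += 1
    genes = sorted({g for g, _ in n})
    codons = sorted({c for _, c in n})
    gene_scale = {g: 1.0 for g in genes}
    codon_effect = {c: 1.0 for c in codons}

    for _ in range(n_iter):
        # Maximum-likelihood update of each gene factor, codon effects fixed.
        for g in genes:
            cells = [c for c in codons if (g, c) in n]
            gene_scale[g] = (sum(y[g, c] for c in cells)
                             / sum(n[g, c] * codon_effect[c] for c in cells))
        # Maximum-likelihood update of each codon effect, gene factors fixed.
        for c in codons:
            cells = [g for g in genes if (g, c) in n]
            codon_effect[c] = (sum(y[g, c] for g in cells)
                               / sum(n[g, c] * gene_scale[g] for g in cells))
        # Rescale so codon effects have geometric mean 1; the product
        # gene_scale * codon_effect, and hence the fit, is unchanged.
        gm = math.exp(sum(math.log(v) for v in codon_effect.values()) / len(codons))
        codon_effect = {c: v / gm for c, v in codon_effect.items()}
        gene_scale = {g: v * gm for g, v in gene_scale.items()}
    return gene_scale, codon_effect


# Toy data generated exactly from gene factors {G1: 10, G2: 2}
# and codon effects {AAA: 2, GGG: 0.5}.
positions = ([("G1", "AAA", 20)] * 3 + [("G1", "GGG", 5)]
             + [("G2", "AAA", 4)] + [("G2", "GGG", 1)] * 2)
gene_scale, codon_effect = fit_gene_codon_model(positions)
print({c: round(v, 3) for c, v in codon_effect.items()})  # -> {'AAA': 2.0, 'GGG': 0.5}
```

Because the toy counts were generated exactly from the stated factors, the fit recovers them; real data would additionally need terms for technical biases, overdispersion and uncertainty estimates.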

Our proposed research will provide a reliable, reproducible, and rigorous suite of tools to bridge the gap in analysis of ribosome profiling data, and will generate new knowledge on the sequence features regulating protein translation. These tools will accelerate research in all groups studying translation, and enable rational design of expression of synthetic genes in biotechnology by optimising their translation rates.

Planned Impact

1) Academic Impact.
RiboViz is an open-source software pipeline for ribosome profiling analysis, so will accelerate discovery in all groups studying protein synthesis. The proposed work will focus on making RiboViz more reliable, robust, rigorous, and usable, as well as adding new features to better quantify translation. We will share online standardised ribosome profiling analyses from multiple organisms to make these accessible to all, including non-experts.
We will co-organize a 1-day training event on ribosome profiling data analysis immediately preceding the Translation UK conference in July 2020.

2) Economic impact via synthetic biology.
This research project will improve the Synthetic Biology toolkits available to control protein output. The open-source tools we develop will give better quantitative predictions of how coding sequence affects protein output in diverse organisms and conditions, enabling the design of more efficient and controllable synthetic genes. This is of clear commercial interest (see letters from Synpromics and ThermoFisher GeneArt). In the course of this grant, we will seek out industrial partnerships for future collaboration by attending annual conferences frequented by Synthetic Biology companies.

3) Societal impact via bioinformatics education.
The proposed project will provide unique research-linked impact through partnership with a large, successful bioinformatics education and public engagement project, 4273pi. Led from the University of Edinburgh's School of Biological Sciences, 4273pi is one of the largest bioinformatics-at-school projects in the world and has visited more than 50 secondary schools. The RA will contribute 5% FTE to 4273pi engagement, including teaching on schools visits, and adding new teaching material related to the proposed research on protein synthesis.

4) Wider academic impact via research computing education.
The project will also provide research-linked impact through The Carpentries, a voluntary organization that teaches foundational coding and data science skills to researchers worldwide. This project will contribute staff time to Carpentries workshops. The UK RA will assist Dr. Wallace as an instructor at Carpentries workshops, then train as a certified Carpentries instructor and lead a workshop in year 2. One workshop will target the BBSRC's EASTBIO doctoral training program, using teaching material on genomics related to the proposed bioinformatics research.
 
Description We have produced a far better software package for analysing a specialised kind of biomedical data. Our software, riboviz 2, allows robust, reliable, and reproducible analysis of "ribosome profiling" data, which measure in fine detail how proteins are made in cells. We have run workshops on how to use the software for academia and for industry. This achieves the BBSRC-funded portion of the award. The publication of the "choros" preprint on how to correct for bias in quantifying ribosome footprints achieves a large part of the NSF/BIO-funded award and thus the international collaboration. We are making progress on aim 3, quantifying the effects on translation of upstream open reading frames and other sequence features, and data analysis on this continues beyond the conclusion of the award. Collectively, the work makes it much easier to quantify how proteins are made in cells, which lays the foundations for better academic and industrial research on genetic engineering and industrial bioprocesses that involve protein production. We wrote a well-received paper on how to write better software pipelines, and another on how to combine open science
Exploitation Route Other research groups have started to use our riboviz software to study protein synthesis in diverse organisms, for a variety of research questions.
Sectors Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

 
Title riboviz example datasets 
Description The example-datasets repository holds the configuration files and genome/annotation files needed to run the riboviz ribosome profiling pipeline on specific datasets. It aims to provide specific example datasets for new users to try or adapt, and to share up-to-date, tested example datasets within the riboviz development team.
Type Of Material Data analysis technique 
Year Produced 2021 
Provided To Others? Yes  
Impact Researchers nationally and internationally are now able to use riboviz to analyse a variety of ribosome profiling datasets from different organisms. We expect this to result in faster research progress for them, reflected in citations to the tool in future years. 
URL https://github.com/riboviz/example-datasets
 
Description BBSRC-NSF/BIO RiboViz for reliable, reproducible and rigorous quantification of protein synthesis from ribosome profiling data 
Organisation Rutgers University
Country United States 
Sector Academic/University 
PI Contribution This collaboration was jointly funded by BBSRC (UKRI) and NSF/BIO (USA). My research team led the software development for riboviz, and led writing of the papers, with intellectual contributions from all participants.
Collaborator Contribution Partners from Rutgers and U.C. Berkeley contributed thoroughly to the conceptualisation, software development, testing, data analysis, writing, and workshop delivery. Overall it was a highly successful and involved collaboration as proposed in the collaborative grant.
Impact Publications on the riboviz software tool, software outputs for riboviz, research data (riboviz/example-datasets), and the Biochemical Society ribosome profiling workshop.
Start Year 2019
 
Description BBSRC-NSF/BIO RiboViz for reliable, reproducible and rigorous quantification of protein synthesis from ribosome profiling data 
Organisation University of California, Berkeley
Department Department of Bioengineering
Country United States 
Sector Academic/University 
PI Contribution This collaboration was jointly funded by BBSRC (UKRI) and NSF/BIO (USA). My research team led the software development for riboviz, and led writing of the papers, with intellectual contributions from all participants.
Collaborator Contribution Partners from Rutgers and U.C. Berkeley contributed thoroughly to the conceptualisation, software development, testing, data analysis, writing, and workshop delivery. Overall it was a highly successful and involved collaboration as proposed in the collaborative grant.
Impact Publications on the riboviz software tool, software outputs for riboviz, research data (riboviz/example-datasets), and the Biochemical Society ribosome profiling workshop.
Start Year 2019
 
Title Options for RiboViz workflow management 
Description This repository contains notes related to exploring options for reimplementing the RiboViz workflow (implemented at that time as a custom Python script) using workflow technologies popular within the bioinformatics community.
This repository also contains Snakemake, Nextflow and CWL workflows rapidly prototyped during these explorations. These prototype workflows are unsupported and not designed for actual use, but may prove of interest to others as snapshots of how a subset of the RiboViz workflow was reimplemented using these technologies. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://figshare.com/articles/software/Options_for_RiboViz_workflow_management/13147979
 
Title Options for RiboViz workflow management 
Description This repository contains notes related to exploring options for reimplementing the RiboViz workflow (implemented at that time as a custom Python script) using workflow technologies popular within the bioinformatics community.
This repository also contains Snakemake, Nextflow and CWL workflows rapidly prototyped during these explorations. These prototype workflows are unsupported and not designed for actual use, but may prove of interest to others as snapshots of how a subset of the RiboViz workflow was reimplemented using these technologies. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://figshare.com/articles/software/Options_for_RiboViz_workflow_management/13147979/1
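Workflow management systems such as Snakemake, Nextflow and CWL run each step of a pipeline once the steps it depends on have finished, and can run independent steps in parallel. The Python sketch below illustrates that idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The step names are hypothetical and simplified, and the code reproduces neither the RiboViz workflow nor the API of any of those tools.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified pipeline: each step maps to the steps it depends on.
STEPS = {
    "trim_adapters": set(),
    "filter_rrna": {"trim_adapters"},
    "align_orfs": {"filter_rrna"},
    "count_footprints": {"align_orfs"},
    "read_length_qc": {"align_orfs"},
    "summary_report": {"count_footprints", "read_length_qc"},
}


def ready_batches(steps):
    """Group steps into batches; steps within a batch have no mutual
    dependencies, so a workflow manager could run them in parallel."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark finished so dependent steps become ready
    return batches


for batch in ready_batches(STEPS):
    print(batch)
# ['trim_adapters']
# ['filter_rrna']
# ['align_orfs']
# ['count_footprints', 'read_length_qc']
# ['summary_report']
```

Declaring dependencies rather than a fixed script order is what lets such tools rerun only out-of-date steps and parallelise independent ones, which is part of why they were worth prototyping against a custom Python script.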
 
Title riboviz: software for analysis and visualization of ribosome profiling datasets 
Description Ribosome profiling provides a detailed global snapshot of protein synthesis in a cell. At its core, this technique makes use of the observation that a translating ribosome protects around 30 nucleotides of the mRNA from nuclease activity. High-throughput sequencing of these ribosome-protected fragments (called ribosome footprints) offers a precise record of the number and location of ribosomes at the time at which translation is stopped. Mapping the positions of the footprints indicates the translated regions within the transcriptome. Moreover, ribosomes spend different periods of time at different positions, leading to variation in footprint density along the mRNA transcript. The footprint density on each mRNA provides an estimate of how much protein is being produced from it. Importantly, ribosome profiling is as precise and detailed as RNA sequencing. In the short time since its introduction in 2009, ribosome profiling has played a key role in driving biological discovery.
We have developed this bioinformatics toolkit, **RiboViz**, for analyzing ribosome profiling datasets. **RiboViz** consists of a comprehensive and flexible analysis pipeline. The current version of **RiboViz** is designed for yeast datasets.
 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://figshare.com/articles/software/riboviz_software_for_analysis_and_visualization_of_ribosome_p...
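The link between footprint positions and translation described above can be sketched in a few lines of Python. This minimal sketch assumes a hypothetical input of footprint 5' end positions on a single transcript, a fixed 5'-end-to-P-site offset (12 nt here, chosen only for illustration) and ORF coordinates given as 0-based, half-open intervals; it is not how RiboViz assigns reads.

```python
def footprint_density(read_5p_positions, orfs, p_site_offset=12):
    """Return footprints per codon for each ORF on one transcript.

    read_5p_positions: 0-based 5' end positions of mapped footprints.
    orfs: dict of ORF name -> (start, end), 0-based half-open coordinates;
          (end - start) is assumed to be a multiple of 3. A footprint whose
          P-site falls in overlapping ORFs is counted for each of them.
    p_site_offset: assumed distance from footprint 5' end to the P-site.
    """
    counts = {name: 0 for name in orfs}
    for pos in read_5p_positions:
        p_site = pos + p_site_offset  # infer the ribosome's P-site position
        for name, (start, end) in orfs.items():
            if start <= p_site < end:
                counts[name] += 1
    return {name: counts[name] / ((end - start) // 3)
            for name, (start, end) in orfs.items()}


# Hypothetical transcript with a 10-codon upstream ORF and a 100-codon main ORF.
orfs = {"uORF": (10, 40), "main": (100, 400)}
reads = [0, 3, 6, 100, 200, 300, 390]
print(footprint_density(reads, orfs))  # -> {'uORF': 0.3, 'main': 0.03}
```

Reporting footprints per codon rather than raw counts makes ORFs of different lengths comparable; a real analysis would also normalise across samples and account for experimental biases.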
 
Description Ribosome profiling provides a detailed global snapshot of protein synthesis in a cell. At its core, this technique makes use of the observation that a translating ribosome protects around 30 nucleotides of the mRNA from nuclease activity. High-throughput sequencing of these ribosome-protected fragments (called ribosome footprints) offers a precise record of the number and location of ribosomes at the time at which translation is stopped. Mapping the positions of the footprints indicates the translated regions within the transcriptome. Moreover, ribosomes spend different periods of time at different positions, leading to variation in footprint density along the mRNA transcript. The footprint density on each mRNA provides an estimate of how much protein is being produced from it. Importantly, ribosome profiling is as precise and detailed as RNA sequencing. In the short time since its introduction in 2009, ribosome profiling has played a key role in driving biological discovery. We have developed this bioinformatics toolkit, **RiboViz**, for analyzing ribosome profiling datasets. **RiboViz** consists of a comprehensive and flexible analysis pipeline. The current version of **RiboViz** is designed for yeast datasets.
Type Of Technology Software 
Year Produced 2021 
URL https://figshare.com/articles/software/riboviz_software_for_analysis_and_visualization_of_ribosome_p...
 
Description Ribosome profiling provides a detailed global snapshot of protein synthesis in a cell. At its core, this technique makes use of the observation that a translating ribosome protects around 30 nucleotides of the mRNA from nuclease activity. High-throughput sequencing of these ribosome-protected fragments (called ribosome footprints) offers a precise record of the number and location of ribosomes at the time at which translation is stopped. Mapping the positions of the footprints indicates the translated regions within the transcriptome. Moreover, ribosomes spend different periods of time at different positions, leading to variation in footprint density along the mRNA transcript. The footprint density on each mRNA provides an estimate of how much protein is being produced from it. Importantly, ribosome profiling is as precise and detailed as RNA sequencing. In the short time since its introduction in 2009, ribosome profiling has played a key role in driving biological discovery. We have developed this bioinformatics toolkit, riboviz, for analyzing ribosome profiling datasets. riboviz consists of a comprehensive and flexible analysis pipeline. The current version, riboviz 2, has been extensively tested on datasets from yeast, various other fungi, mouse, bacteria, and archaea.
Type Of Technology Software 
Year Produced 2021 
URL https://figshare.com/articles/software/riboviz_software_for_analysis_and_visualization_of_ribosome_p...
 
Description Biochemical Society Workshop on Ribosome Profiling 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop on ribosome profiling
Julie Aspden & Edward Wallace, Organizers
1-2 July 2021

This Biochemical Society training event, with generous support from BBSRC, attracted 80 participants from across the world to share knowledge about ribosome profiling, the state-of-the-art method to study protein synthesis in live cells. We had a mixture of talks, discussions and hands-on sessions, which worked well for an online workshop spread over two afternoons. Based on feedback from the 2016 workshop we had concurrent sessions specifically designed for beginners and experts, as well as plenary sessions on general topics.

After an introduction to the experimental and computational aspects of ribosome profiling, all speakers shared their 'top tips' for ribosome profiling. Three consensus messages emerged: first, the experimental design must follow the biological question; second, pay attention to fundamentals such as read length, 3-nt periodicity, and ribosomal RNA contamination; third, ask for help from experts in the technique.

Hands-on data analysis sessions were presented by the teams behind a web-based tool (riboseq.org) and command-line processing pipeline (riboviz). Uwe Ohler's keynote talk on finding translated regions emphasized the importance of statistical methods and interdisciplinary collaboration. We ended with a session on the future of ribosome profiling, including talks on applications to different organisms, TCP-seq to measure translation initiation and selective ribosome profiling for nascent protein folding. Participants shared prospects for simplified experimental preparations and improved and less-biased analysis tools. The workshop had a positive collaborative atmosphere and great feedback from participants.

The riboviz-specific teaching material is available: https://github.com/riboviz/workshop-2021-07-02
Year(s) Of Engagement Activity 2021
URL https://biochemistry.org/events/workshop-on-ribosome-profiling-2021/
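Two of the fundamentals in the speakers' 'top tips', read length and 3-nt periodicity, are simple to check. Footprints from translating ribosomes typically cluster in a narrow length range, and, because ribosomes move one codon at a time, their inferred P-sites fall predominantly in one reading frame. The Python sketch below computes both summaries; the function names and input format (read lengths, and P-site positions relative to a known start codon) are hypothetical and are not taken from the riboviz teaching material.

```python
from collections import Counter


def length_distribution(read_lengths):
    """Number of reads at each footprint length, sorted by length."""
    return dict(sorted(Counter(read_lengths).items()))


def frame_fractions(p_sites, cds_start):
    """Fraction of P-sites in frames 0, 1 and 2 relative to the start codon.

    Only P-sites at or downstream of cds_start are counted. Given a correct
    P-site offset, strong 3-nt periodicity shows up as most reads in frame 0.
    """
    frames = Counter((p - cds_start) % 3 for p in p_sites if p >= cds_start)
    total = sum(frames.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [frames[f] / total for f in range(3)]


print(length_distribution([28, 28, 29, 30, 28]))        # -> {28: 3, 29: 1, 30: 1}
print(frame_fractions([100, 103, 106, 101, 109], 100))  # -> [0.8, 0.2, 0.0]
```

A broad length distribution or roughly equal frame fractions would be a prompt to revisit the experimental preparation or the P-site offset before interpreting the data biologically.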
 
Description Online teacher CPD with 4273pi project
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact Online teacher CPD event: S3-S6 Bioinformatics for Biologists. Teachers were provided with resource packages that included a Raspberry Pi with a 4273pi-formatted SD card, power supply, case and monitor cables, a bioinformatics textbook and hard copies of our handouts. In this workshop, we led teachers through our bioinformatics workshops and other resources to give them the knowledge, confidence and equipment needed to integrate bioinformatics into their classroom teaching. Feedback was collected and will be evaluated as part of a peer-reviewed publication.

This was run by the 4273pi project (https://4273pi.org/); the award-funded RA, Ms Felicity Anderson, participated in the training.
Year(s) Of Engagement Activity 2020
URL https://4273pi.org/
 
Description Presented invited, "riboviz experiences with Snakemake and Nextflow" talk at Genentech and Roche Data Science Tools User Group (DSTUG) virtual seminar, 2 December 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Zia Khan, a prior professional acquaintance of Edward Wallace, read our paper in PLOS Computational Biology, "Using prototyping to choose a bioinformatics workflow management system" (http://dx.doi.org/10.1371/journal.pcbi.1009705), and invited us to present at the Data Science Tools User Group (DSTUG) virtual seminar. The aim of the seminar is to share new tools that might be useful to the broader data science community at Genentech and Roche.

The talk was attended by approximately 25 members of the data science community and generated interesting discussion about the capabilities of the tools that the riboviz team assessed before adopting one of them.
Year(s) Of Engagement Activity 2021
URL https://github.com/riboviz/dstug-workflows-talk-2021/blob/main/riboviz-workflows.pptx
 
Description Software Sustainability Institute Blog 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact We presented our project, riboviz, in the context of wider movements for sustainable research software on the Software Sustainability Institute blog.

RiboViz: Understanding protein synthesis via analysis of ribosome profiling data - https://www.software.ac.uk/blog/2019-07-24-riboviz-understanding-protein-synthesis-analysis-ribosome-profiling-data

A quantitative biologist's journey towards teaching data skills with The Carpentries - https://www.software.ac.uk/blog/2019-08-02-quantitative-biologists-journey-towards-teaching-data-skills-carpentries

A researcher's perspective on working with the Software Sustainability Institute, for "Virtual Doors Open Day" 2020 -
https://www.software.ac.uk/blog/2020-12-09-researchers-perspective-working-software-sustainability-institute
Year(s) Of Engagement Activity 2019,2020,2021,2022
URL https://www.software.ac.uk/blog/2020-12-09-researchers-perspective-working-software-sustainability-i...
 
Description Understanding protein synthesis via analysis of ribosome profiling data 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact We described the project and its goals to the audience of the EPCC website from the perspective of the research software engineer who had just started working on the project. The article describes ribosome profiling analysis at a high level and introduces riboviz, before highlighting the project's end goals.
Year(s) Of Engagement Activity 2019,2020,2021
URL https://www.epcc.ed.ac.uk/whats-happening/articles/understanding-protein-synthesis-analysis-ribosome...