A SPATIO-TEMPORAL MAP OF THE DEVELOPMENTAL FLY INTERACTOME

Lead Research Organisation: University of Manchester
Department Name: School of Biological Sciences

Abstract

Development, the process by which cells differentiate and divide to create new life, is a fascinating process governed by the complex interplay between our genes. Careful control of these genes, and more specifically their protein products, through changes in their levels and specific nature over time, dictates the fate of cells and what tissues they will form. As well as the timing, the location within the cell where a gene is expressed and its protein product is active is also important in determining function. The information needed to solve this puzzle is, in principle, contained within the genome sequence. However, we currently lack the full picture of what happens during the course of development for several reasons: we don't know how much of each gene is expressed at each time point, we don't know which version (isoform) of each gene is expressed, and we don't know which partner genes each gene interacts with or where in the cell this happens. Although some of this information is known, much of the knowledge needed to properly understand developmental signalling is missing. Crucially, and perhaps most importantly for this proposal, we lack comprehensive data specifically at the *protein* level (where function is really determined).

In this proposal we aim to close the gap, using both experimental and computational post-genome science, to study specific signalling pathways in a model organism (the fruit fly). Importantly, we already have the necessary methods in place to do this, bringing together UK experts in proteomics (both experimental and computational) with fly genomics and signalling experts to tackle this challenge. This includes state-of-the-art bioinformatics tools from groups who lead the way in the annotation of genome sequences and the prediction of protein function. These tools can now take account of the "unknowns" discussed above, such as different isoforms and their likely effects on interacting partner proteins.

We will characterise the developmental fly proteome, in terms of the levels, isoforms, interactions and locations of the important signalling proteins in order to generate a developmental spatio-temporal map. This will be a major advance in both developmental biology and genome science, which we hope will form an important resource for all biologists interested in gene function and development, as well as advancing and integrating the technologies needed to study it.

Technical Summary

We will perform a comprehensive temporal characterisation of the Drosophila embryonic isoformal proteome, followed by a focused quantitative analysis of the Wnt, RTK and Notch signalling pathway proteome. Using SILAC-iPAC and LOPIT technologies we will characterise the dynamic interactome and subcellular localisation of key signalling pathway components. Proteomics analysis will employ data-dependent strategies via MudPIT LC-MS/MS, data-independent analysis via SWATH-MS and fully quantitative characterisation via QconCAT Selected Reaction Monitoring. Proteomics data will be collected across 12 embryonic timepoints, matching published high-coverage RNA-seq data, to provide a community resource of great utility. Along with established in vivo tagged fly lines we will use recombineering technology to generate isoform-specific tagged forms of key signalling pathway components predicted to dynamically change interaction partners across embryonic development.

Tightly integrated with our experimental work, bioinformatics will generate an open data warehouse, providing community access to our data. From this we will generate new network-based views of the Drosophila proteome, incorporating our isoformal proteome and public data, which will be exploited in state-of-the-art function prediction tools to predict novel signalling pathway protein interactions. We will use in vivo tagged lines to test interaction and subcellular localisation predictions by targeted SILAC-iPAC experiments. Using these data we will advance a new protein interaction prediction tool that accounts for isoforms.

To encourage uptake we will disseminate our data widely, making raw and processed views of the dynamic embryonic fly proteome available via web servers, web services and existing community databases. In consultation with the research community we will develop data visualisation tools to facilitate non-expert data access, and comprehensive training courses to disseminate expertise.

Planned Impact

We plan to deliver impact in 4 areas, with particular emphasis on advanced training to our staff and the wider research community, and dissemination of data, tools and technology. In addition we will engage in extensive public engagement activities and explore industrial take-up of technologies we develop, primarily through the Lilley lab and their good links with industry.

Our project seeks to address fundamental questions in the molecular biology of embryonic development using the latest cutting-edge post-genomic science. It is therefore highly multi-disciplinary and requires researchers with quite disparate backgrounds and skills, yet with open minds and collaborative mind-sets. We are confident, based on our track records, that this exists for the principal investigators but we aim to enthuse our research staff with this ethos and spread the word to others through workshops/training courses that we will run. We argue too that we will be pioneering in terms of the extra dimensions that our proteomics studies will generate, and this will necessitate innovations in the attendant bioinformatics - both to process and acquire the data, and to exploit it and learn from it to develop novel prediction tools. This should set new paradigms in the computational biology field and encourage other groups to consider similar approaches, and we hope our training and dissemination activities will achieve this aim.

In parallel, we plan to communicate the advances we make in both developmental cell biology and post-genomic technologies to the wider public, via a variety of channels. This will include open days and talks, as well as more 21st-century means (wikis, Twitter and YouTube videos).

Finally, we will explore exploitation opportunities of our software and proteomic technologies where appropriate (though much of our informatics will be open source, and all of it free to academics). The Lilley lab will continue to present updates to industrial colleagues, as detailed in our Pathways to Impact statement.

Publications

 
Description Bioinformatics analyses have led to convincing preliminary evidence for different isoforms of proteins expressed from fly genes during development of the embryo. We have developed many bioinformatics tools associated with this, including tools to identify good candidates and link them to different potential interactions, and have advanced associated tools that predict protein function.

We have also developed functional prediction tools that use various data types, some generated on this project or linked to it, which can help biologists predict the function of genes using just the DNA (or protein) sequence data, plus some gene expression data.

We also now have good datasets for the temporal quantitative proteome, which are being analysed now and will be published. Notably, we have excellent direct proteomic evidence, acquired for the first time, for stop-codon readthrough events where expression of the protein encoded by a gene extended beyond the normal 'stop' point to a novel secondary one. We also have excellent evidence that particular isoforms of a gene are expressed differentially in different tissues of developing organisms, and have generated novel constructs that allow us to see this in whole fly embryos.

Complete proteomics datasets are being uploaded to PRIDE whilst we are writing papers, and we have elected not to place them under an embargo.
Exploitation Route More experiments such as the ones we plan to do next, for example targeted strategies to quantify some stop-codon readthrough candidates directly via mass spectrometry.
The spatial proteomics methodologies continue to advance too.
Sectors Electronics; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

 
Description Software for use in quantitative proteomics has been developed, notably by Gatto, Lilley and colleagues, linked to the analysis of spatial proteomics data. This has been made freely available through various routes, mostly through the statistical programming language R.
First Year Of Impact 2016
Sector Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
 
Title LOPITDC 
Description A novel method to characterise the subcellular proteome using differential centrifugation 
Type Of Material Technology assay or reagent 
Year Produced 2019 
Provided To Others? Yes  
Impact Publication of manuscript in Nature Communications. Adoption of method by industry.
 
Title TAPAS 
Description TAPAS is a novel method for predicting the functional neighborhood of protein isoforms. It combines RNAseq expression data with network and functional annotation data, to make inferences about potential isoform specific functional roles. It also makes use of novel distance measures to find the most interesting protein isoform switching events. 
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact Components of the TAPAS method, especially its novel distance metric, are being used as part of a suite of tools developed to help support target selection in the BBSRC-funded DDIP project, which is studying the developing fly interactome. Initial experimental validation in the DDIP consortium of the functional roles for these targets looks promising.
URL http://download.cathdb.info/gene3d/CURRENT_RELEASE/TAPAS/
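
As an illustration of the isoform-switching idea in the TAPAS description above, the following is a minimal sketch only. It does not reproduce TAPAS or its novel distance measures, which are not specified here. It assumes a simple stand-in: isoform usage fractions are computed at each time point, and a gene is scored by the largest total variation distance between any two time points. The function names, isoform labels and expression values are all hypothetical.

```python
def isoform_fractions(counts):
    """Convert per-isoform expression values at one time point to usage fractions."""
    total = sum(counts.values())
    if total == 0:
        return {iso: 0.0 for iso in counts}
    return {iso: value / total for iso, value in counts.items()}


def switch_score(timecourse):
    """Largest total variation distance in isoform usage between any two time points.

    This is an assumed stand-in metric, not the TAPAS distance measure.
    A score of 0 means usage never changes; 1 means a complete switch.
    """
    fracs = [isoform_fractions(tp) for tp in timecourse]
    best = 0.0
    for i in range(len(fracs)):
        for j in range(i + 1, len(fracs)):
            isoforms = set(fracs[i]) | set(fracs[j])
            tv = 0.5 * sum(
                abs(fracs[i].get(k, 0.0) - fracs[j].get(k, 0.0)) for k in isoforms
            )
            best = max(best, tv)
    return best


# Hypothetical expression values for two isoforms of one gene at three time points.
timecourse = [
    {"iso1": 30.0, "iso2": 10.0},
    {"iso1": 20.0, "iso2": 20.0},
    {"iso1": 10.0, "iso2": 30.0},
]
score = switch_score(timecourse)
print(score)
```

Ranking genes by such a score would put those with the most pronounced change in isoform usage first; a real analysis would also need to handle replicate variance and low-count isoforms, which this sketch ignores.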
 
Title TAPIR
Description TAPIR takes a genome and ranks splicing events by how likely they are to interest experimentalists. For example, TAPIR allows splicing events to be ranked by how likely they are to rewire protein interaction networks. TAPIR also predicts the likelihood of involvement in developmental pathways of interest, and integrates further evidence supporting each splicing event (RNAseq, proteomic and evolutionary). It provides sequence information mapped onto the gene structure, to enable insertion of a CRISPR tag at an appropriate position (one predicted to be unlikely to affect the structural integrity of the protein).
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact The tool has enabled experimental annotation of protein isoforms involved in developmental signalling 
URL http://www.ddip-tapir.uk/
 
Title FUNL 
Description A website for pathway prediction in fly and human (predictions are currently being used as part of the TAPIR resource developed on the sLoLa, i.e. for analysing gene function in the context of RTK signalling etc.). Fun-L (Functional Lists) is a tool for target prioritisation for experimentalists (and is similar to GeneMANIA in this respect). Given a set of query genes known to be involved in the pathway of interest, Fun-L ranks the remainder of the genome by likelihood of shared pathway membership with this initial query. Testing the candidates near the top of the ranking improves the success rate of subsequent experiments.
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact It was first augmented with the fly-specific data for the sLoLa project in 2015, though the original version of this resource was available earlier, in 2014.
URL http://funl.org/
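
The query-based ranking described for Fun-L above can be illustrated with a minimal guilt-by-association sketch. This is not Fun-L's actual scoring method, which is not given here. It assumes a weighted functional association network and scores each non-query gene by its summed edge weight to the query set. The gene names, edge weights and the `rank_candidates` function are hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, query):
    """Rank non-query genes by total association weight to the query genes.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    query: genes already known to belong to the pathway of interest.
    Returns (gene, score) pairs, highest score first (ties broken by name).
    """
    query = set(query)
    scores = defaultdict(float)
    for a, b, weight in edges:
        if a in query and b not in query:
            scores[b] += weight
        elif b in query and a not in query:
            scores[a] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network; the weights are invented for illustration.
edges = [
    ("geneA", "geneB", 1.0),
    ("geneA", "geneC", 0.25),
    ("geneB", "geneC", 0.5),
    ("geneC", "geneD", 0.75),
    ("geneB", "geneD", 0.125),
]
ranking = rank_candidates(edges, query=["geneA", "geneB"])
for gene, score in ranking:
    print(gene, score)
```

Genes with no direct link to the query set are simply omitted in this sketch; a fuller approach would propagate scores through the network so that indirect neighbours are ranked too.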
 
Title Gene3D extensions 
Description Updated website providing additional splicing annotation data and structural models for Drosophila genes. Gene3D takes CATH domain families (from PDB structures) and assigns them to the millions of protein sequences with no PDB structures (using hidden Markov models generated with HMMER). We have incorporated the known Drosophila splicing information, and this is all publicly available.
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact Additional data integration for use on the sLoLa project (or by any other interested parties, since it is all publicly available)
URL http://gene3d.biochem.ucl.ac.uk/
 
Title TAPIR 
Description A web tool, originally designed by UCL colleagues for internal use in the project, which supports the selection of appropriate transcripts that display isoform switching during a time course or range of transcriptomics experiments. It has been made available via a publicly accessible URL (see below) and published.
Type Of Material Data analysis technique 
Year Produced 2015 
Provided To Others? Yes  
Impact It has become very useful for prioritising fly genes for further proteomics study as we aim to characterise isoformal changes at the protein level during development
URL http://www.ddip-tapir.uk/
 
Description Molecular causes of convergent evolution 
Organisation University of Cambridge
Department Department of Zoology
Country United Kingdom 
Sector Academic/University 
PI Contribution Our expertise in CRISPR-Cas9 genome engineering, particularly tagging and generation of deletion mutants, was a key part of a successful BBSRC grant award to examine evolutionary aspects of butterfly wing patterning. We are advising on the design and implementation of CRISPR experiments in the butterfly.
Collaborator Contribution Expertise in butterfly genetics, genomics and evolution.
Impact None yet
Start Year 2017
 
Description Presentation at conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation to 60 postgraduates at a conference, detailing research aspects of the DDIP project and research tools to help experimentalists
Year(s) Of Engagement Activity 2016
URL http://www.globaleventslist.elsevier.com/events/2016/07/drosophila-genetics-genomics/
 
Description School visit (Stockport)
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Talk presented to primary school children on why it's good to study science, how to become a scientist, and a specific example of the impact of genomic science on everyday life: in this case, why we drink milk, explained through a single SNP in the lactase gene.
Year(s) Of Engagement Activity 2015
 
Description Software sustainability, sharing and management workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The goal of the workshops was to flesh out the current problems in software management and sharing and to try to identify possible solutions. The researcher-led nature of this event gave researchers, software engineers and support staff a great opportunity to discuss the issues around creating and maintaining software collaboratively and to exchange good practice among peers. I was invited as a group moderator to promote and facilitate group discussions.
Year(s) Of Engagement Activity 2016
URL https://unlockingresearch.blog.lib.cam.ac.uk/?p=1286