14CONFAP From Comparative genomics to Phylogenomics: uncovering the genomic complexity and evolutionary adaptations of twenty species of protozoa

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty

Abstract

A fundamental schism exists between organisms whose cells, like our own, contain nuclei (eukaryotes) and bacteria, whose cells do not. The vast majority of eukaryote species are single-celled organisms, or protozoa, which display enormous genetic diversity and a correspondingly huge variation in biology, form and function. The ability to parasitize animals such as ourselves has arisen in just a few groups of protozoa. Kinetoplastids and Diplomonads are two such groups, believed to have diverged from the animal lineage not long after the last eukaryotic common ancestor (close to the root of the eukaryotic tree). Both groups include important but neglected human pathogens: those that cause the deadly vector-borne trypanosomiases and leishmaniases, and those that cause the waterborne diarrhoeal disease giardiasis. Both also contain a variety of other pathogenic species, as well as free-living species that are not parasitic. Comparing the genomes of pathogenic and non-pathogenic members of these groups will highlight the evolution of gene families encoding proteins that circumvent the defences of animal hosts, and on which the pathogenicity and virulence of these organisms depend. Such genes are also key targets for vaccines, drugs and monoclonal antibody therapeutics.

The proposal brings together an expert team of parasitologists, protozoologists, evolutionary biologists, genome biologists and bioinformaticians from the Fiocruz Institute in Brazil and from TGAC and the University of East Anglia (UEA) in the United Kingdom. The project undertakes to deliver the first high-quality genome sequences for twenty kinetoplastid and diplomonad species. The genomes selected will span the breadth of genetic diversity in these groups and will be analysed in concert with existing genomic data from the key pathogenic species. The technology involved is state-of-the-art and constantly upgraded, and we expect the genomes and transcriptomes produced to be of the highest possible quality. There will be a reciprocal exchange of expertise, with bidirectional transfer of technical and analytical know-how facilitated by exchange visits of key personnel and students.

Our basic strategy will be to culture the organisms, making use of the Wolfson laboratory for emerging pathogens at UEA for culturing the pathogenic members of the groups. We will harvest and purify nucleic acids, using a chaotropic buffer to disrupt the cells and silica affinity to purify the nucleic acids. We will use a mixture of methods and technologies to assemble high-quality genomes, including whole-genome sequencing and optical mapping for the genomes and RNA-seq to delimit the transcriptome. Finally, we will combine the data sets for each lineage to infer how genomic complexity relates to evolutionary adaptations for a parasitic lifestyle.
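As a rough illustration of how RNA-seq read depth can delimit transcribed regions and flag candidate genes missed by computational prediction, the Python sketch below scans a per-base coverage profile for runs above a depth threshold and reports the runs that overlap no predicted gene. The function names, thresholds and example values are hypothetical. Real pipelines align reads with a splice-aware aligner and assemble transcripts, and kinetoplastid genes are transcribed polycistronically, so coverage runs need not map one-to-one onto genes.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base-pair coordinates


def transcribed_regions(coverage: List[int], min_depth: int = 5,
                        min_length: int = 3) -> List[Interval]:
    """Runs of consecutive positions with read depth >= min_depth (hypothetical helper)."""
    regions: List[Interval] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos  # a covered run begins
        elif depth < min_depth and start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None  # the run has ended
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


def unannotated(regions: List[Interval],
                predicted_genes: List[Interval]) -> List[Interval]:
    """Covered regions that overlap no computationally predicted gene."""
    return [(r_start, r_end) for r_start, r_end in regions
            if not any(r_start < g_end and g_start < r_end
                       for g_start, g_end in predicted_genes)]


# Hypothetical per-base depths along a 14 bp stretch and one predicted gene.
coverage = [0, 0, 6, 8, 9, 7, 1, 0, 5, 6, 0, 12, 15, 20]
predicted_genes = [(0, 5)]
regions = transcribed_regions(coverage)
candidates = unannotated(regions, predicted_genes)
print(regions)     # [(2, 6), (11, 14)]
print(candidates)  # [(11, 14)]
```

The two-base run at positions 8 and 9 is discarded by `min_length`, and the region at 11 to 14 survives as a candidate novel gene because no prediction covers it.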

In so doing we will establish sustainable collaborations between UK and Brazilian researchers that will lead to publications and to substantial advances in the field, on which the new collaborations can build future projects.
Overall, the purpose of these analyses is to give insight into the functional biology of two groups of divergent flagellated protozoa that have independently evolved from free-living organisms into major human and animal pathogens. Each group is biologically distinctive and famously defined by peculiarities in its cell and molecular biology. Elucidating how these peculiarities evolved and continue to evolve, and how they contributed to the evolution of parasitism in these distinct lineages, is fundamental biology and will be the primary objective of this work.

Technical Summary

Initial sequencing, assemblies and phylogenomics will be conducted at the Fiocruz Institute. Long-read sequencing, refinement of the assemblies and reannotation of the genomes will be undertaken at TGAC. Overall, we undertake to:
(i) Sequence and annotate the genomes of 20 new culturable Kinetoplastid and Diplomonad species.
(ii) Identify the orthologs and paralogs in the sequenced genomes.
(iii) Functionally categorize the genes shared between species.
(iv) Identify the paralogous genes in each of the studied species.
(v) Identify orphan genes (coding regions without similarity to other genes in the databases) in the new genomes.
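The ortholog, paralog and orphan-gene objectives above are usually tackled with dedicated clustering tools, and the project text does not name one. As a deliberately simplified stand-in, the Python sketch below calls putative orthologs between two species as reciprocal best hits (RBH) and treats genes with no hit above a score threshold as orphan candidates. It assumes all-against-all similarity scores (for example BLAST bit scores) have already been computed; the gene names, scores and the threshold of 50 are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (query_gene, subject_gene) -> similarity score, e.g. a BLAST bit score
Hits = Dict[Tuple[str, str], float]


def best_hits(hits: Hits, min_score: float) -> Dict[str, str]:
    """Highest-scoring subject gene for each query (ties: first seen wins)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if score < min_score:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits,
                         min_score: float = 50.0) -> List[Tuple[str, str]]:
    """Putative ortholog pairs (a, b): each gene is the other's best hit."""
    forward = best_hits(a_vs_b, min_score)
    reverse = best_hits(b_vs_a, min_score)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def orphan_genes(genes: Set[str], hits_to_other_species: Hits,
                 min_score: float = 50.0) -> Set[str]:
    """Genes with no hit at or above min_score in any other species."""
    with_hits = {query for (query, _), score in hits_to_other_species.items()
                 if score >= min_score}
    return genes - with_hits


# Hypothetical scores between species A (a1-a3) and species B (b1-b3).
a_vs_b = {("a1", "b1"): 300.0, ("a1", "b2"): 120.0,
          ("a2", "b2"): 80.0, ("a3", "b3"): 20.0}
b_vs_a = {("b1", "a1"): 290.0, ("b2", "a1"): 130.0, ("b2", "a2"): 85.0}

orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
orphans = orphan_genes({"a1", "a2", "a3"}, a_vs_b)
print(orthologs)  # [('a1', 'b1')]
print(orphans)    # {'a3'}
```

Here a2 is not paired with b2 because b2's own best hit is a1. RBH cannot resolve such cases when gene families have expanded by duplication, which is one reason graph-based clustering tools are generally preferred for ortholog and paralog assignment.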

The UK work will focus on:
1) High-quality genome assemblies using PacBio long reads
High-quality assemblies empower comparative genomic and population genetic analyses. UEA will choose and prepare high-molecular-weight DNA from a set of 20 important samples, e.g. those representing key nodes in the phylogeny of Kinetoplastid and Diplomonad species. TGAC will construct large-insert, gel-size-selected libraries from the supplied DNA and sequence them on a Pacific Biosciences sequencer, which currently generates mean read lengths of 12 kb (maximum 50+ kb) but with an accuracy of only ~85% (the errors are close to random). With deep sequencing and the HGAP3 pipeline, the longest reads will be corrected using the shorter reads; the resulting very long, accurate reads can be assembled into megabase-sized contigs, after which any remaining errors are polished with PacBio or Illumina sequence.
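The arithmetic behind this long-read strategy can be sketched in Python. HGAP treats the longest reads as "seed" reads, choosing a length cutoff so that the seeds alone reach a target coverage of the genome, and uses the shorter reads to correct them; assembly contiguity is then commonly summarised by the contig N50. Because the errors are close to random, a simple binomial model also shows why deep coverage lets a consensus overcome ~85% single-read accuracy. The 30x default target, the toy read and contig lengths, and the independent-error model are assumptions for illustration, not parameters from this project or from HGAP3 itself.

```python
from math import comb
from typing import List


def seed_length_cutoff(read_lengths: List[int], genome_size: int,
                       target_coverage: float = 30.0) -> int:
    """Shortest read length such that all reads at least this long together
    reach target_coverage x genome_size bases (HGAP-style seed selection)."""
    if not read_lengths:
        raise ValueError("no reads supplied")
    target_bases = genome_size * target_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    # Not enough data to reach the target: every read becomes a seed.
    return min(read_lengths)


def n50(contig_lengths: List[int]) -> int:
    """Contig length N such that contigs of length >= N hold half the assembly."""
    half = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")


def consensus_error_bound(read_accuracy: float, depth: int) -> float:
    """Probability that at least half of `depth` reads are wrong at one base,
    assuming independent errors. Ties count as failures and the wrong reads
    need not agree, so this over-estimates the true consensus error."""
    p_err = 1.0 - read_accuracy
    k_min = (depth + 1) // 2  # smallest error count k with k >= depth / 2
    return sum(comb(depth, k) * p_err**k * read_accuracy**(depth - k)
               for k in range(k_min, depth + 1))


# Toy numbers: a 1,000 bp "genome" and a 3x seed target keep the arithmetic visible.
cutoff = seed_length_cutoff([1500, 1200, 800, 600, 400, 200],
                            genome_size=1000, target_coverage=3.0)
contig_n50 = n50([2_000_000, 1_500_000, 800_000, 500_000, 200_000])
print(cutoff)      # 800
print(contig_n50)  # 1500000
for depth in (1, 5, 15):
    print(depth, consensus_error_bound(0.85, depth))
```

Reads of 800 bp or longer sum to 3,500 bp, just above the 3,000 bp target, so 800 is the seed cutoff; in the contig example the two longest contigs already hold more than half of the 5 Mb total, giving an N50 of 1.5 Mb. The consensus bound falls steeply as depth rises, which is the intuition behind correcting long, noisy reads with many overlapping ones.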
2) Annotation of mRNA genes and DNA modifications
PacBio single-molecule reads will be re-analysed for polymerase kinetic changes caused by over 20 types of DNA modification (including base J), and the associated sequence motifs will then be identified (Schadt et al., Genome Res. 2012). UEA will supply RNA from the same 20 samples for deep RNA-seq on TGAC's Illumina HiSeq; these experimental data will highlight genes, especially those too divergent to be identified easily by computational methods.
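PacBio modification detection rests on comparing the polymerase inter-pulse duration (IPD) at each base of native DNA with the IPD expected for unmodified DNA, for example from a whole-genome-amplified control. A minimal Python sketch of that comparison is given below: it flags positions where both the ratio of mean IPDs and a Welch t-statistic exceed thresholds. The thresholds, coverage cutoff and per-position IPD values are hypothetical, the vendor tools use their own statistical models, and motif discovery around the flagged positions is not shown.

```python
import math
from statistics import mean, variance
from typing import Dict, List


def ipd_ratio(native: List[float], control: List[float]) -> float:
    """Mean native IPD divided by mean control (unmodified) IPD at one base."""
    return mean(native) / mean(control)


def welch_t(native: List[float], control: List[float]) -> float:
    """Welch's t-statistic for native versus control IPDs (>= 2 values each)."""
    diff = mean(native) - mean(control)
    se = math.sqrt(variance(native) / len(native)
                   + variance(control) / len(control))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def call_modified_positions(native_ipds: Dict[int, List[float]],
                            control_ipds: Dict[int, List[float]],
                            min_coverage: int = 5,
                            min_ratio: float = 2.0,
                            min_t: float = 3.0) -> List[int]:
    """Positions where the polymerase is both markedly and consistently slower.
    min_coverage must be at least 2 so that sample variances are defined."""
    calls = []
    for pos, native in sorted(native_ipds.items()):
        control = control_ipds.get(pos, [])
        # Skip positions without enough reads in either sample.
        if len(native) < min_coverage or len(control) < min_coverage:
            continue
        if (ipd_ratio(native, control) >= min_ratio
                and welch_t(native, control) >= min_t):
            calls.append(pos)
    return calls


# Hypothetical per-position IPDs (arbitrary units) at three genome positions.
native = {
    10: [2.1, 2.4, 1.9, 2.6, 2.2],  # slowed polymerase: candidate modification
    11: [1.0, 1.1, 0.9, 1.2, 0.8],  # unchanged kinetics
    12: [3.0, 3.1],                 # too few reads to test
}
control = {
    10: [0.9, 1.1, 1.0, 0.8, 1.2],
    11: [1.0, 0.9, 1.1, 1.0, 1.0],
    12: [1.0, 1.0],
}
modified = call_modified_positions(native, control)
print(modified)  # [10]
```

Requiring both a large ratio and a large t-statistic is a design choice in this sketch: the ratio captures effect size, while the t-statistic guards against a single slow read inflating the mean at low coverage.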

Planned Impact

N/A
 
Description We have sequenced and assembled ten genomes from several protozoan species using PacBio long-read technology, demonstrating its usefulness. For some species we had problems extracting sufficient high-quality, high-molecular-weight DNA; in these cases we switched to Illumina technology, which is less demanding, using the longest Illumina reads possible. We have also tested RNA-seq for annotating novel genes.

Our work on Euglena, a free-living relative of the parasitic protozoa, has contributed to the Euglena International Network.
Overall, our work also contributed to the Protozoa work package (led by EI) of the Darwin Tree of Life project (which aims to sequence all eukaryotic species in Britain and Ireland), itself part of larger European (e.g. ERGA) and worldwide (EBP) efforts.
Exploitation Route The genomic sequences of these protozoa can serve as the basis for comparative and functional studies of their biology, their evolution and the key differences between free-living and parasitic forms, highlighting potential drug targets. We have published one paper from this study, describing a single species.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

 
Description Non-pathogenic (free-living) kinetoplastids
Organisation University of Cambridge
Department Department of Public Health and Primary Care
Country United Kingdom 
Sector Academic/University 
PI Contribution We are providing sequencing data and genome assemblies of free-living kinetoplastids, based on the latest long-read (PacBio) and linked-read (10x Genomics libraries sequenced on Illumina) technologies.
Collaborator Contribution Our collaborators have existing Illumina sequence data and assemblies for several species. In particular, they are part of the International Euglena Sequencing Consortium, whose assembly is still fragmented; we are integrating 10x Genomics linked-read data to increase its contiguity.
Impact Several sets of DNA sequence data and genome assemblies, which are at various stages of study.
Start Year 2016
 
Description Non-pathogenic (free-living) kinetoplastids
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution We are providing sequencing data and genome assemblies of free-living kinetoplastids, based on the latest long-read (PacBio) and linked-read (10x Genomics libraries sequenced on Illumina) technologies.
Collaborator Contribution Our collaborators have existing Illumina sequence data and assemblies for several species. In particular, they are part of the International Euglena Sequencing Consortium, whose assembly is still fragmented; we are integrating 10x Genomics linked-read data to increase its contiguity.
Impact Several sets of DNA sequence data and genome assemblies, which are at various stages of study.
Start Year 2016
 
Description Pathogenic kinetoplastids
Organisation London School of Hygiene and Tropical Medicine (LSHTM)
Country United Kingdom 
Sector Academic/University 
PI Contribution We are extracting DNA (where necessary), sequencing and assembling the genomes of 6 kinetoplastids.
Collaborator Contribution Our collaborators at the Universidade de São Paulo, the University of South Bohemia and LSHTM have cultured obligate pathogenic kinetoplastids that are very difficult to grow, for our comparative genomics study of kinetoplastids, which contrasts lifestyles (free-living versus pathogenic) and spans evolutionarily distant species. Our collaborators at Fiocruz are expert bioinformaticians for these species, which is key for comparative genomics.
Impact So far, 6 genome sequences, which are being studied at Fiocruz and in the UK. Our aim is to identify possible vaccine or drug targets using comparative genomics.
Start Year 2016
 
Description Pathogenic kinetoplastids
Organisation Oswaldo Cruz Foundation (Fiocruz)
Country Brazil 
Sector Public 
PI Contribution We are extracting DNA (where necessary), sequencing and assembling the genomes of 6 kinetoplastids.
Collaborator Contribution Our collaborators at the Universidade de São Paulo, the University of South Bohemia and LSHTM have cultured obligate pathogenic kinetoplastids that are very difficult to grow, for our comparative genomics study of kinetoplastids, which contrasts lifestyles (free-living versus pathogenic) and spans evolutionarily distant species. Our collaborators at Fiocruz are expert bioinformaticians for these species, which is key for comparative genomics.
Impact So far, 6 genome sequences, which are being studied at Fiocruz and in the UK. Our aim is to identify possible vaccine or drug targets using comparative genomics.
Start Year 2016
 
Description Pathogenic kinetoplastids
Organisation Universidade de São Paulo
Department Department of Parasitology
Country Brazil 
Sector Academic/University 
PI Contribution We are extracting DNA (where necessary), sequencing and assembling the genomes of 6 kinetoplastids.
Collaborator Contribution Our collaborators at the Universidade de São Paulo, the University of South Bohemia and LSHTM have cultured obligate pathogenic kinetoplastids that are very difficult to grow, for our comparative genomics study of kinetoplastids, which contrasts lifestyles (free-living versus pathogenic) and spans evolutionarily distant species. Our collaborators at Fiocruz are expert bioinformaticians for these species, which is key for comparative genomics.
Impact So far, 6 genome sequences, which are being studied at Fiocruz and in the UK. Our aim is to identify possible vaccine or drug targets using comparative genomics.
Start Year 2016
 
Description Pathogenic kinetoplastids
Organisation University of South Bohemia
Department Institute of Parasitology
Country Czech Republic 
Sector Academic/University 
PI Contribution We are extracting DNA (where necessary), sequencing and assembling the genomes of 6 kinetoplastids.
Collaborator Contribution Our collaborators at the Universidade de São Paulo, the University of South Bohemia and LSHTM have cultured obligate pathogenic kinetoplastids that are very difficult to grow, for our comparative genomics study of kinetoplastids, which contrasts lifestyles (free-living versus pathogenic) and spans evolutionarily distant species. Our collaborators at Fiocruz are expert bioinformaticians for these species, which is key for comparative genomics.
Impact So far, 6 genome sequences, which are being studied at Fiocruz and in the UK. Our aim is to identify possible vaccine or drug targets using comparative genomics.
Start Year 2016