Rooting the eukaryotic radiation with new models of gene and genome evolution

Lead Research Organisation: University of Bristol
Department Name: Biological Sciences

Abstract

The origin of eukaryotes from their prokaryotic progenitors was one of the most formative transitions in the history of life, catalysing the blossoming of eukaryotic biodiversity into the astonishing range of forms we see today, from the largest organisms on our planet - blue whales, giant sequoias, fungal networks extending for miles underground - to microscopic plankton that jostle with bacteria in the world's oceans. Explaining the leap in cellular complexity during the prokaryote-to-eukaryote transition is one of the outstanding challenges in 21st-century biology.

The common structure of all eukaryotic cells testifies to their shared ancestry, but our understanding of the kind of cell that ancestral eukaryote was - where it lived, what it ate, the kinds of biochemical reactions it could perform - is in disarray. Whole-genome data have enabled us to resolve the more recent divergences in eukaryotic evolution, but we still have a very poor understanding of the deeper relationships between the main groups at the base of the evolutionary tree. In particular, the root of the tree - the starting point of the eukaryotic radiation - remains mired in controversy and debate.

The problem is that traditional rooting methods rely on the use of an outgroup: to find the root of the tree of mammals, for example, we might include birds in the analysis, and then use our a priori knowledge to place the root on the branch between the two groups. This approach breaks down when applied to the eukaryotic radiation: including our closest prokaryotic relatives greatly reduces the proportion of the eukaryotic genome that can be analysed, and the enormous evolutionary distance to the prokaryotic outgroup obscures the relationships among the different eukaryotic lineages. As a result, recent analyses of the eukaryotic root disagree strongly on its position, despite using similar datasets and analytical approaches.
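
The following minimal Python sketch shows how an outgroup places the root on an unrooted tree, using the mammals-and-birds example above. The toy tree, the taxon and node names, and the function names (`leaves_on_side`, `outgroup_root_edge`) are hypothetical illustrations and are not part of the project's methods. The root goes on the single branch that separates exactly the outgroup taxa from everything else. If no such branch exists, the outgroup is not monophyletic on that tree and cannot root it.

```python
from collections import defaultdict

# Hypothetical unrooted tree: leaves are taxa, n1-n3 are internal nodes.
# Mammals (human, mouse, platypus) are the ingroup; birds are the outgroup.
EDGES = [
    ("human", "n1"), ("mouse", "n1"), ("n1", "n2"),
    ("platypus", "n2"), ("n2", "n3"),
    ("chicken", "n3"), ("zebra_finch", "n3"),
]


def build_adjacency(edges):
    """Undirected adjacency list for an edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def leaves_on_side(adj, node, parent):
    """Leaves reachable from `node` without crossing back over the edge to `parent`."""
    if len(adj[node]) == 1:  # degree-1 nodes are leaves
        return frozenset([node])
    leaves = set()
    for neighbour in adj[node]:
        if neighbour != parent:
            leaves |= leaves_on_side(adj, neighbour, node)
    return frozenset(leaves)


def outgroup_root_edge(edges, outgroup):
    """Return the branch on which the outgroup places the root.

    This is the unique edge whose split separates exactly the outgroup
    taxa from all other taxa.
    """
    adj = build_adjacency(edges)
    target = frozenset(outgroup)
    for u, v in edges:
        if leaves_on_side(adj, v, u) == target or leaves_on_side(adj, u, v) == target:
            return (u, v)
    raise ValueError("outgroup is not monophyletic on this unrooted tree")


print(outgroup_root_edge(EDGES, {"chicken", "zebra_finch"}))  # ('n2', 'n3')
```

With birds as the outgroup, the function returns the n2-n3 branch, so the root falls between birds and mammals. Note that this logic assumes the outgroup attaches to the correct branch. That assumption is what becomes unreliable when the outgroup is as distant as prokaryotes are from eukaryotes.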

In this project, we will tackle these difficulties head-on to definitively resolve the root of the eukaryotic tree by applying new outgroup-free rooting approaches, including some pioneered by members of the project team, to the most up-to-date, representative sampling of eukaryotic genomic diversity yet assembled. We will use the resulting phylogenomic framework to map the points in evolutionary history at which the unique cellular and genomic traits of modern eukaryotes first evolved, establishing a timescale for the evolution of key eukaryotic innovations. By mapping these traits onto the tree, we will reconstruct a detailed cellular and genomic model of the ancestral eukaryote - an organism which may have lived up to two billion years ago - in order to establish its lifestyle, ecology, and metabolism, and to test hypotheses of how that founding lineage gave rise to the staggering diversity of eukaryotic life we see today.

The work we are proposing is fundamental discovery science. The ultimate goal is to understand our own origins and to bring clarity to a poorly understood period in the history of life that is vital for making sense of the biodiversity around us today. In doing so, we aim to establish a new state of the art for phylogenetic rooting, with broad applicability to other major evolutionary transitions across the tree of life. But there is also real potential for broader socio-economic impact. Some of the groups that branch near the base of the eukaryotic tree are parasitic, so establishing how these evolved from their free-living ancestors will provide new, much-needed insights into the adaptation of eukaryotic parasites such as Trypanosoma (sleeping sickness) and Giardia to their hosts. As part of the research programme, we will host summer internships for motivated students on biohacking (DIY computational biology). These internships will provide a taste of scientific discovery, teach crucial computational, statistical and scientific skills, and help to identify and nurture the next generation of scientific leaders.

Planned Impact

The track record of the project team to date demonstrates our shared commitment to public outreach and reflects our philosophy that academics have an important role to play in wider society. A recent highlight of PI Williams' public engagement programme was co-organising a workshop on microbial evolution and antibiotic resistance at the 2013 British Science Festival, in partnership with Corylus Learning, a leading educational company. We will maintain and develop our commitment to outreach and broad societal impact over the three-year period of this grant. We envisage that the proposed research will directly benefit two main groups outside academia: members of the general public, and motivated secondary school and undergraduate students.

(i) Members of the general public: PI Williams' research to date on the evolution of eukaryotes and viruses has attracted broad public interest and has been covered by a number of popular science outlets, including National Geographic and Discover magazine. He already has extensive experience communicating the excitement of this work to non-specialists and the general public. This ranges from a workshop at the 2013 British Science Festival to participation in Google's Science Foo Camp 2014, where he discussed work on early evolution with journalists, artists and entrepreneurs. We will build on this experience with a series of public lectures on eukaryotic origins, delivered as part of Bristol University's "Twilight Talks" programme. In addition, the PDRA will deliver a Nature Live lecture on microbial eukaryotic diversity at the Natural History Museum. We will also coordinate with Bristol's Centre for Public Engagement to participate in major science festivals such as the Bristol Festival of Nature and the Cheltenham Science Festival. Our aim is to communicate the joy of discovery science, and the particular appeal of origins research, to the broadest possible audience.

(ii) Motivated secondary school and undergraduate students: Biohacking, or do-it-yourself bioinformatics, is a hugely exciting but under-explored way to engage young people in science, teach essential computational, statistical and biological skills, and help identify and train the next generation of research leaders. I will host summer internships for four motivated students (two per year over the second and third years of the grant) to explore the potential of biohacking for elucidating the origin of key eukaryotic genes during the earliest period of their evolutionary history. Lab internships and work experience were an important part of my scientific development, and I will be delighted to provide the same opportunities to the next generation of young research leaders. These internships will be mutually beneficial, enabling the most talented students to develop their scientific skills and ideas while also providing them with a genuine opportunity to contribute to progress in this fundamental research area. The PDRA will directly supervise one of these students each year, which will also provide them with valuable supervisory experience as part of a broader training and career development programme.

We will maximise the reach and effectiveness of these impacts by engaging fully with NERC's new public engagement strategy - taking part in upcoming training opportunities when announced - and we will monitor the ongoing success of our impact programme throughout the funding period through benchmarking against a series of impact milestones, detailed in Pathways to Impact.
 
Title "Notes from the subsurface": collaboration with video artist Charlie Tweed 
Description I collaborated with the video artist Charlie Tweed (http://www.charlietweed.com/category/news/), providing scientific advice, guidance, and script editing, to create a short film. The film explores microbial life underground and how the adaptations and life strategies of microbial eukaryotes and prokaryotes might inspire humanity in the future. It premiered at an "EarthArt" event in the Earth Gallery at the University of Bristol. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact The film has now been entered in competition at several small film festivals, and will soon be available to stream online. 
URL http://www.charlietweed.com/2020/01/notes-from-the-subsurface-solo-show/
 
Description The main scientific aims of the project were largely achieved in the past year, and we are currently writing up the work for publication in scientific journals. The three most significant outputs, and their associated findings, are:

1. The root of the eukaryotic tree: Analyses of gene family evolution, gene content, and traditional phylogenomics support a root between metamonads (one group of single-celled eukaryotes) and all other eukaryotic groups. This result resolves a debate in the field about where the root lies, and also has major implications for how we understand the process of evolution through time: it suggests that the very first eukaryotic cells might have been somewhat simpler, in terms of their gene repertoires and cell structures, than was thought previously. We are currently completing these analyses and will write both technical and more accessible articles about the work later in the year.

2. New genomes from previously unexplored eukaryote groups: We have sequenced, assembled, and are currently analysing genome sequences from two single-celled eukaryotes: (1) Diplonema papillatum, a free-living phagotrophic predator, and (2) Pseudotrichomonas keilini, a free-living anaerobic eukaryote (that is, a eukaryotic cell that lives without oxygen). These are the first high-quality genome sequences for their groups, and the Pseudotrichomonas genome is the first genome of a free-living metamonad ever sequenced. Pseudotrichomonas is also interesting because it is closely related to Trichomonas vaginalis, a serious human parasite. Comparing the genomes of the two organisms will help to separate two kinds of novel genes in Trichomonas:
- genes directly involved in parasitism;
- genes also found in its non-parasitic relative, which are better explained as features of the group as a whole.

3. Use of the new tree (1) to trace the history of secondarily photosynthetic eukaryotes. We have used the new methods and data generated during the project to investigate when secondary endosymbioses occurred during eukaryotic evolution. The findings, published recently, indicate that long time "lags" separated the origin of each photosynthetic eukaryotic group from its later ecological impact on the functioning of the planet. This suggests that later ecological factors, and not the symbiotic event(s) themselves, drove major lineage turnover. Examples include the rise to ecological prominence of diatoms and other abundant secondarily photosynthetic eukaryotes.
Exploitation Route (1) Methodology: During this project we have developed and road-tested new approaches to rooting phylogenetic trees that will be broadly relevant in evolutionary biology and phylogenetics. Most evolutionary problems require the direction of evolution (that is, the position of the root) to be known. We have shown how this can be done using models of gene content evolution, with greater power and accuracy than was previously possible. These methods should be very widely applicable.
(2) The rooted phylogeny of eukaryotes we have inferred will provide a basis for palaeontologists to investigate the timescale of eukaryotic evolution using molecular clocks. (I have also submitted a follow-up research grant application on this subject.)
(3) Insights into novel lineages of eukaryotes: We have described the first genomes for two new groups of eukaryotes (diplonemids and free-living parabasalids). These genomes raise new questions about eukaryotic biology, including the evolution of parasitism. For example, the free-living Pseudotrichomonas appears to be secondarily free-living, having evolved from a parasitic or at least endobiotic ancestor. Two exciting new research directions are directly enabled by the new data we provide:
- understanding how that process occurs;
- identifying the environmental factors that permit transitions between free-living and parasitic lifestyles in microbial eukaryotes.
Sectors Digital/Communication/Information Technologies (including Software),Education,Environment,Healthcare,Culture, Heritage, Museums and Collections

 
Title Phylogenetic relevance vector machine for comparative biology 
Description As part of analyses aimed at inferring ancestral states for eukaryotic cells, we have developed a new Bayesian method for phylogenetic regression inspired by techniques from machine learning. The tool is implemented as an R package, and a publication describing it is in preparation. It uses automatic relevance determination (from sparse Bayesian learning) to weight the importance of a potentially large number of observed independent variables when modelling a dependent variable. We are using it to predict continuous features of early eukaryotes, such as optimal growth temperature and pH preferences. However, the package could be applied much more broadly in comparative biology, as an alternative or complement to traditional approaches such as phylogenetic generalised least squares (PGLS). It may help to solve the problem of determining which variables, among a large set of candidates, are most important for predicting a particular output. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact So far, we have used it to address research questions such as inference of the optimal growth temperature of the last universal common ancestor, and we are testing it by attempting to predict the (known) temperature and pH preferences of some modern taxa. 
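
The record above describes automatic relevance determination (ARD) applied to phylogenetic regression. The project's tool is an R package whose interface is not described here. The following is therefore only a minimal Python/NumPy sketch of the general idea, under several stated assumptions:
- Species share a Brownian-motion covariance matrix `C` derived from the tree.
- The data are decorrelated with the Cholesky factor of `C`, as in phylogenetic generalised least squares.
- A standard MacKay-style sparse Bayesian fixed-point update then gives each predictor its own prior precision. Predictors whose precision grows very large are effectively switched off.
- No intercept is fitted, and the data are assumed centred.

All function names, the toy covariance and the simulated data are hypothetical.

```python
import numpy as np


def clade_covariance(n_clades, clade_size, shared=0.5, height=1.0):
    """Hypothetical Brownian-motion covariance for a tree of equal-sized clades.

    Tips in the same clade share `shared` units of branch length, tips in
    different clades share none, and every tip is `height` from the root.
    """
    block = shared * np.ones((clade_size, clade_size)) + (height - shared) * np.eye(clade_size)
    return np.kron(np.eye(n_clades), block)


def ard_regression(X, y, n_iter=1000, tol=1e-8, alpha_cap=1e10):
    """Sparse Bayesian linear regression with one prior precision per predictor."""
    n, p = X.shape
    alpha = np.ones(p)            # prior precision of each coefficient
    beta = 1.0 / np.var(y)        # noise precision
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        Sigma = np.linalg.inv(beta * XtX + np.diag(alpha))        # posterior covariance
        mu = beta * Sigma @ Xty                                    # posterior mean
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)  # how well the data determine each weight
        new_alpha = np.minimum(gamma / np.maximum(mu ** 2, np.finfo(float).tiny), alpha_cap)
        resid = y - X @ mu
        beta = (n - gamma.sum()) / (resid @ resid)
        converged = np.max(np.abs(np.log(new_alpha) - np.log(alpha))) < tol
        alpha = new_alpha
        if converged:
            break
    Sigma = np.linalg.inv(beta * XtX + np.diag(alpha))
    return beta * Sigma @ Xty, alpha, beta


def phylo_ard(X, y, C):
    """Whiten by the phylogenetic covariance, then run ARD (no intercept)."""
    L = np.linalg.cholesky(C)  # C = L @ L.T
    return ard_regression(np.linalg.solve(L, X), np.linalg.solve(L, y))


# Simulated example: 8 candidate predictors, only 0 and 3 affect the response.
rng = np.random.default_rng(42)
C = clade_covariance(n_clades=20, clade_size=3)
L = np.linalg.cholesky(C)
n, p = C.shape[0], 8
X = L @ rng.standard_normal((n, p))                   # predictors evolve along the toy tree
w_true = np.zeros(p)
w_true[0], w_true[3] = 2.0, -1.5
y = X @ w_true + 0.3 * (L @ rng.standard_normal(n))   # phylogenetically correlated noise

weights, alpha, beta = phylo_ard(X, y, C)
print(np.round(weights, 2))
print(np.round(np.log10(alpha), 1))
```

In this simulated setting, the weights for predictors 0 and 3 should come out close to 2.0 and -1.5. The irrelevant predictors should receive much larger precisions, and therefore near-zero weights. Whitening first means that the ARD step sees approximately independent residuals under the Brownian-motion assumption, which is the same device PGLS relies on.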
 
Title Extended Data for A rooted phylogeny resolves early bacterial evolution 
Description Extended Data for A rooted phylogeny resolves early bacterial evolution 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Extended_Data_for_A_rooted_phylogeny_resolves_early_bacterial_...
 
Title Extended Data for A rooted phylogeny resolves early bacterial evolution 
Description Extended Data for A rooted phylogeny resolves early bacterial evolution 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Extended_Data_for_A_rooted_phylogeny_resolves_early_bacterial_...
 
Title Treelists, constraints and logfiles for: Relative time constraints improve molecular dating 
Description Dating the tree of life is central to understanding the evolution of life on Earth. Molecular clocks calibrated with fossils represent the state of the art for inferring the ages of major groups. Yet other information on the timing of species diversification can also be used to date the tree of life. This is the case, for instance, for horizontal gene transfer events and ancient coevolutionary relationships such as (endo)symbioses, which can imply temporal relationships between two nodes in a phylogeny (Davin et al. 2018). Such information can be particularly helpful when the geological record is sparse, e.g. for microorganisms, which represent the vast majority of extant and extinct biodiversity. Here, we demonstrate that relative age constraints, when combined with fossil calibrations, can significantly improve both the accuracy and the resolution of molecular clock estimates. We provide an implementation of relative age constraints in RevBayes (Hoehna et al. 2016) that can be combined in a modular manner with the wide range of molecular dating methods available in the software. To validate our method in a realistic data setting, we apply it to two data sets, of 40 Cyanobacteria and 62 Archaea respectively, and provide cross-validations of fossil calibrations and relative age constraints. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL http://datadryad.org/stash/dataset/doi:10.5061/dryad.s4mw6m958
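
The record above argues that relative age constraints sharpen fossil-calibrated dates. A relative age constraint states that one node must be older than another, for example because of a gene transfer between lineages. The method itself is implemented in RevBayes. The Python below is only a hypothetical Monte Carlo sketch of why combining the two kinds of information helps. It assumes two nodes, A and B, each with a uniform fossil-calibration prior on its age (the bounds are made-up numbers in arbitrary time units, not values from the study), plus a relative constraint that A is older than B. Rejecting draws that violate the constraint raises the effective lower bound on A and lowers the effective upper bound on B.

```python
import random

# Hypothetical fossil calibrations: uniform prior bounds on node ages (arbitrary units).
CALIBRATIONS = {"A": (1.0, 3.0), "B": (2.0, 3.5)}
# Hypothetical relative constraint as (older, younger) pairs: node A must be
# older than node B, e.g. as implied by a horizontal gene transfer.
RELATIVE_CONSTRAINTS = [("A", "B")]


def sample_calibrated_ages(rng):
    """Draw one age per node from its fossil-calibration prior."""
    return {node: rng.uniform(lo, hi) for node, (lo, hi) in CALIBRATIONS.items()}


def satisfies(ages, constraints):
    """True if every (older, younger) pair is respected."""
    return all(ages[older] > ages[younger] for older, younger in constraints)


def constrained_samples(n_draws, seed=1):
    """Rejection sampling: keep only draws consistent with the relative constraints."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        ages = sample_calibrated_ages(rng)
        if satisfies(ages, RELATIVE_CONSTRAINTS):
            kept.append(ages)
    return kept


def age_range(samples, node):
    """Smallest and largest sampled age for one node."""
    values = [s[node] for s in samples]
    return min(values), max(values)


samples = constrained_samples(100_000)
for node in CALIBRATIONS:
    lo, hi = age_range(samples, node)
    print(f"{node}: prior {CALIBRATIONS[node]}, constrained range ({lo:.2f}, {hi:.2f})")
```

Under these assumptions, every retained sample has A older than 2 and B younger than 3, so both nodes end up with a narrower range than their fossil-only priors allow. A real analysis would place such a constraint in the joint prior of an MCMC sampler rather than use rejection sampling.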
 
Description Bristol Dinosaur Project & Bristol Museum & Art Gallery - Workshop "Introduction to Palaeontology" to Primary School children. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact "The Bristol Dinosaur Project is an outreach wing of the Earth Sciences Department of the University of Bristol, using palaeontology as an ambassador to communicate science to all ages. Taking students and staff from the university, active in their field of study, we communicate the latest developments and core science concepts through fun and engaging workshops and talks. We are working mostly with schools either through visits or in collaboration with the Bristol Museum & Art Gallery during discovery workshops."

I took part in two workshops with the BDP and the Bristol Museum in November 2018. The workshop is designed for primary school children, who discover palaeontology through four different activities led by BDP volunteers in the natural history galleries of the Bristol Museum. The children have the opportunity to:
- visit a cultural venue they might not otherwise have the chance to visit;
- engage with scientists and museum staff;
- and, of course, learn about biology and palaeontology.
They are encouraged to ask questions and interact with the volunteers as much as they want. Most of them are very keen to participate. The feedback from teachers has been excellent, with requests to book again next year and promises to recommend the workshop to their colleagues. The BDP opens the doors of a major museum to the children in a very privileged way and shows them that science and natural history are fascinating. It also gives some of them a personal opportunity to interact with professional scientists, including a large number of women. The children come from very diverse social and ethnic backgrounds, which is also a great opportunity for us to share our passion for science. I believe that the impact on the children, even if not immediate, will be substantial in the long term: discovering that STEM subjects are interesting, museums are fun and scientists are open-minded will stay with them.
Year(s) Of Engagement Activity 2018
URL http://www.thebristoldinosaurproject.org.uk/
 
Description DigiLocal Coding Club - Introduction to coding & computational thinking at a community centre (Bedminster Library, Bristol, UK) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact "DigiLocal® is supporting communities in running tech clubs for their young people.
These provide regular, positive activities with tech. It encourages problem solving and what's sometimes called 'computational thinking' in schools.
We work closely with community centres and schools for activities on a weekly basis. We mostly use Scratch and Python on Linux environments as these are easiest to pick up quickly, but both provide high levels of sophistication once you master the basics."
I took part in a DigiLocal coding club every Wednesday evening between November 2018 and February 2019, working with the Bedminster community centre in Bristol at Bedminster Library. The activity consisted of guiding a group of about 10 to 15 children, aged 8 to 13, through the basics (and the less basic!) of coding, first with Scratch and then with Python. This activity is an amazing opportunity for children to discover the Linux environment and coding in a fun way. While the main goal is for them to experiment with "computational thinking", it is also a great opportunity to talk with them about the importance of coding in my own professional practice and about how STEM can be fun and useful. As the activity is recurrent, I think the children's willingness to come back week after week, and in some cases to continue working on their projects at home, is a sign of success.
On another note, this experience gave me the opportunity to engage with children from different origins and social backgrounds, in a part of the city that the university does not easily reach. The group I worked with, although male-dominated, included several girls, who are often hard to keep involved in STEM activities. I believe that my presence as a female volunteer had a positive impact on all the children, showing that coding is for everyone.
Year(s) Of Engagement Activity 2018
URL https://hbb.org.uk/digilocal/