Architects of genomic change: the evolutionary dynamics of transposable elements

Lead Research Organisation: UNIVERSITY OF EXETER
Department Name: Biosciences

Abstract

Recently there have been great breakthroughs in computing and molecular biology. In combination, these have led to a vastly improved ability to generate and analyse large volumes of genetic data. Consequently, near-complete genome sequences are now available for a large variety of organisms. This genomic revolution has revealed many fascinating insights, but one of the most unexpected relates to the abundance of transposable elements (TEs) discovered within the genome.

TEs are short DNA sequences with the ability to move around in the genome via a process called transposition. Because of this property, TEs are sometimes referred to as jumping genes. Other names applied to TEs are selfish DNA, parasitic DNA, or even junk DNA, reflecting their perceived lack of contribution to host fitness. To become fixed in the host's evolutionary lineage, TEs must invade the host germline (i.e. reproductive cells). This has been occurring over great evolutionary periods, leading to the abundance of TE sequences observable in sequenced genomes, the majority of which exist as genomic fossils that have become inactivated due to an accumulation of mutations.

Recently, it has emerged that TE sequences have been repeatedly utilised by host genomes for their own purposes during evolution. Indeed, it appears that TEs have played a significant role in the evolution of host genomic complexity via various mechanisms, including direct acquisition of coding sequence, genomic rearrangement, and gene regulatory modification.

Despite the widespread abundance of TEs and their important evolutionary contributions across the diversity of life, many questions concerning TE biology remain unanswered. However, the wealth of recently sequenced genomes now provides an exciting opportunity to perform novel large-scale systematic analyses of TE evolution to elucidate poorly understood aspects of TE biology. In this proposal I will undertake such an analysis to address the following four important aims:

1 Evolution of the LTR retrotransposons. A particularly diverse and abundant group of TEs with significant impacts on the genomes of a great diversity of organisms are the Long Terminal Repeat (LTR) retrotransposons. Until recently, it was very difficult to estimate evolutionary relationships in this group for methodological reasons, constraining advances. However, I have developed a new method to overcome this problem, offering the possibility to estimate evolutionary history and address questions of key significance in the group, which also includes highly important vertebrate viruses such as HIV.

2 Persistence of TEs in the genome. A major question is how active selfish elements persist in host genomes, while having no direct selective benefit to the host. I will quantify patterns in the proliferation of TEs and their spread across host diversity to elucidate this long-standing problem.

3 Transposable elements and the evolution of host genomic complexity. I will explore the features that predispose TEs to being harnessed for host purposes, and examine how TEs interact to contribute to host genomic complexity.

4 Role of transposable elements in speciation. Hosts can evolve resistance mechanisms against TEs, but recently invading TEs are typically able to replicate more freely. Consequently, poor repression of TEs is predicted to result in hybrids between two diverging lineages suffering negative fitness consequences due to increased TE activity, which in turn reinforces reduced gene flow. I will test these ideas to explore the role of TEs as promoters of speciation.

Study of the LTR retrotransposons offers an opportunity to provide insights relevant to combating disease, since the group contains infectious viruses such as HIV. Meanwhile, given the widespread utilisation of TE sequences for diverse host purposes during evolution, greater knowledge of TE biology will provide insights of potential applied and medical benefit more widely.

Technical Summary

Genomic data have revealed the great extent to which transposable elements (TEs) have infiltrated eukaryotic genomes. Concurrently, a major shift in perspective has occurred, from a view of TEs as mere junk or parasitic DNA to recognition of the considerable roles they have played in the evolution of host genomic complexity. However, understanding of many fundamental aspects of TE biology remains relatively poor. In particular, systematic analyses based within a robust evolutionary framework are required to elucidate broad-scale TE evolutionary dynamics. I will capitalise on the recent accumulation of eukaryotic genomes, in combination with a novel phylogenetic approach I have developed, to examine the following four important areas of TE biology:

1 Phylogeny and evolution of LTR retrotransposons: I will address outstanding questions of key significance concerning evolution of long terminal repeat TEs that originate from the family Retroviridae, which includes highly important vertebrate viruses such as HIV, and the closely related family Metaviridae.

2 Dynamics of transposable element persistence: A major question in TE biology is how active selfish elements persist in host genomes, while having no direct selective benefit to the host. I will quantify patterns in TE proliferation and host usage to elucidate this long-standing problem.

3 Transposable elements and the evolution of host genomic complexity: I will explore the features that predispose TEs to being harnessed for host purposes, and examine how TEs interact to contribute to host genomic complexity.

4 Role of transposable elements in speciation: The role of TEs in speciation remains relatively untested. Current developments in genomics offer an opportunity to test the broad hypothesis that gene flow between host lineages is associated with a reduced capacity for TE repression, reinforcing host reproductive isolation and promoting speciation.

Planned Impact

The proposed project will increase understanding of the biology of transposable elements (TEs). This is a topic of considerable interest to scientists and the general public, and contains substantial promise for a wide range of applications. As a result, findings from the project will contribute towards the knowledge and understanding required to move towards a bio-based economy.

TEs make up a large proportion of the human genome, and are implicated in a wide range of diseases as well as in normal functioning. Consequently, the findings of this project hold great potential health relevance. In addition, further significance comes from the importance of retroviral disease, and the current vast global HIV epidemic. Research questions examining retroviral evolution, host usage, and the retroviral envelope gene may lead to new insights into retrovirus biology that could be used in new forms of treatment and drug development. Thus, research findings from this project have the capacity to enhance both health and quality of life.

Results from this project also have the potential to foster economic performance and contribute to the economic competitiveness of the UK. These prospective contributions come predominantly from two sectors: agriculture and biotechnology. TEs are involved with widespread phenotypic traits, a considerable number of which probably influence production traits in domesticated animals. Identifying such inserts offers great scope for improving yields and adding to competitiveness in the farming sector. Meanwhile, a range of retroviruses exert harmful effects on agricultural livestock including poultry, cattle, and sheep. Conducting research that may lead to new methods to control these diseases carries benefits for animal welfare, improving yield and economic margins, and bolstering the resilience of the farming industry. Additionally, novel research on TEs may directly contribute to the development of new tools and methods in the industrial biosciences, which are currently a global growth area.

Furthermore, as set out in the pathways to impact, considerable efforts will be made to disseminate research findings among the public, thus fostering enthusiasm and understanding, and a general fluency in science and technology.

Publications

 
Title Additional file 1: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description Phylogenetic tree of the amino-acid DDE transposase domain of 1631 autonomous Mutator-like elements. The tree results from a phylogenetic analysis using maximum likelihood inference, with 1000 bootstrap repetitions. Clade support values above 0.75 are indicated adjacent to each clade. Clades are divided into groups, as indicated by alternate shading, with a corresponding clade name and number to the right. For each clade (except eight groups for which we only recovered the amino acid transposase domain), a schematic summarising the structure of the TEs contained within each group is illustrated, with structural features represented by different coloured rectangles (please see the accompanying key). Elements are named according to their Repbase or Genbank ID, or according to the name provided in the article describing them. The host genome for each element is indicated to the right-hand side of its ID, and labels are coloured broadly according to the taxonomic kingdom that the species belongs to: blue for Metazoa, purple for Excavates (Parabasalids), taupe for Oomycetes and Diatoms (Stramenopiles), black for Amoebozoa, orange/yellow for Fungi, and green for Plantae. Family-level groupings for MULE clades consisting of ≥ 2 elements are indicated with right curly braces. When more than one schematic represents the structure of the elements within a clade, dotted lines indicate which elements the schematic depicts. The alignment used for this phylogenetic analysis is provided in Additional file 7. (SVG 6820 kb)
Type Of Art Film/Video/Animation 
Year Produced 2019 
URL https://springernature.figshare.com/articles/Additional_file_1_of_Evolution_of_Mutator_transposable_...
 
Title Additional file 5 of Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome 
Description Additional file 5. Homeobox gene trees constructed with Maximum-likelihood method (LG + G) based on the homeodomain sequences (1000 bootstraps). 
Type Of Art Film/Video/Animation 
Year Produced 2020 
URL https://springernature.figshare.com/articles/presentation/Additional_file_5_of_Reconstruction_of_anc...
 
Title Additional file 5: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description A graph of the number of assigned clusters according to the thresholds applied in each ClusterPicker analysis. (JPG 122 kb) 
Type Of Art Film/Video/Animation 
Year Produced 2019 
URL https://springernature.figshare.com/articles/Additional_file_5_of_Evolution_of_Mutator_transposable_...
 
Description Work was performed on four key aspects of transposable element research during my fellowship. The first aspect generated new knowledge on the evolution of DNA transposable elements of the terminal inverted repeat group, which includes many important elements used in research and genetic technology. Discoveries relate to the diversity, evolution, host usage, and taxonomy of several major groups of DNA transposable elements, including Mutator, Tc1/Mariner, and Pogo elements. More generally, we performed the first large-scale analysis of their diversity, evolution, and host range. This demonstrated a great diversity of these elements, showed that animals are a focal host group, with relatively few elements known from several major eukaryote lineages, and highlighted apparent barriers in host usage. The second aspect generated new knowledge on the ways that transposable elements have been used during host evolution. The specific aspects of insecticide resistance, animal colouration, and animal domestication were investigated, and a number of new cases were demonstrated, via both primary research and systematic reviews. The third aspect focussed on improving analysis methods for transposable elements, and resulted in two new pipelines: earlGrey, which improves the quality of repeat annotations performed using RepeatModeler, and TE-strainer, which improves repeat library quality, in particular by helping to identify whether multicopy host genes have been erroneously labelled as repeats. Finally, new knowledge and genomes were generated for butterflies, and for the interactions between transposable elements and butterfly genomes, elucidating evolutionary dynamics and the role of host ecological traits in shaping genomic transposable element content.
Exploitation Route The outcomes of this research are useful to researchers interested in the evolution and diversity of transposable elements in particular. The methods developed, particularly the earlGrey transposable element annotation pipeline, are of relevance to a wide range of users who annotate and study genomes, as repeat annotation is a key aspect of genomics, and transposable elements are important contributors to host evolutionary complexity and adaptation. The results are also of potential relevance to a large cross-section of researchers, particularly those in healthcare and agriculture, who analyse genomic data, and wish to identify or understand more about the biology of certain focal elements.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Pharmaceuticals and Medical Biotechnology

URL https://github.com/TobyBaril/EarlGrey
 
Description Academic outputs have contributed to an increased awareness of the significant and general influences of transposable elements on host genome function and evolution, as well as the underlying mechanisms through which transposable elements exert their effects, and elucidation of their own evolution. These outputs are directly relevant to many applied fields, including agricultural production (both from the perspective of plant and animal breeding, and the control of damaging pest species) and the medical sciences. Research has also paved the way towards an increased understanding of transposable elements as fundamental general contributors to evolutionary complexity, and their still under-appreciated role in 'the rules of life'. A new transposable element annotation pipeline, Earl Grey, was also developed, which is publicly available, and which provides considerable improvements in automated transposable element annotation, as well as aiming to make the process more user-friendly and accessible to a much wider audience of interested scientists, including community-led opportunities for future development. This will hopefully contribute to a greater uptake of transposable element analyses on genomic data across the biological and biomedical sciences. Additionally, work on butterfly genomics contributed towards the roll-out of the Wellcome Sanger Darwin Tree of Life Project, with expertise and specimens contributed to the pilot taxon, Lepidoptera, and also leading more specifically to a large number of genome notes for different butterfly species.
First Year Of Impact 2017
Sector Agriculture, Food and Drink; Environment; Pharmaceuticals and Medical Biotechnology
Impact Types Societal, Economic
 
Description A pangenomic approach to understanding the evolution of insecticide resistance
Amount £677,578 (GBP)
Funding ID BB/X006395/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 06/2023 
End 03/2027
 
Description Evaluating the contribution of transposons to agricultural domestication
Amount £0 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2020 
End 09/2024
 
Description Investigating the Role of Transposable Elements in the Evolution of Host Genomic Complexity
Amount £0 (GBP)
Funding ID 2072124 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2018 
End 09/2022
 
Title The Earl Grey Transposable Element Annotation Pipeline 
Description A transposable element annotation pipeline for annotating repeats in genome data, designed to improve on current gold standard approaches, and be easily usable for non-specialists. 
Type Of Material Technology assay or reagent 
Year Produced 2022 
Provided To Others? Yes  
Impact My PhD student Toby Baril handles the GitHub page, and has told me the program has been downloaded thousands of times, and that he regularly receives emails from users. We have not received many citations yet, but we are planning to submit a manuscript on the method to a major journal in the next couple of months, and hopefully users will cite this! The method outperforms all other tools in comparative tests (even commercial tools). We are also planning to release the 'TE-strainer' tool shortly, which identifies repetitive host genes annotated as TEs in genome annotations. This fills a key methodological gap that is currently leading to problems (host genes labelled as TEs) in user-uploaded data in online reference databases such as Dfam.
URL https://github.com/TobyBaril/EarlGrey
 
Title Additional file 1 of Genome of the four-finger threadfin Eleutheronema tetradactylum (Perciforms: Polynemidae) 
Description Additional file 1. (XLSX 49 kb) 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Genome_of_the_four-finger_...
 
Title Additional file 1 of Migrators within migrators: exploring transposable element dynamics in the monarch butterfly, Danaus plexippus 
Description Additional file 1. Supplementary Tables, including contents page with table legends. Tables S1-S10. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Migrators_within_migrators...
 
Title Additional file 1 of Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome 
Description Additional file 1. Sequencing data of oyster M. hongkongensis generated in this study. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Reconstruction_of_ancient_...
 
Title Additional file 10 of Migrators within migrators: exploring transposable element dynamics in the monarch butterfly, Danaus plexippus 
Description Additional file 10: Dataset S1. Coordinates and classification of all transposable element sequences identified in the monarch genome using the de novo library in conjunction with RepBase Arthropoda library and Dfam, in bed format. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_10_of_Migrators_within_migrator...
 
Title Additional file 10 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 10. Sequencing data. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_10_of_The_genome_and_sex-depend...
 
Title Additional file 11 of Migrators within migrators: exploring transposable element dynamics in the monarch butterfly, Danaus plexippus 
Description Additional file 11: Dataset S2. Coordinates and classification of all transposable element sequences identified in the monarch genome using the de novo library in conjunction with RepBase Arthropoda library and Dfam, in GFF format. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_11_of_Migrators_within_migrator...
 
Title Additional file 12 of Migrators within migrators: exploring transposable element dynamics in the monarch butterfly, Danaus plexippus 
Description Additional file 12: Dataset S3. De novo transposable element consensus sequences identified in the monarch in FASTA format. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_12_of_Migrators_within_migrator...
 
Title Additional file 2 of Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome 
Description Additional file 2. Estimated repeat content present in the genome for the Pacific oyster, Sydney rock oyster, and Hong Kong oyster for both the assembly presented here, and the assembly of Peng et al. [46]. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Reconstruction_of_ancient_...
 
Title Additional file 2 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 2. Transposable elements information. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_2_of_The_genome_and_sex-depende...
 
Title Additional file 2: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description Table S2. Genomic features and characteristics of each Ghost, Spectre and new aphid Phantom elements identified from the aphid genomes analysed. (XLSX 13 kb) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_2_of_Evolution_of_Mutator_transposable_...
 
Title Additional file 3 of Phylogenetic analysis of the Tc1/mariner superfamily reveals the unexplored diversity of pogo-like elements 
Description Additional file 3. Alignment text files showing the DDD/E structure of each clade, with a conserved number of amino acid residues between the second D and the third D or the E of the transposase domain. A) DDD/E alignment caption for HvSm and PlantMar. B) DDD alignment caption for TIGD1-4. C) DDD alignment caption for TIGD5-7.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_3_of_Phylogenetic_analysis_of_the_Tc1_m...
 
Title Additional file 3 of Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome 
Description Additional file 3. Repeat landscape plots. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Reconstruction_of_ancient_...
 
Title Additional file 3 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 3. Summary of differentially expressed protein coding genes and microRNAs. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_3_of_The_genome_and_sex-depende...
 
Title Additional file 3: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description An amino acid alignment of elements that were removed from the phylogenetic analysis as a consequence of their lack of alignment with other MULE DDE domains. (FASTA 94 kb) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_3_of_Evolution_of_Mutator_transposable_...
 
Title Additional file 4 of Phylogenetic analysis of the Tc1/mariner superfamily reveals the unexplored diversity of pogo-like elements 
Description Additional file 4. The amino acid alignment used to perform our phylogenetic analysis. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_4_of_Phylogenetic_analysis_of_the_Tc1_m...
 
Title Additional file 4 of Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome 
Description Additional file 4. Homeobox gene sequences and genomic locations in mollusc genomes. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_4_of_Reconstruction_of_ancient_...
 
Title Additional file 4 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 4. Expression data of mRNA. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_4_of_The_genome_and_sex-depende...
 
Title Additional file 4: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description Table S1. Information on the different clusters indicated in Additional file 1. (XLSX 41 kb) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_4_of_Evolution_of_Mutator_transposable_...
 
Title Additional file 5 of Phylogenetic analysis of the Tc1/mariner superfamily reveals the unexplored diversity of pogo-like elements 
Description Additional file 5. Fasta sequences of the 75 full-length Tc1/mariner elements used in this study. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_5_of_Phylogenetic_analysis_of_the_Tc1_m...
 
Title Additional file 5 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 5. Gene ontology and pathway enrichment analyses data. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_5_of_The_genome_and_sex-depende...
 
Title Additional file 6 of Phylogenetic analysis of the Tc1/mariner superfamily reveals the unexplored diversity of pogo-like elements 
Description Additional files 6 to 12. Conserved locations of TIGD1 to TIGD7 in host vertebrate species and information about the upstream and downstream genes flanking them, retrieved from Ensembl [54]. Negative numbers indicate that the considered gene is upstream of the TIGD element. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_6_of_Phylogenetic_analysis_of_the_Tc1_m...
 
Title Additional file 6 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 6. Expression data of sesquiterpenoid pathway genes. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_6_of_The_genome_and_sex-depende...
 
Title Additional file 6: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description Output trees from ClusterPicker with each inferred cluster coloured alternately, at the genetic distance thresholds of: 2.6, 3.6, and 4.2%. (ZIP 170 kb) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_6_of_Evolution_of_Mutator_transposable_...
 
Title Additional file 7 of Phylogenetic analysis of the Tc1/mariner superfamily reveals the unexplored diversity of pogo-like elements 
Description Additional File 13. The amino acid alignment used to perform the TIGD1-TIGD7 phylogenetic analysis. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_7_of_Phylogenetic_analysis_of_the_Tc1_m...
 
Title Additional file 7 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 7. Expression data of microRNAs. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_7_of_The_genome_and_sex-depende...
 
Title Additional file 7: of Evolution of Mutator transposable elements across eukaryotic diversity 
Description The amino acid alignment used to perform our phylogenetic analysis. (FASTA 6699 kb) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/Additional_file_7_of_Evolution_of_Mutator_transposable_...
 
Title Additional file 8 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 8. Expression data of neuropeptides. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_8_of_The_genome_and_sex-depende...
 
Title Additional file 9 of The genome and sex-dependent responses to temperature in the common yellow butterfly, Eurema hecabe 
Description Additional file 9. Comparative sex-biased analysis. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_9_of_The_genome_and_sex-depende...
 
Title Dataset for ELE_EV_ELE13757 
Description Methods: A literature search was performed using Google Scholar on 19th March 2019, which identified 368 citations of the original paper for TreeMap (Page 1994), and 332 citations of the original paper for ParaFit (Legendre et al. 2002), resulting in a total of 700 articles that were screened to extract metrics for inclusion in our meta-analysis. Articles that did not contain cophylogenetic analyses were immediately excluded. Studies focussing on the population level were also excluded, as these do not represent true cophylogenetic analyses at the macroevolutionary level. Additionally, studies that included fewer than four taxa were excluded from consideration, as these do not provide sufficient power for inclusion in the meta-analysis. Studies that did not report the test statistic for congruence were also necessarily excluded. A short citation of each study was recorded under 'authors', and the year of publication was recorded in 'year'. Hosts and symbionts were classified broadly according to Linnean taxonomy for 'host_tax_broad' and 'symbiont_tax_broad' as either: invertebrate, vertebrate, plant or microbe (i.e. microscopic symbionts such as fungi, protozoa, bacteria, viruses). We adopted the mode of symbiosis and mode of transmission between host species specified by the authors in each individual study for 'symbiosis' and 'mode_of_transmission_broad'. In cases where either mode of symbiosis or mode of transmission was not directly specified by authors, we consulted the literature for clarification. In a small number of studies restricted to bacterial intracellular symbionts, the mutualism-parasitism distinction was not defined by the authors and either no further information was available, or a symbiont was cited in the literature as being either a mutualist or a parasite, depending on which study was considered. 
The nature of the relationship between bacterial intracellular symbionts and their hosts is complex, and in some cases they may display both beneficial and detrimental effects simultaneously. In a few cases of conflict or where authors did not explicitly state mode of transmission for bacterial intracellular symbionts, we assumed a mode of transmission in line with the majority of available references. We only encountered one study where authors categorised the mode of symbiosis as commensalism. On the continuum of symbioses from pure parasitism (fitness losses for the host) to mutualism (fitness gains for the host), commensalism represents a single point where losses and gains for the host precisely equal zero. Consequently, commensalism is an unlikely and unstable state, easily tipped to one side or the other with any small change in external conditions. Thus, the lack of widely recognized groups of commensals is the likeliest explanation for the scarcity of studies on commensalism in our data (note that we did not include this category, commensalism, in our analyses). The total number of host tips that were linked to a symbiont taxon was summed to provide 'host_tips_linked', which in a very few cases was corrected to remove multiple sampling of the same host species, to provide 'host_tips_linked_corrected'. The total number of symbiont tips with a link to a host taxon was summed to provide 'symbiont_tips_linked', while the total number of individual links between hosts and symbionts was recorded as 'total_host_symbiont_links'. If all symbionts in a phylogeny were strict specialists, such that each one had a single link to a single host, 'total_host_symbiont_links' would simply equal 'symbiont_tips_linked'. However, because symbionts are often associated with more than one host, the value of 'total_host_symbiont_links' was often higher than the total number of symbionts included in a study. 
Thus, a measure of symbiont generalism was captured using 'host_range_link_ratio', defined as 'total_host_symbiont_links' divided by 'symbiont_tips_linked', providing the mean number of host-symbiont links observed per symbiont taxon, with the measure increasing with increasing generalism. An alternative estimate of symbiont host specificity was captured using 'host_range_taxonomic_breadth', which considers Linnean taxonomic rank, and was calculated by assigning an incremental score to successive host taxonomic ranks per symbiont in turn (i.e. single host species = 1, multiple host species in the same genus = 2, multiple host genera = 3, multiple host families = 4, multiple host orders = 5), summing the total score across all symbionts, and dividing by 'symbiont_tips_linked' (i.e. the total number of symbionts). Consequently, 'host_range_taxonomic_breadth' increases with symbiont generalism, such that symbiont phylogenies containing symbionts capable of infecting hosts from a wide range of taxonomic ranks are assigned a greater score. The number of phylogenetic permutations performed by authors during cophylogenetic analyses was recorded as 'no_randomizations', which poses a unique problem in our meta-analysis (discussed in the section 'Publication bias and sensitivity analysis'). The resultant p value from each study was recorded as 'p_value', whereby observed p values decrease with a decreasing likelihood of observing host-symbiont cophylogeny by chance alone (i.e., as calculated during permutation tests performed by authors during TreeMap or ParaFit analyses). File '2021-09-01-source-data-dat.txt' is in tab-delimited text format. File 'Supporting_Information.Rmd' is accompanying R code used for analysis of the source data. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Dataset_for_ELE_EV_ELE13757/14393309/1
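
The two host-range measures defined in the description above ('host_range_link_ratio' and 'host_range_taxonomic_breadth') can be illustrated with a minimal Python sketch. This is not the authors' R code from 'Supporting_Information.Rmd'; the `Host` class, the function names, and the example taxa are hypothetical, and the sketch assumes each symbiont's hosts are supplied with their species, genus, family and order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; field names are illustrative assumptions."""
    species: str
    genus: str
    family: str
    order: str


def breadth_score(hosts):
    """Score one symbiont by the highest taxonomic rank its hosts span:
    single species = 1, same genus = 2, multiple genera = 3,
    multiple families = 4, multiple orders = 5."""
    if len({h.order for h in hosts}) > 1:
        return 5
    if len({h.family for h in hosts}) > 1:
        return 4
    if len({h.genus for h in hosts}) > 1:
        return 3
    if len({h.species for h in hosts}) > 1:
        return 2
    return 1


def host_range_metrics(links):
    """links maps each symbiont name to the collection of hosts it is linked to."""
    if not links:
        raise ValueError("at least one linked symbiont is required")
    symbiont_tips_linked = len(links)
    # Each distinct host per symbiont counts as one host-symbiont link.
    total_host_symbiont_links = sum(len(set(hosts)) for hosts in links.values())
    total_breadth = sum(breadth_score(set(hosts)) for hosts in links.values())
    return {
        "symbiont_tips_linked": symbiont_tips_linked,
        "total_host_symbiont_links": total_host_symbiont_links,
        "host_range_link_ratio": total_host_symbiont_links / symbiont_tips_linked,
        "host_range_taxonomic_breadth": total_breadth / symbiont_tips_linked,
    }


# Made-up example: a strict specialist, a genus-level generalist,
# and a symbiont whose hosts span two orders.
example_links = {
    "symbiont_A": [Host("sp1", "GenusX", "FamilyP", "OrderM")],
    "symbiont_B": [Host("sp1", "GenusX", "FamilyP", "OrderM"),
                   Host("sp2", "GenusX", "FamilyP", "OrderM")],
    "symbiont_C": [Host("sp3", "GenusY", "FamilyQ", "OrderM"),
                   Host("sp4", "GenusZ", "FamilyR", "OrderN")],
}

metrics = host_range_metrics(example_links)
print(round(metrics["host_range_link_ratio"], 3))         # 1.667
print(round(metrics["host_range_taxonomic_breadth"], 3))  # 2.667
```

Both values equal 1 when every symbiont is a strict specialist and rise as symbionts become more generalist, which matches the behaviour the description attributes to the two measures.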
 
Title Dataset for ELE_EV_ELE13757 
Description Methods: A literature search was performed using Google Scholar on 19th March 2019, which identified 368 citations of the original paper for TreeMap (Page 1994), and 332 citations of the original paper for ParaFit (Legendre et al. 2002), resulting in a total of 700 articles that were screened to extract metrics for inclusion in our meta-analysis. Articles that did not contain cophylogenetic analyses were immediately excluded. Studies focussing on the population level were also excluded, as these do not represent true cophylogenetic analyses at the macroevolutionary level. Additionally, studies that included fewer than four taxa were excluded from consideration, as these do not provide sufficient power for inclusion in the meta-analysis. Studies that did not report the test statistic for congruence were also necessarily excluded. A short citation of each study was recorded under 'authors', and the year of publication was recorded in 'year'. Hosts and symbionts were classified broadly according to Linnean taxonomy for 'host_tax_broad' and 'symbiont_tax_broad' as either: invertebrate, vertebrate, plant or microbe (i.e. microscopic symbionts such as fungi, protozoa, bacteria, viruses). We adopted the mode of symbiosis and mode of transmission between host species specified by the authors in each individual study for 'symbiosis' and 'mode_of_transmission_broad'. In cases where either mode of symbiosis or mode of transmission was not directly specified by authors, we consulted the literature for clarification. In a small number of studies restricted to bacterial intracellular symbionts, the mutualism-parasitism distinction was not defined by the authors and either no further information was available, or a symbiont was cited in the literature as being either a mutualist or a parasite, depending on which study was considered. 
The nature of the relationship between bacterial intracellular symbionts and their hosts is complex, and in some cases they may display both beneficial and detrimental effects simultaneously. In a few cases of conflict or where authors did not explicitly state mode of transmission for bacterial intracellular symbionts, we assumed a mode of transmission in line with the majority of available references. We only encountered one study where authors categorised the mode of symbiosis as commensalism. On the continuum of symbioses from pure parasitism (fitness losses for the host) to mutualism (fitness gains for the host), commensalism represents a single point where losses and gains for the host precisely equal zero. Consequently, commensalism is an unlikely and unstable state, easily tipped to one side or the other with any small change in external conditions. Thus, the lack of widely recognized groups of commensals is the likeliest explanation for the scarcity of studies on commensalism in our data (note that we did not include this category, commensalism, in our analyses). The total number of host tips that were linked to a symbiont taxon was summed to provide 'host_tips_linked', which in a very few cases was corrected to remove multiple sampling of the same host species, to provide 'host_tips_linked_corrected'. The total number of symbiont tips with a link to a host taxon was summed to provide 'symbiont_tips_linked', while the total number of individual links between hosts and symbionts was recorded as 'total_host_symbiont_links'. If all symbionts in a phylogeny were strict specialists, such that each one had a single link to a single host, 'total_host_symbiont_links' would simply equal 'symbiont_tips_linked'. However, because symbionts are often associated with more than one host, the value of 'total_host_symbiont_links' was often higher than the total number of symbionts included in a study. 
Thus, a measure of symbiont generalism was captured using 'host_range_link_ratio', defined as 'total_host_symbiont_links' divided by 'symbiont_tips_linked', providing the mean number of host-symbiont links observed per symbiont taxon, with the measure increasing with increasing generalism. An alternative estimate of symbiont host specificity was captured using 'host_range_taxonomic_breadth', which considers Linnean taxonomic rank, and was calculated by assigning an incremental score to successive host taxonomic ranks per symbiont in turn (i.e. single host species = 1, multiple host species in the same genus = 2, multiple host genera = 3, multiple host families = 4, multiple host orders = 5), summing the total score across all symbionts, and dividing by 'symbiont_tips_linked' (i.e. the total number of symbionts). Consequently, 'host_range_taxonomic_breadth' increases with symbiont generalism, such that symbiont phylogenies containing symbionts capable of infecting hosts from a wide range of taxonomic ranks are assigned a greater score. The number of phylogenetic permutations performed by authors during cophylogenetic analyses was recorded as 'no_randomizations', which poses a unique problem in our meta-analysis (discussed in the section 'Publication bias and sensitivity analysis'). The resultant p value from each study was recorded as 'p_value', whereby observed p values decrease with a decreasing likelihood of observing host-symbiont cophylogeny by chance alone (i.e., as calculated during permutation tests performed by authors during TreeMap or ParaFit analyses). File '2021-09-01-source-data-dat.txt' is in tab-delimited text format. File 'Supporting_Information.Rmd' is accompanying R code used for analysis of the source data. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Dataset_for_ELE_EV_ELE13757/14393309
 
Title Genome of the four-finger threadfin Eleutheronema tetradactylum (Perciforms: Polynemidae) 
Description Teleost fish play important roles in aquatic ecosystems and aquaculture. Here, we sequenced and assembled the genome of the first threadfin fish, the fourfinger threadfin Eleutheronema tetradactylum (Perciformes: Polynemidae). Threadfins show a range of interesting biology, and are of considerable importance in both wild fisheries and aquaculture. Additionally, E. tetradactylum is of conservation relevance since its populations are considered to be in rapid decline and it is currently classified as endangered. We provide a genome assembly for E. tetradactylum with high contiguity (scaffold N50 = 56.3 kb) and high BUSCO completeness at 96.5%. The assembled genome size of E. tetradactylum is just 610.5 Mb, making it the second smallest perciform genome assembled to date, and only ~9.68% of the sequence was found to consist of repetitive elements, making this the lowest repeat content identified to date for any perciform fish. A total of 37,683 protein-coding genes were annotated, and developmental transcription factors, including the Hox, ParaHox, and Sox families, were analysed. MicroRNA genes were also annotated and compared with other chordate lineages, elucidating the gains and losses of chordate microRNAs. Our findings provide a useful genomic resource for future research into the interesting biology and evolution of this important group of food fish. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Genome_of_the_four-finger_threadfin_Eleutheronema_tetradactylu...
 
Title Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome 
Description Abstract. Background: Homeobox-containing genes encode crucial transcription factors in animal and plant development, and changes to these genes have been linked to the evolution of body plans and morphologies. In animals some homeobox genes are clustered in the genome, due to either coordinated gene regulation or as remnants from ancestral genomic arrangements. Analyses of homeobox gene organization across a range of species thus help to better understand the evolution of genome organization and developmental gene control, and the possible interactions between the two. However, homeobox gene organization in several key animal ancestors, including those of molluscs, lophotrochozoans and bilaterians, remains to be fully elucidated. Results: Here, we present a high-quality chromosome-level genome assembly of Magallana hongkongensis (2n = 20), for which 93.2% of the genomic sequences are contained on 10 pseudomolecules (~758 Mb, scaffold N50 = 72.3 Mb). A total of 46,963 predicted gene models (45,308 protein coding genes) were incorporated for this genome, and genome completeness estimated by BUSCO was 94.6%. Homeobox gene linkages were analysed in detail relative to available data in other mollusc species. Our chromosome-level assembly allows the inference of ancient gene linkages (synteny) for the homeobox-containing genes, even though a number of the homeobox gene clusters, like the Hox/ParaHox clusters, are undergoing dispersal in molluscs such as this oyster. Conclusions: The analyses performed in this study and the accompanying genome sequence provide important genetic resources for this economically and culturally valuable oyster species, and offer a platform to improve understanding of animal biology and evolution more generally. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Reconstruction_of_ancient_homeobox_gene_linkages_inferred_from...
 
Title Supporting data for "A draft genome sequence of the elusive giant squid, Architeuthis dux" 
Description The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusk with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, having a genome assembled for this deep-sea dwelling species will help to address several outstanding evolutionary questions. We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long-reads and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from three different tissue types from three other squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein coding genes supported by evidence and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome. This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL http://gigadb.org/dataset/100676
 
Description Collaboration with 10X Genomics 
Organisation 10X Genomics, Inc
Country United States 
Sector Private 
PI Contribution We provided specimens
Collaborator Contribution They arranged for library preparations to be made ahead of the queue and for free, and subsequently assembled the resultant data.
Impact Four butterfly draft genome sequences
Start Year 2017
 
Description Collaboration with Charlie Cornwallis at the University of Lund 
Organisation Lund University
Country Sweden 
Sector Academic/University 
PI Contribution Annotation of transposable elements in genomes sampled from across algal diversity
Collaborator Contribution Sequencing a large number of algal genomes
Impact Comparative genomics paper planned
Start Year 2020
 
Description Collaboration with Dr Jon Mulley at Bangor University 
Organisation Bangor University
Country United Kingdom 
Sector Academic/University 
PI Contribution Collaborations on gerbil genome evolution (submitted), and snake genome evolution (in prep.)
Collaborator Contribution Comparative transposon analyses
Impact One preprint (submitted to a journal), one manuscript in prep.
Start Year 2022
 
Description Collaboration with Dr Pablo Orozco Ter Wengel at Cardiff University 
Organisation Cardiff University
Country United Kingdom 
Sector Academic/University 
PI Contribution I coordinated the BBSRC DTP application.
Collaborator Contribution Dr Ter Wengel is the second supervisor of my BBSRC DTP student Ryan Biscocho. He will assist in analyses of the role of transposons in livestock domestication.
Impact Planned: Biscocho ER, Baril T, Orozco-Terwengel P, Hui JHL, Ferrier DEK, Hayward A. (In preparation for Molecular Biology and Evolution) The influence of transposable elements on Hox gene evolution in molluscs
Start Year 2020
 
Description Collaboration with Professor Chris Bass at Exeter University 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Professor Bass and I collaborate on questions relating to the evolution of insecticide resistance in insects, specifically in relation to aphids and the role that transposable elements play in this process.
Collaborator Contribution Planning and conducting the majority of the research
Impact Published:
Dupeyron M, Singh KS, Bass C, Hayward A (2019) Evolution of Mutator transposable elements across eukaryotic diversity. Mobile DNA, 10, 12.
Dupeyron M, Baril T, Bass C, Hayward A (2020) An evolutionary analysis of the Tc1-mariner superfamily reveals the unexplored diversity of pogo-like elements. Mobile DNA, 11, 21.
Singh KS, Troczka BJ, Duarte A, Balabanidou V, Trissi N, Paladino LZC, Nguyen P, Zimmer CT, Papapostolou K, Randall E, Mallott V, Marec F, Mazzoni E, Williamson M, Hayward A, Nauen R, Vontas J, Bass C (2020) The genetic architecture of a host shift: an adaptive walk protected an aphid and its endosymbiont from plant chemical defences. Science Advances, 6, eaba1070.
In review:
Singh KS, Cordeiro EMG, Troczka BJ, Pym A, Mackisack J, Mathers TC, Duarte A, Legeai F, Robin S, Bielza P, Burrack HJ, Charaabi K, Denholm I, Figueroa CC, ffrench-Constant RH, Jander G, Margaritopoulos JT, Mazzoni E, Nauen R, Ren G, Stepanyan I, Umina PA, Voronova NV, Vontas J, Williamson M, Wilson ACC, Xi-Wu G, Youn Y-N, Zimmer CT, Simon J-C, Hayward A, Bass C (In resubmission at Communications Biology) Global patterns in genomic diversity reveal the molecular and ecological processes underpinning the evolution of insecticide resistance in the crop pest Myzus persicae.
Panini M, Chiesa O, Troczka BJ, Mallott M, Manicardi GC, Cassanelli S, Cominelli F, Hayward A, Mazzoni E, Bass C (Submitted to PNAS) Silencing susceptibility: transposon-mediated insertional mutagenesis unmasks recessive insecticide resistance.
In preparation:
Troczka BJ, Hayward A, Bass C (In preparation for Pest Management Science) Molecular innovations underlying resistance to natural and synthetic xenobiotics in Myzus persicae.
Planned:
Baril T, Bass C, Hayward A (In preparation for Molecular Biology and Evolution) Population genomics of transposable elements for 100 aphid genomes.
Baril T, Singh KS, Bass C, Hayward A. A comparative analysis of aphid transposable elements: the impact of transposons on the evolution of host processes under strong selective pressure.
Start Year 2017
 
Description Collaboration with Professor Juan Antonio Balbuena and Dr Isa Blasco at the University of Valencia and Muséum d'histoire naturelle Genève 
Organisation Natural History Museum of Geneva
Country Switzerland 
Sector Public 
PI Contribution I contributed to the writing of the review manuscript.
Collaborator Contribution My partners led on this review project, securing an invitation to submit to the prestigious journal Trends in Ecology and Evolution
Impact In preparation: Blasco-Costa I, Hayward A, Poulin R, Balbuena JA (In preparation for Trends in Ecology and Evolution) Next-generation cophylogeny: integrating eco-evolutionary interactions.
Start Year 2020
 
Description Collaboration with Professor Juan Antonio Balbuena and Dr Isa Blasco at the University of Valencia and Muséum d'histoire naturelle Genève 
Organisation University of Valencia
Country Spain 
Sector Academic/University 
PI Contribution I contributed to the writing of the review manuscript.
Collaborator Contribution My partners led on this review project, securing an invitation to submit to the prestigious journal Trends in Ecology and Evolution
Impact In preparation: Blasco-Costa I, Hayward A, Poulin R, Balbuena JA (In preparation for Trends in Ecology and Evolution) Next-generation cophylogeny: integrating eco-evolutionary interactions.
Start Year 2020
 
Description Collaboration with Professor Robert Poulin and Professor Shinichi Nakagawa at University of Otago and University of New South Wales 
Organisation University of New South Wales
Country Australia 
Sector Academic/University 
PI Contribution I collected the data, contributed to study design, and wrote the manuscript.
Collaborator Contribution Project design, meta-analysis
Impact Hayward A, Poulin R, Nakagawa S (In resubmission at Ecology Letters) A broadscale test of host-symbiont cophylogeny reveals the key drivers of phylogenetic congruence. This is the first ever quantitative evaluation of the extent to which symbionts codiverge with their hosts, which is a major mechanism underlying global biodiversity, with general implications for host-pathogen evolution, such as host shifts.
Start Year 2013
 
Description Collaboration with Professor Robert Poulin and Professor Shinichi Nakagawa at University of Otago and University of New South Wales 
Organisation University of Otago
Country New Zealand 
Sector Academic/University 
PI Contribution I collected the data, contributed to study design, and wrote the manuscript.
Collaborator Contribution Project design, meta-analysis
Impact Hayward A, Poulin R, Nakagawa S (In resubmission at Ecology Letters) A broadscale test of host-symbiont cophylogeny reveals the key drivers of phylogenetic congruence. This is the first ever quantitative evaluation of the extent to which symbionts codiverge with their hosts, which is a major mechanism underlying global biodiversity, with general implications for host-pathogen evolution, such as host shifts.
Start Year 2013
 
Description Collaboration with the Wellcome Sanger Institute via the Darwin Tree of Life Project 
Organisation The Wellcome Trust Sanger Institute
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Provision of butterfly samples, advice for genomic sequencing, and analysis of transposons in butterfly genomes, as part of the wider Darwin Tree of Life initiative.
Collaborator Contribution Wellcome Sanger Institute, via Professor Mark Blaxter, are sequencing very high quality British butterfly and moth genomes. UPDATE 2021: I participated in the DToL Lep2020 group meeting, giving an oral presentation on methods for repeat annotation and analysis in Lepidoptera. I expect to provide the repeat analysis for the flagship Lepidoptera 100 genome study, which will lead the results from the Darwin Tree of Life initiative.
Impact -Open access butterfly genomes for the scientific community. -Significant increase in transposon analyses for butterfly genomes. UPDATE 2021: A publication on the Lepidoptera 100 genomes project is planned, with likely companion papers exploring specific aspects, such as detailed repeat analyses.
Start Year 2019
 
Description Collaboration with Professor Jerome Hui at Chinese University of Hong Kong
Organisation Chinese University of Hong Kong
Country Hong Kong 
Sector Academic/University 
PI Contribution We provide guidance, project contributions, and transposon analysis to genome projects. UPDATED 2021: This collaboration remains active and we are currently collaborating on genome projects involving multiple myriapod species, gastropods, and butterflies.
Collaborator Contribution Professor Hui's group are sequencing a large number of genomes to very high quality, which are ideal for transposon analyses.
Impact Published: ----Nong W, Law STS, Wong AYP, Baril T, Swale T, Chu LM, Hayward A, Lau DTW, Hui JHL (2020) A chromosomal-level reference genome of the incense tree Aquilaria sinensis. Molecular Ecology Resources, 20, 971-979. ----Li Y, Nong W, Baril T, Yip HY, Swale T, Hayward A, Ferrier DEK, Hui JHL (2020) Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome. BMC Genomics, 21, 713. ----Nong W, Qu Z, Li Y, Barton-Owen T, Wong AYP, Yip HY, Lee HT, Narayana S, Baril T, Swale T, Cao J, Chan TF, Kwan HS, Ming NS, Panagiotou G, Qian P, Qiu J, Yip KY, Ismail N, Pati S, John A, Tobe SS, Bendena WG, Cheung SG, Hayward A, Hui JHL (2021) Horseshoe crab genomes reveal evolutionary fates of genes and microRNAs after three rounds (3R) of whole genome duplication in invertebrates. ----Qu Z, Nong W, Yu Y, Baril T, Yip HY, Hayward A, Hui JHL (2020) Genome of the four-finger threadfin Eleutheronema tetradactylum (Perciforms: Polynemidae). BMC Genomics, 21, 726. ----Qu Z, Nong W, So HWL, Barton-Owen T, Li Y, Li C, Leung TCN, Baril T, Wong AYP, Swale T, Chan TF, Hayward A, Ngai SM, Hui JHL (2020) Millipede genomes reveal unique adaptations during myriapod evolution. PLoS Biology, 18 (9), e3000636. UPDATE 2021: At least three further publications are planned this year.
Start Year 2019
 
Description Primary collaboration with Dr Konrad Lohse at the University of Edinburgh 
Organisation University of Edinburgh
Department Institute of Evolutionary Biology
Country United Kingdom 
Sector Academic/University 
PI Contribution I sequenced 2 butterfly genomes using the PacBio Sequel platform. I also arranged for free library preparations and assembly for 4 butterfly species using 10X Genomics. We shared fieldwork efforts during the collection of butterfly specimens in Iberia.
Collaborator Contribution Konrad sequenced the 10X library preparations on the Illumina platform in Edinburgh, has generated transcriptomes for 66 of our target butterfly species, shared fieldwork efforts, and has a new postdoc who is assembling our PacBio data. UPDATE 2021: This collaboration remains active and we are planning several further papers during 2021.
Impact Four new butterfly genomes sequenced using 10X Genomics, two of which have also been sequenced using PacBio Sequel. 66 new butterfly transcriptomes. ----Published: ----Mackintosh A, Laetsch DR, Hayward A, Charlesworth B, Waterfall M, Vila R, Lohse K (2019) The determinants of genetic diversity in butterflies - Lewontin's paradox revisited. Nature Communications, 10, 3466. UPDATE 2021: ----In resubmission: ----Ebdon S, Laetsch D, Dapporto L, Hayward A, Ritchie MG, Dinca V, Vila R, Lohse K (In resubmission at Molecular Ecology) The Pleistocene species pump past its prime: evidence from European butterfly sister species. ----In preparation: ----Baril T, Laetsch DR, Mackintosh A, Vila R, Lohse K, Hayward A. (In preparation for Nature Ecology and Evolution) Host-TE interactions across 50 high-quality butterfly genomes. ----Genome release papers for 20 high quality genomes generated in collaboration. ----Planned: ----Population genetic analyses for at least one system species pair, ----repeat/introgression analyses across phylogeny.
Start Year 2017
 
Title TobyBaril/EarlGrey: Earl Grey v1.2 
Description For those that cannot install RepeatModeler and RepeatMasker on their systems, we now provide a Docker container (with instructions) that will enable Earl Grey to run within a virtual environment. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact There have been hundreds of downloads of the pipeline from its GitHub site. 
URL https://zenodo.org/record/5718734
 
Description Genomics outreach activity at the Royal Cornwall Show 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact I ran a practical activity where members of the public were offered the chance to extract DNA from Cornish strawberries. During the activity, the participants were told interesting facts about DNA and informed about the value of genomics research. Over 1,000 members of the public participated over 3 days. The participants were mainly school children aged 5 to 16, and their teachers/guardians. However, other adult members of the public also participated. A team of six volunteers helped me with the demonstration.
Year(s) Of Engagement Activity 2017
 
Description International workshop - Butterflies as genomic models in ecology and evolution 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I held a three-day international workshop on butterfly genomics at my campus. This was attended by top international researchers in the field, as well as postgraduate students, and undergraduate students from my institution. Additionally, members of the NGO Butterfly Conservation participated, and members of the general public also attended some sessions. Several representatives from major sequencing companies also attended. The intended purpose was networking and for me to gain an introduction to the field of butterfly genomics. The workshop prompted much discussion, and many new collaborations were discussed among the participants. It also led to the company 10X Genomics offering to provide library preparations and assemblies for 4 of my focal butterfly species for free.
Year(s) Of Engagement Activity 2017
URL http://www.exeter.ac.uk/news/events/details/index.php?event=7032
 
Description Invited presentation at Cornwall Wildlife Trust meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Supporters
Results and Impact Invited oral presentation on "Environmental DNA: a new technique to survey Cornish marine biodiversity?" at the annual Cornwall Wildlife Trust Marine Recorders Evening.
Year(s) Of Engagement Activity 2018
 
Description Poster presentation at marine conservation event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact MSc student poster presentation: 'Environmental DNA (eDNA): a novel approach to survey elasmobranchs in Cornish seas'
Year(s) Of Engagement Activity 2018