CLIMB-BIG-DATA: A Cloud Infrastructure for Big-Data Microbial Bioinformatics

Lead Research Organisation: QUADRAM INSTITUTE BIOSCIENCE
Department Name: Microbes in the Food Chain

Abstract

High-throughput sequencing has transformed microbiology, delivering an explosion in genomic and metagenomic big data. However, many microbiologists remain unable to exploit large genomics datasets to address key questions in microbiology, because they lack access to the relevant computational resources, bioinformatics tools or expertise in data analysis. To address this problem, six years ago we launched CLIMB, a pioneering British cloud-computing infrastructure project funded by the MRC that has supported >900 users. As CLIMB comes to an end, we propose a unique new partnership, CLIMB-BIG-DATA (Cloud Infrastructure for Big-Data Microbial Bioinformatics), to meet the bioinformatic needs of the UK microbiology community as we head into the 2020s. This new CLIMB-BIG-DATA partnership will occupy a distinctive position in the UK, underpinning research in the academic sector alongside the front-line work of government agencies and the health service, while also supporting research that maps on to a wide variety of national/UKRI and international strategic priorities and Official Development Assistance objectives.

In response to community needs (as evidenced by >160 signatories), the proposed partnership will maintain the existing CLIMB infrastructure to support hundreds of research projects including high-profile efforts to track the spread of Ebola or Zika virus. However, we also promise to deliver a step-change in the scale and scope of what we can offer to users. We will adopt a matrix model, in which a range of activities will be mapped on to strategically important themes championed by our investigators, including Antimicrobial Resistance; Emerging Infectious Disease and Global Health; Microbial Genomics for Public Health; Microbial communities and metagenomics; Pathogen Biology and Functional Genomics; Sequencing Technologies.

Activities aimed at community engagement will include bioinformatics workshops, hackathons and symposia. Activities focused on tools and integration will include enhanced support for sharing software and data, workflow integration and migration between clouds; enhanced support and security for clinical applications; plus integration with large datasets at external facilities, such as the European Nucleotide Archive. Activities focused on infrastructure include: provision of graphics processing units and enhanced storage; maintenance of our original cloud-computing infrastructure to support microbial bioinformatics; plus incorporation of cloud infrastructures from the MRC unit in the Gambia and from the Quadram Institute. The CLIMB-BIG-DATA partnership will run as a UKRI-supported project for five years, with the expectation that the project will become self-financing through robust pathways to sustainability and expansion. The partnership will draw upon a diverse team of partners from multiple research organisations and collaborators from government agencies, and it will be run from the Quadram Institute in Norwich, which as a strategically funded UKRI research institute will provide a first-class stable and resilient environment for the project's future.

Technical Summary

The CLIMB-BIG-DATA partnership will provide a substantial computational resource that will enhance UK capability and infrastructure in microbial bioinformatics, building on our highly successful CLIMB project. Our computational infrastructure will feature an OpenStack cloud architecture with >10000 virtual CPU cores spanning six research organisations (incorporating clouds from the MRC unit Gambia and the Quadram), with access to the CEPH platform to implement object storage. A dedicated web portal Bryn will allow users to gain easy access to their own virtual machines, preconfigured with powerful user-friendly bioinformatics tools. We will add newly requisitioned specialised servers aimed at memory-intensive tasks (e.g. metagenomic assembly) or compute-intensive tasks (e.g. GPU nanopore analyses) and we will add substantial additional storage (>3 petabytes). Other features will include a freely accessible database of relevant workflows, pipelines, scripts, programs, preconfigured virtual machine images and containers, curated to support strategically relevant themed activities; an accreditation-compliant computational infrastructure for linking sensitive human and animal health metadata with microbial sequence data; support for containerisation via the Docker Engine and Singularity; a capability to share VMs, containers, data and software across the entire CLIMB-BIG-DATA infrastructure and with public cloud providers (with cloud bursting on to public clouds, should demand spike on our own infrastructure). We also promise an ambitious and exciting programme of training/community engagement, featuring hackathons, workshops, and modules suitable for a wide range of users from undergraduate students to professional bioinformaticians in the UK and more widely. We will build protocols for demand management and for charging users as we move towards becoming self-sustainable and will also improve integration with public facilities and new potential partner sites.

Planned Impact

This research will be of benefit to a range of beneficiaries outside of academic disciplines that take in microbiology:

1. Clinical and veterinary microbiologists, vets and governmental organisations such as APHA, FSA and local agencies including health services such as PHE/PHW who have a role in tracking zoonotic disease and tracing pathogens through the food chain. These users will be able to use our computational infrastructure to integrate informatics systems, animal and human health metadata, epidemiological disease patterns and microbial (meta)genomic data to elucidate modes and routes of transmission, detect outbreaks, explore the relationships between potential pathogens and disease, with impacts on animal health, welfare, and disease prevention. This system will also provide an infrastructure that will bring new opportunities for productive engagement between organisations focused on animal health and the academic sector, so that research findings and approaches can be more easily translated into outcomes that impact food security and human and animal health.

2. Industrial users stand to benefit in several ways. The tools around the characterisation and development of novel antimicrobials and metagenomics are of wide interest to industrial beneficiaries as these tools will be invaluable for the identification of new targets and the rational design of probiotic treatments for the prevention of microbial disease in farmed animals. Industrial users will also benefit from the tools and data that the infrastructure will make available. These will allow the rapid contextualisation and characterisation of bacteria of industrial importance (for example in product spoilage), information that can then be used to design interventions or better optimise preservative selection.

3. Commercial beneficiaries include sequencing companies, computer companies and private laboratories, who stand to benefit from increased demand for their products and opportunities for innovation and spread of best practice (NB: both Solexa and Oxford Nanopore Sequencing were developed within the UK, with benefits to our economy).

4. Anyone planning a large cloud-based computing project will be able to draw on the example and precedent we set here.

5. Policy makers, who will benefit from grounding their public policy and legislation, e.g. on food safety or pandemic preparedness, on a more solid understanding of bacterial evolution, epidemiology, population genetics and taxonomy.

6. The wider public will benefit from the positive impacts on food security, reduced preservative use, and increased profitability of UK companies, resulting in stronger tax revenues for the UK.

This work will also make a decisive contribution through employment and training to enhancing the professional and research skills base of the United Kingdom, contributing to the development of the knowledge economy through the training of undergraduates and postgraduates in data intensive research techniques, using the common CLIMB-BIG-DATA platform.

Publications

Title Additional file 2 of Hybrid assembly of an agricultural slurry virome reveals a diverse and stable community with the potential to alter the metabolism and virulence of veterinary pathogens 
Description Additional file 1: Supplementary Table 1. ViromeQC enrichment scores of Illumina viromes. Supplementary Table 2. Genes found to be under positive selection and their predicted function. Supplementary Table 3. Predicted functions of putative phage-encoded AMGs. Supplementary Table 4. Predicted host taxa for vOTUs. Supplementary Table 5. Relative abundance of vOTUs in each sample, alongside vConTACT2 cluster and predicted lifestyle. Supplementary Table 6. Mapping statistics for active prophage vOTUs that were used to infer the ends of prophage sequences. Supplementary Table 7. Mapping statistics for active prophage vOTUs that were used to infer the ends of prophage sequences for which at least one end could be predicted. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Hybrid_assembly_of_an_agri...
 
Title Alignment files for coverage benchmarks: Illumina and Nanopore sequencing datasets 
Description cpara-illumina-noseq.bam and cpara-ont-noseq.bam: BAM files produced aligning the raw reads produced respectively by Illumina NextSeq and ONT Nanopore sequencing of an isolate of C. parapsilosis to evaluate the coverage calculations using real datasets.* HG00258.bam: Exome sequencing from the 1000 Genomes Project (Clarke et al 2016 https://doi.org/10.1093/nar/gkw829). panel_01.bam: targeted sequencing of a Human gene panel of 16 genes.* * Sequences and qualities have been removed 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://zenodo.org/record/5636943
 
Title Control samples for ITS1 Metabarcoding of the Cynomolgus Macaque Intestinal Mycobiome 
Description Library and Sequencing controls used for the Metabarcoding ITS1 analysis of intestinal content of the Cynomolgus Macaque 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact ITS1 sequencing is widely adopted for the analysis of fungal communities, yet the majority of metabarcoding tutorials are based on the 16S marker. This dataset was used to deliver ITS specific training, focusing on the sequencing controls to adopt in such studies. 
URL https://zenodo.org/record/6881353
 
Title DNA and RNA viruses in the rhizosphere 
Description This repository contains data used in Muscatt et al. 2022. Further details on the analysis can be found here: https://github.com/GeorgeMuscatt/RhizosphereVirome Data is stored in the file RhizosphereVirome.tar. The following files are stored (see the README for full details):
c1.ntw.gz = vConTACT2 network output file
core_protein_concatenation_tree = ssRNA phage phylogenetic tree based on aligned core protein concatenations
CP.faa.gz = fasta amino acid file containing coat protein sequences for 11,222 near-complete ssRNA phage vOTUs
CP_ref_Leviviricetes.faa.gz = fasta amino acid file containing coat protein sequences for 1,868 reference Leviviricetes genomes
dsDNA_gene_annotations.csv.gz = annotations for 20,746 dsDNA vOTU genes
dsDNA_vOTUs.faa.gz = fasta amino acid file containing 20,267 dsDNA vOTU genes
dsDNA_vOTUs.fna.gz = fasta nucleotide file containing 1,059 dsDNA vOTUs
edges.csv.gz = edges for drawing vConTACT2 network
gene_2_genome.csv.gz = input file for vConTACT2 containing gene-to-genome index for 1,059 dsDNA vOTUs and 16,540 ssRNA phage vOTUs
MP.faa.gz = fasta amino acid file containing maturation protein sequences for 11,222 near-complete ssRNA phage vOTUs
MP_ref_Leviviricetes.faa.gz = fasta amino acid file containing maturation protein sequences for 1,868 reference Leviviricetes genomes
nodes.csv.gz = nodes for drawing vConTACT2 network
RdRp.faa.gz = fasta amino acid file containing RNA-dependent RNA polymerase sequences for 11,222 near-complete ssRNA phage vOTUs
RdRp_ref_Leviviricetes.faa.gz = fasta amino acid file containing RNA-dependent RNA polymerase sequences for 1,868 reference Leviviricetes genomes
ssRNA_vOTUs.faa.gz = fasta amino acid file containing 52,700 ssRNA phage vOTU genes
ssRNA_vOTUs.fna.gz = fasta nucleotide file containing 16,541 ssRNA phage vOTUs
viral_cluster_overview.csv = output file from vConTACT2 containing viral cluster information
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://leicester.figshare.com/articles/dataset/DNA_and_RNA_viruses_in_the_rhizosphere/19635336
 
Title Dataset to test the Nextflow Tutorial 
Description Tutorial: https://telatin.github.io/microbiome-bioinformatics/Nextflow-start/ Repository: https://github.com/telatin/nextflow-example 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Using CLIMB infrastructure, we delivered the first DSL2-native tutorial for Nextflow, using this dataset to build an example pipeline 
URL https://zenodo.org/record/5931662
 
Title De novo sequencing of phages T4 and T7 
Description Raw data and assemblies of the de novo sequencing of the "model" phages T4 and T7, performed with Illumina NextSeq 2x150 and Oxford Nanopore, generated for the "Phage Annotation Workshop" held online on November 2021: https://github.com/quadram-institute-bioscience/phage-annotation-workshop/ 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Two widely known genomes sequenced de novo to provide valuable training datasets for bioinformatics tutorials and phage annotation workshops 
URL https://zenodo.org/record/5704419
 
Title Host removal database: Homo sapiens, Sars-Cov-2, PhiX174 
Description cleanup-db: Kraken2 database, built upon a viral sequence masked human reference from: Handley, Scott A. (2020). Virus+ Sequence Masked Human Reference Genome (hg19) (1.0) [Data set]. Zenodo. [10.5281/zenodo.4116107] but separating chromosomes as artificial taxa to allow for QC, and including Sars-Cov-2 and PhiX 174.
gutcheck-db: a very small DB containing some common gut bacteria and the Human and Murine mitochondrial genomes: Akkermansia muciniphila, Bacteroides fragilis, Bifidobacterium longum, Blautia obeum strain, Escherichia coli, Enterococcus faecium, Prevotella copri.
See: https://github.com/telatin/cleanup
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Custom database to remove common contaminants from metagenomics samples (Homo, Sars-Cov-2 and PhiX-174); by splitting the human genome into individual chromosomes, this database enables a quality check against reads wrongly classified as human.
URL https://zenodo.org/record/7050266
 
Title INPHARED_DATABASE 
Description inphared.pl (INfrastructure for a PHAge REference Database) is a perl script which downloads and filters phage genomes from Genbank to provide the most complete phage genome database possible. Useful information, including viral taxonomy and bacterial host data, is extracted from the Genbank files and provided in a summary table. Genes are called on the genomes using Prokka and this output is used to gather metrics which are summarised in the output files, as well as useful input files for vConTACT2. The data provided is all genomes up to Jan 2021. This can be downloaded so users do not have to repeat the process of consistent gene calling on existing genomes. The folder GenomesDB contains subfolders, each containing a subfolder that is named after the accession number of each phage. Within each folder are re-called genes in the following formats: *.ffn, *.faa; the complete genome (*fna); and a genbank file without any annotation (*gbf). See https://github.com/RyanCook94/
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://leicester.figshare.com/articles/dataset/INPHARED_DATABASE/14242085
 
Title Local accessory gene sharing among Egyptian Campylobacter potentially promotes the spread of antimicrobial resistance 
Description Supplementary Material for 'Local accessory gene sharing among Egyptian Campylobacter potentially promotes the spread of antimicrobial resistance', as published in Microbial Genomics. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://microbiology.figshare.com/articles/dataset/Local_accessory_gene_sharing_among_Egyptian_Campy...
 
Title MetaPhage Example Report 
Description Example report generated by MetaPhage (https://github.com/MattiaPandolfoVR/MetaPhage) as described in the paper. MetaPhage is a reads to report pipeline embedding viral miners and custom tools to generate automatic diversity plots. Pipeline documentation: https://mattiapandolfovr.github.io/MetaPhage/ 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/MetaPhage_Example_Report/20424705
 
Title MetaPhage Example Report 
Description Example report generated by MetaPhage (https://github.com/MattiaPandolfoVR/MetaPhage) as described in the paper. MetaPhage is a reads to report pipeline embedding viral miners and custom tools to generate automatic diversity plots. Pipeline documentation: https://mattiapandolfovr.github.io/MetaPhage/ 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/MetaPhage_Example_Report/20424705/1
 
Title Dadaist2 v1.0.0 
Description Dadaist2: highway to R
Documentation: https://quadram-institute-bioscience.github.io/dadaist2/
Repository: https://github.com/quadram-institute-bioscience/dadaist2
Standalone wrapper for the DADA2 package, to quickly generate a feature table and a set of representative sequences from a folder with paired-end Illumina reads. Dadaist2 is designed to simplify the stream of data from read processing to statistical analysis and plots. Dadaist2 is a highway to downstream analyses:
- Generation of a PhyloSeq object, for immediate usage in R
- Possibility to run in the pipeline a custom R script that starts from the PhyloSeq object
- Generation of MicrobiomeAnalyst-compatible files. MicrobiomeAnalyst provides a web interface to perform a broad range of visualizations and analyses.
- Generation of Rhea-compatible files. Rhea is a standardized set of scripts "designed to help easy implementation by users".
In addition to this, Dadaist2:
- Can automatically detect quality boundaries or trim the primers
- Has a custom mode for variable-length amplicons (i.e. ITS), to detect features longer than the sum of the paired-end reads
- Ships an open source implementation of UNCROSS2 by Robert Edgar
- Has a modular design that allows recycling parts of it in custom workflows
- Prepares a MultiQC-enabled overview of the experiment
- Produces an easy-to-inspect HTML execution log
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
URL https://zenodo.org/record/4761407
 
Title Repository for the Phage Genome Annotation Workshop 
Description Technical repository with the files and instructions for the Phage Genome Annotation workshop, performed on CLIMB notebooks 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Used by the participants in the workshop, but also widely downloaded afterwards thanks to sharing on social media. The repository received six stars.
URL https://github.com/quadram-institute-bioscience/phage-annotation-workshop/
 
Title SeqFu - Fastx Sequence Utilities 1.0 
Description SeqFu 1.0 A general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. SeqFu is available for Linux and MacOS. Documentation: https://telatin.github.io/seqfu2/ Repository: https://github.com/telatin/seqfu2 Paper Bioengineering 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
URL https://zenodo.org/record/4684490
 
Title SeqFu - Fastx Sequence Utilities 1.0 
Description SeqFu 1.0 A general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. SeqFu is available for Linux and MacOS. Documentation: https://telatin.github.io/seqfu2/ Repository: https://github.com/telatin/seqfu2 Paper Bioengineering 
Type Of Technology Software 
Year Produced 2021 
Impact High performance management of FASTQ files, the widely adopted output of any DNA sequencing machine 
URL https://zenodo.org/record/4740106
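
The interleaving and de-interleaving of paired-end FASTQ files mentioned in the SeqFu description can be illustrated with a minimal Python sketch. This is not SeqFu's implementation or command-line interface; the function names (parse_fastq, interleave, deinterleave) are hypothetical, and the sketch assumes plain four-line FASTQ records with no empty sequence lines.

```python
from itertools import zip_longest


def parse_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines.

    Assumption: every record is exactly four non-empty lines.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(rows), 4):
        yield tuple(rows[i:i + 4])


def interleave(forward, reverse):
    """Alternate forward and reverse records: R1, R2, R1, R2, ..."""
    out = []
    for r1, r2 in zip_longest(forward, reverse):
        if r1 is None or r2 is None:
            raise ValueError("forward and reverse inputs differ in record count")
        out.extend([r1, r2])
    return out


def deinterleave(records):
    """Split an interleaved record list back into (forward, reverse) lists."""
    records = list(records)
    if len(records) % 2 != 0:
        raise ValueError("interleaved input must hold an even number of records")
    return records[0::2], records[1::2]


if __name__ == "__main__":
    fwd = list(parse_fastq(["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]))
    rev = list(parse_fastq(["@read1/2\n", "TTGA\n", "+\n", "IIII\n"]))
    for record in interleave(fwd, rev):
        print("\n".join(record))
```

Keeping the two mates of a pair adjacent in one stream is what lets downstream tools read paired data from a single file or pipe; de-interleaving recovers the two original files.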
 
Description 02FEB2022 - CLIMB-BD workshop: Bioinformatics Skills for Microbial Genomics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Bioinformatics relies on a vast number of tools (packages, electronic notebooks, programming languages and their libraries) that bioinformaticians need to be able to install, manage and run. A growing challenge is the organisation of data inputs and outputs, particularly as genomic datasets continue to expand.
This one-day training workshop introduced key concepts and working modalities that address these challenges, which are rapidly being adopted in the industry, including:
-Using containers (such as Docker and Singularity) - currently the easiest method for managing and deploying software, easier sharing of code, and higher reproducibility of the pipelines.
-Workflow languages (Nextflow DSL2) - workflow managers provide a framework for running analyses. They intrinsically provide a degree of data provenance and make it easy to re-run analyses with different datasets or parameters in a range of computing environments.
-GNU/Linux command-line
Year(s) Of Engagement Activity 2022
URL https://www.climb.ac.uk/bioinformatics-skills-microbial-genomics/
 
Description 11-13 OCT 2021 - AMR HACKATHON (Bioinformatics tools and methods for AMR in bacteria) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Together with the Public Health Alliance for Genomic Epidemiology (PHA4GE) and the Joint Programming Initiative on Antimicrobial Resistance (JPIAMR), we organised the 7th Microbial Bioinformatics Hackathon with a special focus on Antimicrobial Resistance.

Antimicrobial resistance is a critical universal issue and scientists need reliable, fast, reproducible tools for their research. The aim of this hackathon was to improve, build and extend bioinformatics tools and methods for the AMR community. The hackathon had a special focus on antimicrobial resistance in bacteria.

We brought together international bioinformatics researchers, scientists and clinicians to collaborate and solve common problems that impact our community, as pathogens know no borders.
Year(s) Of Engagement Activity 2021
URL https://www.climb.ac.uk/amr-hackathon/
 
Description 14-15JAN2022 - ARTICnetwork and CLIMB-BIG-DATA workshop on Covid-19 data analysis 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The third joint ARTICnetwork and CLIMB-BIG-DATA workshop on COVID-19 data analysis was held on 13-14 January 2022. It combined talks, panel discussions, hands-on practical sessions, and question-and-answer sessions.

Introduction to the ARTIC&CLIMB-BIG-DATA workshop and the ARTIC project (Nick Loman)
Case studies in molecular epidemiology of SARS-CoV-2 (Andrew Page)
How to sequence COVID-19 using the ARTIC protocol (Josh Quick)
Automating the ARTIC protocol using the OpenTrons OT2 (Jeremy Mirza)
ARTIC Nanopore+Illumina Bioinformatics Pipeline (Sam Wilkinson)
Bioinformatics "gotchas" (Nabil-Fareed Alikhan)
DNA Spike-Ins for SARS-CoV-2 sequencing (Katherine Siddle)
Wastewater sequencing bioinformatics (Chris Quince)
A practical introduction to phylogenetics in the pandemic era (JT McCrone)
Practical phylogenetics: lineages & variants (Rachel Colquhoun)
How we detect and define new variants (Natalie Groves)
Year(s) Of Engagement Activity 2022
URL https://www.climb.ac.uk/3rd-articnetwork-and-climb-big-data-joint-workshop-information-for-participa...
 
Description 14-15JUL2021 ARTICnetwork & CLIMB-BIG-DATA workshop on Covid-19 data analysis 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Together with the ARTIC network, we organised the 2nd two-day workshop about COVID-19 data analysis.

"Theory" (live sessions) and "practice" (homework) were separate, so everyone was able to choose what to take or to do from this course at their own pace.
Year(s) Of Engagement Activity 2021
URL https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop-2/
 
Description 14-16JUN2021 - A primer on 16S Data Analysis 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact OBJECTIVES:
The key steps of a metabarcoding analysis from raw reads to numerical ecology
The difference between OTUs (Operational Taxonomic Units) and ASVs (Amplicon Sequence Variants), and how to produce them from the raw reads
The tools available to analyse metabarcoding experiments, with a strong focus on Qiime2
How to use the Qiime2 package, understanding the files it produces and how to use its documentation to analyse 16S reads and produce interactive visualizations of taxonomy plots, alpha diversity and beta diversity (PCoA)
The basic concepts of numerical ecology, and how to transfer the analysis from the command line to R
Year(s) Of Engagement Activity 2021
URL https://www.climb.ac.uk/a-primer-on-metabarcoding-analyses/
 
Description 15OCT2021 - AMR workshop - Bioinformatics tools and methods for AMR 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A joint initiative with the Public Health Alliance for Genomic Epidemiology (PHA4GE) and the Joint Programming Initiative on Antimicrobial Resistance (JPIAMR), delivered on 15th October 2021.

Antimicrobial resistance is a global public health threat and this workshop aimed to provide training on the following:

Use of existing AMR-related databases and resources (including CARD, NCBI, and PATRIC)
Theory and use of bioinformatics tools to detect AMR genes from genomes (e.g., AMRFinderPlus)
How to compare and systematically report results from AMR genomics using hAMRonization
A practical introduction to bioinformatics workflows for AMR genomics
Year(s) Of Engagement Activity 2021
URL https://www.climb.ac.uk/amr-workshop/
 
Description 5-7JUL2021 - MASTERING KRAKEN2 FOR TAXONOMY PROFILING 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact QC, Host removal, Kraken2
Bracken, merging results, visualizations
Using R to analyse and visualize the output files
Year(s) Of Engagement Activity 2021
URL https://www.climb.ac.uk/metagenomics-from-kraken-to-r/
 
Description ARTIC-CLIMB Workshop - Gambia 2025 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact The rapid evolution of pathogen genomic surveillance has become crucial in public health response and disease monitoring across Africa. This workshop, organised and supported by CLIMB-BIG-DATA and by the MRC Unit in the Gambia, built capacity in advanced sequencing techniques and bioinformatics analysis, focusing on implementing sustainable surveillance systems in African Institutions. The initiative brought together experts from leading institutions to share knowledge and establish collaborative networks for ongoing support and development.
Objectives
- Enhance technical capacity in pathogen genomic surveillance using cutting-edge sequencing technologies
- Strengthen bioinformatics capabilities for data analysis and interpretation
- Develop practical skills in primer design and sequencing protocols
- Build expertise in public health applications of genomic data
- Establish a network of trained professionals
Year(s) Of Engagement Activity 2025
 
Description ARTIC-CLIMB workshop, Gambia 2025 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The rapid evolution of pathogen genomic surveillance has become crucial in public health response and disease monitoring across Africa. This workshop, organised and supported by CLIMB-BIG-DATA and by the MRC Unit in the Gambia, aims to build capacity in advanced sequencing techniques and bioinformatics analysis, focusing on implementing sustainable surveillance systems in African Institutions. The initiative brings together experts from leading institutions to share knowledge and establish collaborative networks for ongoing support and development.
Objectives
Enhance technical capacity in pathogen genomic surveillance using cutting-edge sequencing technologies
Strengthen bioinformatics capabilities for data analysis and interpretation
Develop practical skills in primer design and sequencing protocols
Build expertise in public health applications of genomic data
Establish a network of trained professionals
Target Audience
Laboratory scientists, bioinformaticians, public health professionals, researchers involved in pathogen surveillance, and environmental monitoring specialists with intermediate/advanced knowledge of Linux.
This workshop is intended for professionals who can already work in a Linux environment.
Year(s) Of Engagement Activity 2025
 
Description ARTICnetwork and CLIMB-BIG-DATA joint workshop on COVID-19 data analysis 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact 130 participants attended this workshop about COVID-19 data analysis using the ARTIC pipeline.
-66% of participants reported the workshop and the use of the ARTIC pipeline will improve their research scale or scope
-100% of participants considered the workshop relevant or extremely relevant
-90% of participants will be sharing the training/info with colleagues
-100% of participants would be very (22%) or extremely (78%) interested in other workshops like this one
Year(s) Of Engagement Activity 2021
URL https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop/
 
Description Anvi'o Workshop 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact 40 researchers at different career stages engaged with the authors of Anvi'o to learn how to curate and explore pangenome, metagenome and metabolic pathway datasets.
Year(s) Of Engagement Activity 2024
URL https://corebio.info/workshops-2024/anvio
 
Description Bioinformatics workshop CLIMB 2020 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Training on Linux for bioinformatics with MRC CLIMB BIG DATA.
Year(s) Of Engagement Activity 2020
URL https://github.com/telatin/learn_bash/wiki/CLIMB
 
Description CLIMB BIG DATA - The 8th Microbial Bioinformatics Hackathon Workshop at Bath 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The event brought together 25 internationally renowned bioinformaticians, medical/molecular microbiologists, epidemiologists and clinicians from 9 different countries to address targeted challenges in microbial bioinformatics and global public health, including:

Antimicrobial resistance in the food chain
Benchmarking datasets for bioinformatics tool validation
Scaling biological informatics methods
Year(s) Of Engagement Activity 2022
URL https://www.climb.ac.uk/the-8th-microbial-bioinformatics-hackathon-workshop-at-bath/
 
Description CLIMB-BIG-DATA 9th Microbes and Food Safety Bioinformatics Hackathon 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact CLIMB-BIG-DATA and the BBSRC/FSA Food Safety Network organised the 9th Microbes and Food Safety Bioinformatics Hackathon in Cambridge, UK, funded by the BBSRC international workshop fund.
Year(s) Of Engagement Activity 2023
URL https://www.climb.ac.uk/the-9th-microbes-and-food-safety-bioinformatics-hackathon/
 
Description CLIMB-BIG-DATA Lanzarote 2024 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Microbial Genomics in the 2030s
Saturday 26th Oct - Wednesday 30th Oct, Lanzarote
Session 1: Cloud computing/data sharing for microbial bioinformatics in the 2030s.
Session 2: Microbial sequencing technologies in the 2030s.
Session 3: Microbial bioinformatics: tools & resources in the 2030s.
Session 4: Making genomics accessible in the 2030s.
Session 5: Clinical & one-health microbial genomics in the 2030s.
Session 6: Microbial genomics and discovery research in the 2030s.

Participant list

• Abdelmajeed Nasereddin, Al Quds University Palestine
• Abdul Sesay, LSHTM, MRC Gambia
• Anais Painset, UKHSA
• Andrea Telatin, Quadram Institute, Norwich and CLIMB-BIG-DATA
• Darius Armstrong-Jones, Imperial College
• David Aanensen, Centre for Genomic Pathogen Surveillance, Oxford
• Ed Feil, Bath
• Eshwar Mahenthiralingam, Cardiff
• Kat Holt, LSHTM
• Lisa Marchioretto, Quadram Institute, Norwich and CLIMB-BIG-DATA
• Mark Pallen, Quadram Institute, Norwich and CLIMB-BIG-DATA
• Muna Anjum, APHA
• Nick Loman, Birmingham and CLIMB-BIG-DATA
• Nicole Wheeler, Birmingham
• Sima Tokajian, Lebanese American University
• Sophie Nixon, Manchester
• Sophien Kamoun, the Sainsbury Lab and GetGenome charity
• Tom Connor, PHW and Cardiff
Year(s) Of Engagement Activity 2024
 
Description CLIMB-BIG-DATA Online Bioinformatics Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 20 participants from 5 UK institutions attended the workshop.
Learning objectives
-Understand that most bioinformatics files are simple text files with a specific format
-Understand how to interact with a Command Line Interface
-Understand the client/server model
-Understand filesystem paths
-Learn how to login to a remote Unix server (ssh)
-Learn how to navigate and manipulate the filesystem (pwd, cd, ls, mkdir)
-Learn how to extract information from text files (cat, head, tail, grep, cut, sort ...)
-Learn to redirect STDOUT/STDERR / how to create simple pipes
-Learn how to use the Miniconda package manager to install a program (spades)
-Learn how to run a bioinformatics program and inspect its output

Feedback:
-the overall contribution of this workshop to participants' bioinformatics skills and knowledge was rated "very good"
-46% of participants said the workshop and the bioinformatics skills it provided would improve their research scale or scope significantly (38% moderately)
-100% of participants considered the workshop relevant or extremely relevant
Year(s) Of Engagement Activity 2020
URL https://www.climb.ac.uk/climb-big-data-online-workshop/
 
Description CLIMB-BIG-DATA SysAdmin Curriculum 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Africa CDC organised this consultative workshop (Workshop to Develop and Finalize Curriculum to Train System Administrators), 12-16 August 2024, South Africa.
The aim was to help system administrators in Africa meet industry demand and support computer systems for data-intensive bioinformatics.
Year(s) Of Engagement Activity 2024
 
Description CLIMB-BIG-DATA new documentation website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This new documentation provides comprehensive information on how to use the service, including step-by-step guides, tutorials, and best practices.
Year(s) Of Engagement Activity 2023
URL https://www.climb.ac.uk/mastering-climb-big-data-the-new-documentation-is-online/
 
Description CLIMB-BIG-DATA, Cloud Computing in Africa - Strategic Event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Cloud Computing in Africa - Strategic Event
26-27 February 2025
Venue: Sir Dawda Kairaba Jawara International Conference Centre, Fajara, the Gambia
Aim: Moving towards a community-led, needs-driven pathogen genomics and bioinformatics cloud infrastructure in Africa

Session 1: Opening and Scene Setting - pathogen genomics in Africa + where are we globally

Session 2: Current Landscape and Challenges - Assessment and understanding: map the current landscape of pathogen genomics capabilities in Africa, identify key challenges and barriers in implementing pathogen surveillance, document successful local solutions and adaptations, understand the impact of infrastructure limitations on surveillance work

Session 3: Infrastructure Initiatives
Present existing infrastructure initiatives and platforms, discuss criteria for choosing between cloud, local HPC, or hybrid solutions, identify pathways to infrastructure sustainability, develop strategies for resource sharing between institutions

Session 4: Technical Solutions and Resources
Assess integration opportunities between existing infrastructure and initiatives (physical infrastructure, software platforms, data sharing), explore AI and machine learning applications, define technical solutions for varying expertise levels, create frameworks for efficient pathogen surveillance workflows

Session 5: Knowledge Transfer and Capacity Building
From one-off training to capacity building, support mechanisms, metrics for measuring capacity building success, structures for long-term knowledge retention and transfer

Session 6: Way Forward
Develop action plan for addressing identified challenges, create coordination mechanisms to avoid effort duplication
Year(s) Of Engagement Activity 2025
 
Description Cloud Computing in Africa - Feb 2025 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Cloud Computing in Africa - Strategic Event
26-27 February 2025
Venue: Sir Dawda Kairaba Jawara International Conference Centre, Fajara, the Gambia
Aim: Moving towards a community-led, needs-driven pathogen genomics and bioinformatics cloud infrastructure in Africa

The CLIMB infrastructure has played a crucial role in pathogen surveillance and analysis over the last decade. As we look to strengthen genomic surveillance capabilities across Africa, there is a need for focused discussion on sustainable, practical solutions that build upon existing infrastructure while addressing region-specific challenges.
Our vision for CLIMB in Africa centres on four interconnected pillars that will strengthen pathogen genomics capabilities across the continent.
First, we aim for CLIMB to be a robust and sustainable infrastructure that serves diverse regional needs. This requires an understanding of existing computational capabilities and limitations, allowing us to identify where resources can be shared and how platforms can work together. There will be a focus on creating scalable solutions that balance cloud-based and local infrastructure requirements, ensuring sustainability while meeting immediate needs.
Second, we recognise that data standards and interoperability are crucial for effective collaboration. We aim to establish common frameworks for pathogen genomics data that enable seamless sharing while ensuring security and privacy. This includes developing standardised approaches to metadata collection and management, creating protocols for data exchange, and building systems that can integrate with existing global platforms.
Third, the success of this initiative depends on building lasting capacity within the region. This includes linking public health and research activity, as well as developing capacity across the spectrum of activities associated with genomics, from systems administration through to wet-lab science. Allied activities such as ARTIC and the ISO in a box initiative all have a part to play, and there is an opportunity to develop and expand communities of practice, technical support mechanisms and mentorship programmes that foster long-term professional growth.
Fourth, we must ensure long-term sustainability through strategic funding and operational models. This means identifying and pursuing funding opportunities and creating an operational framework to tap into these opportunities.
This meeting aims to move beyond discussion to create actionable plans and concrete solutions for strengthening pathogen genomics infrastructure across Africa. The focus will be on practical, sustainable solutions that can be implemented within existing frameworks while building capacity for future growth.

Participants:
Peter Van Heusden
Dominique Anderson
Kirsty Lee Garson
Benson Okello
Eddy Lusamaki
Alice Matimba
Andrew Rambaut
Nicholas Loman
Tom Connor
Samuel Wilkinson
Thomas Brier
George Githinji
Emma Hodcroft
Ebenezer Foster-Nyarko
Jolynne Mokaya
David Aanensen
Abdou Padane
Ya JankyJagne
Jainaba Njie Jobe
Binta Faye
Raissa Muriel De Souza
Oluwatobiloba Simeon Kazeem
Debe Siaka
Allen Kevin Olayemi Campbell
Kwasi Agyenkwa-Mawuli
Joseph Akoi Bore
Vidalyn Folorunso
Mamadou Ndao
Adjiratou Aissatou Ba
Marilyne Aza Gnandji
Stephenie Key
Alhagie Baldeh
Mamadou Aliu Jalo
Year(s) Of Engagement Activity 2025
 
Description Darwin Day - Oral presentation for Italian students on the Gut Microbiome 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact Oral presentation (remote) for the National Darwin Day organized by UAAR for High School Students.

Live streamed to Facebook (https://www.facebook.com/95169492091/videos/3664473390326646) and YouTube (https://www.youtube.com/watch?v=sdQ11CKu7kI) with >1000 attendees and ~3000 views in the first week.
Year(s) Of Engagement Activity 2021
URL https://www.facebook.com/95169492091/videos/3664473390326646
 
Description MMB DTP / CLIMB Gambia - November 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact 10-17 November 2023, LSHTM@MRC Unit in the Gambia
Monday, November 13:
Talks on the MRC Unit from Martin Antonio, Abdul Sesay et al.
Talk on MRC CLIMB-BIG-DATA from Mark Pallen
Talk on MMB DTP from Mark Pallen
Talk on Get Genomes from James Canham
Tour of sequencing labs by Sessinou Benoit Assogba
UK-MRCG student meeting with breakout sessions and icebreaking activities run by Abdoulie Kanteh and Eunice Kiamba. Mark Webber to attend as an observer.
Discussions between Fran Terry, Andrea Telatin, Mark Pallen and MRC Unit IT team on infrastructure, capacity building and training

Tuesday, November 14:
Interactive demonstration of CLIMB BIG DATA Jupyter Notebooks for MRC Unit staff and students and MRC students from UEA
Demonstration: Introduction to Jupyter Notebook Servers led by Andrea Telatin and Ryan Cook, Quadram Institute
Case Study: Campylobacter and AMR: from Sequence to Consequence led by Ben Pascoe, Evangelos Mourkas (Oxford University)
Demonstration: Realising the Potential of iPython Notebooks in Microbial Bioinformatics, led by Fran Terry, University of Swansea

Wednesday, November 15:
Field trip to the field station at Keneba to see the work of MRC field workers

Thursday, November 16:
MRCG-MMB graduate student research symposium chaired by Mark Webber and Nuriden Mohammed
CLIMB BIG DATA Discussions between Fran Terry, Andrea Telatin, Mark Pallen and MRC Unit IT team on infrastructure, capacity building and training

Friday, November 17:
MRC Unit Seminar by Mark Webber
Year(s) Of Engagement Activity 2023
 
Description Microbial Genomics in the 2030s 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Microbial Genomics in the 2030s
Lanzarote, Saturday 26th Oct - Wednesday 30th Oct 2024
Participant list
• Abdelmajeed Nasereddin, Al Quds University Palestine
• Abdul Sesay, LSHTM, MRC Gambia
• Anais Painset, UKHSA
• Andrea Telatin, Quadram Institute, Norwich and CLIMB-BIG-DATA
• Darius Armstrong-Jones, Imperial College
• David Aanensen, Centre for Genomic Pathogen Surveillance, Oxford
• Ed Feil, Bath
• Eshwar Mahenthiralingam, Cardiff
• Kat Holt, LSHTM
• Lisa Marchioretto, Quadram Institute, Norwich and CLIMB-BIG-DATA
• Mark Pallen, Quadram Institute, Norwich and CLIMB-BIG-DATA
• Muna Anjum, APHA
• Nick Loman, Birmingham and CLIMB-BIG-DATA
• Nicole Wheeler, Birmingham
• Sima Tokajian, Lebanese American University
• Sophie Nixon, Manchester
• Sophien Kamoun, the Sainsbury Lab and GetGenome charity
• Tom Connor, PHW and Cardiff
Year(s) Of Engagement Activity 2024
 
Description Phage Genome Annotation Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A two-day course covering the processes of phage genome assembly and annotation, with an introduction to comparative phage genome analysis. Solutions to common phage genome assembly problems will be presented through worked examples.
DAY 1 - TOPICS: Quality control of read data. Subsampling of reads. Assembly of phage genome. Preliminary identification of closest relatives. Identification of genomic termini and re-ordering of genome.
DAY 2 - TOPICS: Error correction of genome assembly. First pass automated annotation. Checking annotations. Preparation of files for submission to ENA. Comparative genomics. Visualization of genomes.
Year(s) Of Engagement Activity 2022
 
Description Phage Training - Kenya 2025 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Phage Hunters Training and Research Program (PHTRP) successfully conducted this phage training workshop, which was held at the Kenya Institute of Primate Research (KIPRE) and The Technical University of Kenya (TUK) in Nairobi, Kenya.

This workshop was a joint initiative by 8 partners and collaborators: Phage Hunters Training and Research Program (PHTRP), The Phage, Kenya Education Network Trust (KENET), The Technical University of Kenya (TUK), Kenya Institute of Primate Research (KIPRE), CLIMB-BIG-DATA, Phage Kenya Consortium, and Jenomu Bioinformatics.

The workshop specifically benefited participants, who included recent graduate students, early-career researchers, healthcare professionals, and professionals in microbiology and bioinformatics.
Year(s) Of Engagement Activity 2025
URL https://www.linkedin.com/pulse/phtrp-successfully-completes-phage-zbopf/?trackingId=zdfWjxcpSSyglvvo...
 
Description Phage genome assembly - VoM 2025 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 24th January 2025. 20 delegates attended a hands-on phage genome assembly workshop, led by Dr Andrew Millard of the University of Leicester Centre for Phage Research. This gave delegates a valuable opportunity to build practical skills in assembling and annotating bacteriophage genomes directly from DNA sequencing data. Those who attended found the workshop extremely useful and can transfer these skills to their own research or use them to support and teach others. Delegates enjoyed the interactive format of the day, for which the computational resources and running costs were sponsored by the Cloud Infrastructure for Microbial Bioinformatics (CLIMB).
Year(s) Of Engagement Activity 2025
URL https://www.northumbria.ac.uk/research/1/our-peaks-of-excellence/microbiome-exploration/vom-uk-2024-...
 
Description nf-core hackathon 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Local site of the nf-core hackathon, held to improve the open-source pipelines and documentation from the nf-core organization.
Year(s) Of Engagement Activity 2024
URL https://corebio.info/workshops-2024/nf-core