CLIMB-BIG-DATA: A Cloud Infrastructure for Big-Data Microbial Bioinformatics

Lead Research Organisation: Quadram Institute

Department Name: Microbes in the Food Chain

Abstract

High-throughput sequencing has transformed microbiology, delivering an explosion in genomic and metagenomic big data. However, many microbiologists remain unable to exploit large genomics datasets to address key questions in microbiology, because they lack access to the relevant computational resources, bioinformatics tools or expertise in data analysis. To address this problem, six years ago we launched CLIMB, a pioneering British cloud-computing infrastructure project funded by the MRC that has supported >900 users. As CLIMB comes to an end, we propose a unique new partnership, CLIMB-BIG-DATA (Cloud Infrastructure for Big-Data Microbial Bioinformatics), to meet the bioinformatic needs of the UK microbiology community as we head into the 2020s. This new CLIMB-BIG-DATA partnership will occupy a distinctive position in the UK, underpinning research in the academic sector alongside the front-line work of government agencies and the health service, while also supporting research that maps on to a wide variety of national/UKRI and international strategic priorities and Official Development Assistance objectives.

In response to community needs (as evidenced by >160 signatories), the proposed partnership will maintain the existing CLIMB infrastructure to support hundreds of research projects including high-profile efforts to track the spread of Ebola or Zika virus. However, we also promise to deliver a step-change in the scale and scope of what we can offer to users. We will adopt a matrix model, in which a range of activities will be mapped on to strategically important themes championed by our investigators, including Antimicrobial Resistance; Emerging Infectious Disease and Global Health; Microbial Genomics for Public Health; Microbial communities and metagenomics; Pathogen Biology and Functional Genomics; Sequencing Technologies.

Activities aimed at community engagement will include bioinformatics workshops, hackathons and symposia. Activities focused on tools and integration will include enhanced support for sharing software and data, workflow integration and migration between clouds; enhanced support and security for clinical applications; plus integration with large datasets at external facilities, such as the European Nucleotide Archive. Activities focused on infrastructure include: provision of graphics processing units and enhanced storage; maintenance of our original cloud-computing infrastructure to support microbial bioinformatics; plus incorporation of cloud infrastructures from the MRC unit in the Gambia and from the Quadram Institute. The CLIMB-BIG-DATA partnership will run as a UKRI-supported project for five years, with the expectation that the project will become self-financing through robust pathways to sustainability and expansion. The partnership will draw upon a diverse team of partners from multiple research organisations and collaborators from government agencies, and it will be run from the Quadram Institute in Norwich, which as a strategically funded UKRI research institute will provide a first-class stable and resilient environment for the project's future.

Technical Summary

The CLIMB-BIG-DATA partnership will provide a substantial computational resource that will enhance UK capability and infrastructure in microbial bioinformatics, building on our highly successful CLIMB project. Our computational infrastructure will feature an OpenStack cloud architecture with >10,000 virtual CPU cores spanning six research organisations (incorporating clouds from the MRC unit in the Gambia and the Quadram Institute), with access to the Ceph platform to implement object storage. A dedicated web portal, Bryn, will allow users to gain easy access to their own virtual machines, preconfigured with powerful user-friendly bioinformatics tools. We will add newly requisitioned specialised servers aimed at memory-intensive tasks (e.g. metagenomic assembly) or compute-intensive tasks (e.g. GPU nanopore analyses) and we will add substantial additional storage (>3 petabytes). Other features will include a freely accessible database of relevant workflows, pipelines, scripts, programs, preconfigured virtual machine images and containers, curated to support strategically relevant themed activities; an accreditation-compliant computational infrastructure for linking sensitive human and animal health metadata with microbial sequence data; support for containerisation via the Docker Engine and Singularity; and a capability to share VMs, containers, data and software across the entire CLIMB-BIG-DATA infrastructure and with public cloud providers (with cloud bursting on to public clouds, should demand spike on our own infrastructure). We also promise an ambitious and exciting programme of training/community engagement, featuring hackathons, workshops, and modules suitable for a wide range of users from undergraduate students to professional bioinformaticians in the UK and more widely. We will build protocols for demand management and for charging users as we move towards becoming self-sustainable and will also improve integration with public facilities and new potential partner sites.

Planned Impact

This research will benefit a range of groups beyond the academic disciplines that encompass microbiology:

1. Clinical and veterinary microbiologists, vets and governmental organisations such as APHA, FSA and local agencies including health services such as PHE/PHW who have a role in tracking zoonotic disease and tracing pathogens through the food chain. These users will be able to use our computational infrastructure to integrate informatics systems, animal and human health metadata, epidemiological disease patterns and microbial (meta)genomic data to elucidate modes and routes of transmission, detect outbreaks, explore the relationships between potential pathogens and disease, with impacts on animal health, welfare, and disease prevention. This system will also provide an infrastructure that will bring new opportunities for productive engagement between organisations focused on animal health and the academic sector, so that research findings and approaches can be more easily translated into outcomes that impact food security and human and animal health.

2. Industrial users stand to benefit in several ways. The tools around the characterisation and development of novel antimicrobials and metagenomics are of wide interest to industrial beneficiaries as these tools will be invaluable for the identification of new targets and the rational design of probiotic treatments for the prevention of microbial disease in farmed animals. Industrial users will also benefit from the tools and data that the infrastructure will make available. These will allow the rapid contextualisation and characterisation of bacteria of industrial importance (for example in product spoilage), information that can then be used to design interventions or better optimise preservative selection.

3. Commercial beneficiaries include sequencing companies, computer companies and private laboratories, who stand to benefit from increased demand for their products and opportunities for innovation and spread of best practice (NB: both Solexa and Oxford Nanopore Sequencing were developed within the UK, with benefits to our economy).

4. Anyone planning a large cloud-based computing project will be able to draw on the example and precedent we set here.

5. Policy makers, who will benefit from grounding their public policy and legislation, e.g. on food safety or pandemic preparedness, on a more solid understanding of bacterial evolution, epidemiology, population genetics and taxonomy.

6. The wider public will benefit from the positive impacts on food security, reduced preservative use, and increased profitability of UK companies, resulting in stronger tax revenues for the UK.

This work will also make a decisive contribution through employment and training to enhancing the professional and research skills base of the United Kingdom, contributing to the development of the knowledge economy through the training of undergraduates and postgraduates in data intensive research techniques, using the common CLIMB-BIG-DATA platform.

Funded Value:

£1,994,477

Funded Period:

Apr 20 - Mar 25

Funder:

MRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

MR/T030062/1

Principal Investigator:

Mark Pallen

Health Category:

Unclassified

Organisations

People	ORCID iD
Mark Pallen (Principal Investigator)	http://orcid.org/0000-0003-1807-3657
Andrew Millard (Co-Investigator)	http://orcid.org/0000-0002-3895-2854
Simon Thompson (Co-Investigator)
Andrew Page (Co-Investigator)
Nicholas Loman (Co-Investigator)
Martin Antonio (Co-Investigator)
Samuel Sheppard (Co-Investigator)
Christopher Quince (Co-Investigator)
Nabil-Fareed Alikhan (Co-Investigator)	http://orcid.org/0000-0002-1243-0767
Thomas Connor (Co-Investigator)
Anna Price (Researcher)

Publications

Gilroy R (2020) A Genomic Census of the Chicken Gut Microbiome using Metagenomics and Culture

Harling-Lee J (2022) A graph-based approach for the visualisation and analysis of bacterial pangenomes

Elek C (2023) A hybrid and poly-polish workflow for the complete and accurate assembly of phage genomes: a case study of ten przondoviruses. in Microbial Genomics

Adamson J (2022) A large outbreak of COVID-19 in a UK prison, October 2020 to April 2021

Adamson JP (2022) A large outbreak of COVID-19 in a UK prison, October 2020 to April 2021. in Epidemiology and infection

Shepherd MJ (2022) A near-deterministic mutational hotspot in Pseudomonas fluorescens is constructed by multiple interacting genomic features. in Molecular biology and evolution

Michniewski S (2021) A new family of "megaphages" abundant in the marine environment. in ISME communications

Haines M (2020) Analysis of selection methods to develop novel phage therapy cocktails against antimicrobial resistant clinical isolates of bacteria

Brown HL (2020) Antibacterial and Antivirulence Activity of Manuka Honey against Genetically Diverse Staphylococcus pseudintermedius Strains. in Applied and environmental microbiology

Baker M (2022) Antimicrobial resistance in dairy slurry tanks: a critical point for measurement and control

Baker M (2022) Antimicrobial resistance in dairy slurry tanks: A critical point for measurement and control. in Environment international

Dabrera G (2022) Assessment of mortality and hospital admissions associated with confirmed infection with SARS-CoV-2 Alpha variant: a matched cohort and time-to-event analysis, England, October to December 2020. in Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin

Glendinning L (2021) Author Correction: Assembly of hundreds of novel bacterial genomes from the chicken caecum. in Genome Biology

Pallen M (2021) Bacterial Nomenclature in the Era of Genomics

Birolo G (2022) BamToCov: an efficient toolkit for sequence coverage calculations. in Bioinformatics (Oxford, England)

Mishra S (2021) Changing composition of SARS-CoV-2 lineages and rise of Delta variant in England. in EClinicalMedicine

Turner AK (2021) Chemical biology-whole genome engineering datasets predict new antibacterial combinations. in Microbial genomics

Brunner F (2022) City-wide wastewater genomic surveillance through the successive emergence of SARS-CoV-2 Alpha and Delta variants

McCrone JT (2021) Context-specific emergence and growth of the SARS-CoV-2 Delta variant. in Research square

Muscatt G (2022) Crop management shapes the diversity and activity of DNA and RNA viruses in the rhizosphere. in Microbiome

Raphenya AR (2022) Datasets for benchmarking antimicrobial resistance genes in bacterial metagenomic and whole genome sequencing. in Scientific data

O'Connell L (2023) Detailed analysis of in-hospital transmission of SARS-CoV-2 using whole genome sequencing. in Journal of Hospital Infection

Holden ER (2020) Donor plasmids for phenotypically neutral chromosomal gene insertions in Enterobacteriaceae. in Microbiology (Reading, England)

Eales O (2022) Dynamics of competing SARS-CoV-2 variants during the Omicron epidemic in England. in Nature communications

Zamudio R (2022) Dynamics of extended-spectrum cephalosporin resistance genes in Escherichia coli from Europe and North America. in Nature communications

Research Databases and Models
Engagement Activities


Title	DNA and RNA viruses in the rhizosphere
Description	This repository contains data used in Muscatt et al. 2022. Further details on the analysis can be found here: https://github.com/GeorgeMuscatt/RhizosphereVirome
Data is stored in the file RhizosphereVirome.tar. The following files are stored (see the README for full details):
-c1.ntw.gz = vConTACT2 network output file
-core_protein_concatenation_tree = ssRNA phage phylogenetic tree based on aligned core protein concatenations
-CP.faa.gz = fasta amino acid file containing coat protein sequences for 11,222 near-complete ssRNA phage vOTUs
-CP_ref_Leviviricetes.faa.gz = fasta amino acid file containing coat protein sequences for 1,868 reference Leviviricetes genomes
-dsDNA_gene_annotations.csv.gz = annotations for 20,746 dsDNA vOTU genes
-dsDNA_vOTUs.faa.gz = fasta amino acid file containing 20,267 dsDNA vOTU genes
-dsDNA_vOTUs.fna.gz = fasta nucleotide file containing 1,059 dsDNA vOTUs
-edges.csv.gz = edges for drawing vConTACT2 network
-gene_2_genome.csv.gz = input file for vConTACT2 containing gene-to-genome index for 1,059 dsDNA vOTUs and 16,540 ssRNA phage vOTUs
-MP.faa.gz = fasta amino acid file containing maturation protein sequences for 11,222 near-complete ssRNA phage vOTUs
-MP_ref_Leviviricetes.faa.gz = fasta amino acid file containing maturation protein sequences for 1,868 reference Leviviricetes genomes
-nodes.csv.gz = nodes for drawing vConTACT2 network
-RdRp.faa.gz = fasta amino acid file containing RNA-dependent RNA polymerase sequences for 11,222 near-complete ssRNA phage vOTUs
-RdRp_ref_Leviviricetes.faa.gz = fasta amino acid file containing RNA-dependent RNA polymerase sequences for 1,868 reference Leviviricetes genomes
-ssRNA_vOTUs.faa.gz = fasta amino acid file containing 52,700 ssRNA phage vOTU genes
-ssRNA_vOTUs.fna.gz = fasta nucleotide file containing 16,541 ssRNA phage vOTUs
-viral_cluster_overview.csv = output file from vConTACT2 containing viral cluster information
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
URL	https://leicester.figshare.com/articles/dataset/DNA_and_RNA_viruses_in_the_rhizosphere/19635336
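
For readers who download the archive, the following is a minimal sketch of how its contents might be inspected without unpacking everything to disk. It assumes the tar file has been saved locally and that the gzipped CSV members are plain comma-separated text; the function names are hypothetical, and the internal folder layout and column names are not stated in the record, so the code matches members by file-name suffix only.

```python
import csv
import gzip
import tarfile


def list_members(tar_path):
    """Return the names of all regular files inside the archive."""
    with tarfile.open(tar_path) as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def preview_gzipped_csv(tar_path, suffix, n_rows=5):
    """Return the first n_rows rows of the first member whose name ends in `suffix`.

    Assumption: the member is a gzip-compressed, comma-separated text file.
    """
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                raw = tar.extractfile(member)
                # gzip.open accepts an already-open binary file object.
                with gzip.open(raw, mode="rt", newline="") as handle:
                    reader = csv.reader(handle)
                    return [row for _, row in zip(range(n_rows), reader)]
    raise FileNotFoundError(f"no member ending in {suffix!r}")
```

Calling `list_members("RhizosphereVirome.tar")` first is a cheap way to confirm the actual member paths before previewing a file such as `edges.csv.gz`.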


Title	INPHARED_DATABASE
Description	inphared.pl (INfrastructure for a PHAge REference Database) is a Perl script which downloads and filters phage genomes from GenBank to provide the most complete phage genome database possible. Useful information, including viral taxonomy and bacterial host data, is extracted from the GenBank files and provided in a summary table. Genes are called on the genomes using Prokka and this output is used to gather metrics which are summarised in the output files, as well as useful input files for vConTACT2. The data provided is all genomes up to Jan 2021. This can be downloaded so users do not have to repeat the process of consistent gene calling on existing genomes. The folder GenomesDB contains subfolders, each containing a subfolder that is named after the accession number of each phage. Within each folder are the re-called genes in the following formats: .ffn, .faa; the complete genome (.fna); and a GenBank file without any annotation (.gbf). See https://github.com/RyanCook94/
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
URL	https://leicester.figshare.com/articles/dataset/INPHARED_DATABASE/14242085


Title	Local accessory gene sharing among Egyptian Campylobacter potentially promotes the spread of antimicrobial resistance
Description	Supplementary Material for 'Local accessory gene sharing among Egyptian Campylobacter potentially promotes the spread of antimicrobial resistance', as published in Microbial Genomics.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
URL	https://microbiology.figshare.com/articles/dataset/Local_accessory_gene_sharing_among_Egyptian_Campy...


Title	MetaPhage Example Report
Description	Example report generated by MetaPhage (https://github.com/MattiaPandolfoVR/MetaPhage) as described in the paper. MetaPhage is a reads-to-report pipeline embedding viral miners and custom tools to generate automatic diversity plots. Pipeline documentation: https://mattiapandolfovr.github.io/MetaPhage/
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
URL	https://figshare.com/articles/dataset/MetaPhage_Example_Report/20424705/1


Description	02FEB2022 - CLIMB-BD workshop: Bioinformatics Skills for Microbial Genomics
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Undergraduate students
Results and Impact	Bioinformatics relies on a vast number of tools (packages, electronic notebooks, programming languages and their libraries) that bioinformaticians need to be able to install, manage and run. A growing challenge is the organisation of data inputs and outputs, particularly as genomic datasets continue to expand. This one-day training workshop introduced key concepts and ways of working that address these challenges and are rapidly being adopted in the industry, including:
-Using containers (such as Docker and Singularity): currently the easiest method for managing and deploying software, allowing easier sharing of code and higher reproducibility of pipelines.
-Workflow languages (Nextflow DSL2): workflow managers provide a framework for running analyses. They intrinsically provide a degree of data provenance and make it easy to re-run analyses with different datasets or parameters in a range of computing environments.
-The GNU/Linux command line
Year(s) Of Engagement Activity	2022
URL	https://www.climb.ac.uk/bioinformatics-skills-microbial-genomics/


Description	11-13 OCT 2021 - AMR HACKATHON (Bioinformatics tools and methods for AMR in bacteria)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Together with the Public Health Alliance for Genomic Epidemiology (PHA4GE) and the Joint Programming Initiative on Antimicrobial Resistance (JPIAMR), we organised the 7th Microbial Bioinformatics Hackathon with a special focus on antimicrobial resistance in bacteria. Antimicrobial resistance is a critical universal issue and scientists need reliable, fast, reproducible tools for their research. The aim of this hackathon was to build, improve and extend bioinformatics tools and methods for the AMR community. We brought together international bioinformatics researchers, scientists and clinicians to collaborate and solve common problems that impact our community, as pathogens know no borders.
Year(s) Of Engagement Activity	2021
URL	https://www.climb.ac.uk/amr-hackathon/


Description	14-15JAN2022 - ARTICnetwork and CLIMB-BIG-DATA workshop on Covid-19 data analysis
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The third joint ARTICnetwork and CLIMB-BIG-DATA workshop on COVID-19 data analysis was held on 13-14 January 2022. It combined talks, panel discussions, hands-on practical sessions and question-and-answer sessions. Sessions:
-Introduction to the ARTIC & CLIMB-BIG-DATA workshop and the ARTIC project (Nick Loman)
-Case studies in molecular epidemiology of SARS-CoV-2 (Andrew Page)
-How to sequence COVID-19 using the ARTIC protocol (Josh Quick)
-Automating the ARTIC protocol using the OpenTrons OT2 (Jeremy Mirza)
-ARTIC Nanopore+Illumina Bioinformatics Pipeline (Sam Wilkinson)
-Bioinformatics "gotchas" (Nabil-Fareed Alikhan)
-DNA Spike-Ins for SARS-CoV-2 sequencing (Katherine Siddle)
-Wastewater sequencing bioinformatics (Chris Quince)
-A practical introduction to phylogenetics in the pandemic era (JT McCrone)
-Practical phylogenetics: lineages & variants (Rachel Colquhoun)
-How we detect and define new variants (Natalie Groves)
Year(s) Of Engagement Activity	2022
URL	https://www.climb.ac.uk/3rd-articnetwork-and-climb-big-data-joint-workshop-information-for-participa...


Description	14-15JUL2021 ARTICnetwork & CLIMB-BIG-DATA workshop on Covid-19 data analysis
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Together with the ARTIC network, we organised the 2nd two-day workshop about COVID-19 data analysis. "Theory" (live sessions) and "practice" (homework) were separate, so everyone was able to choose what to take or to do from this course at their own pace.
Year(s) Of Engagement Activity	2021
URL	https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop-2/


Description	14-16JUN2021 - A primer on 16S Data Analysis
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	OBJECTIVES:
-The key steps of a metabarcoding analysis, from raw reads to numerical ecology
-The difference between OTUs (Operational Taxonomic Units) and ASVs (Amplicon Sequence Variants), and how to produce them from the raw reads
-The tools available to analyse metabarcoding experiments, with a strong focus on Qiime2
-How to use the Qiime2 package, understand the files it produces, and use its documentation to analyse 16S reads and produce interactive visualizations of taxonomy plots, alpha diversity and beta diversity (PCoA)
-The basic concepts of numerical ecology, and how to transfer the analysis from the command line to R
Year(s) Of Engagement Activity	2021
URL	https://www.climb.ac.uk/a-primer-on-metabarcoding-analyses/
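
As a small illustration of the alpha- and beta-diversity concepts listed in the objectives above, here is a sketch of two widely used measures computed from raw taxon counts. It is not taken from the workshop material: the function names are hypothetical, the Shannon index uses the natural logarithm (other tools may use a different base), and both samples are assumed to list counts for the same taxa in the same order.

```python
import math


def shannon_index(counts):
    """Alpha diversity: H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list counts for the same taxa")
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    denominator = sum(counts_a) + sum(counts_b)
    return numerator / denominator if denominator else 0.0
```

A pairwise matrix of such dissimilarities between samples is the usual input to an ordination method such as PCoA.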


Description	15OCT2021 - AMR workshop - Bioinformatics tools and methods for AMR
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	A joint initiative with the Public Health Alliance for Genomic Epidemiology (PHA4GE) and the Joint Programming Initiative on Antimicrobial Resistance (JPIAMR), delivered on 15th October 2021. Antimicrobial resistance is a global public health threat and this workshop aimed to provide training on the following:
-Use of existing AMR-related databases and resources (including CARD, NCBI, and PATRIC)
-Theory and use of bioinformatics tools to detect AMR genes from genomes (e.g., AMRFinderPlus)
-How to compare and systematically report results from AMR genomics using hAMRonization
-A practical introduction to bioinformatics workflows for AMR genomics
Year(s) Of Engagement Activity	2021
URL	https://www.climb.ac.uk/amr-workshop/


Description	5-7JUL2021 - MASTERING KRAKEN2 FOR TAXONOMY PROFILING
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Topics covered: QC, host removal, Kraken2, Bracken, merging results and visualizations; using R to analyse and visualize the output files.
Year(s) Of Engagement Activity	2021
URL	https://www.climb.ac.uk/metagenomics-from-kraken-to-r/
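
To give a flavour of the "output files" step, the sketch below parses a Kraken2 report and lists the most abundant taxa at a chosen rank. The workshop used R for this; Python is used here purely for illustration. It assumes the standard six-column, tab-separated report layout (percentage, clade-level read count, directly assigned read count, rank code, NCBI taxonomy ID, indented name), and `ReportRow`, `parse_report` and `top_taxa` are hypothetical names, not part of Kraken2.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in the clade rooted at this taxon
    clade_reads: int    # reads assigned to this taxon or its descendants
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. "U", "D", "G", "S"
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name with report indentation removed


def parse_report(lines):
    """Parse report lines, skipping any that do not have exactly six fields."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue
        pct, clade, direct, rank, taxid, name = fields
        rows.append(
            ReportRow(float(pct), int(clade), int(direct), rank, int(taxid), name.strip())
        )
    return rows


def top_taxa(rows, rank="S", n=10):
    """Return up to n rows at the given rank, ordered by clade-level read count."""
    selected = [row for row in rows if row.rank == rank]
    return sorted(selected, key=lambda row: row.clade_reads, reverse=True)[:n]
```

With a report saved to disk, `top_taxa(parse_report(open(path)))` would give the ten most abundant species-level entries; reports written with extra columns (for example minimizer data) are skipped by the six-field check rather than misparsed.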


Description	ARTICnetwork and CLIMB-BIG-DATA joint workshop on COVID-19 data analysis
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	130 participants attended this workshop about COVID-19 data analysis using the ARTIC pipeline.
-66% of participants reported that the workshop and the use of the ARTIC pipeline would improve their research scale or scope
-100% of participants considered the workshop relevant or extremely relevant
-90% of participants will be sharing the training/info with colleagues
-100% of participants would be very (22%) or extremely (78%) interested in other workshops like this one
Year(s) Of Engagement Activity	2021
URL	https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop/


Description	CLIMB-BIG-DATA Online Bioinformatics Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	20 participants from 5 UK institutions attended the workshop.
Learning objectives:
-Understand that most bioinformatics files are simple text files with a specific format
-Understand how to interact with a Command Line Interface
-Understand the client/server model
-Understand filesystem paths
-Learn how to log in to a remote Unix server (ssh)
-Learn how to navigate and manipulate the filesystem (pwd, cd, ls, mkdir)
-Learn how to extract information from text files (cat, head, tail, grep, cut, sort ...)
-Learn how to redirect STDOUT/STDERR and create simple pipes
-Learn how to use the Miniconda package manager to install a program (spades)
-Learn how to run a bioinformatics program and inspect its output
Feedback:
-The overall contribution of this workshop to participants' bioinformatics skills and knowledge was rated "very good"
-46% of participants said the workshop and the bioinformatics skills it taught would improve their research scale or scope significantly (38% moderately)
-100% of participants considered the workshop relevant or extremely relevant
Year(s) Of Engagement Activity	2020
URL	https://www.climb.ac.uk/climb-big-data-online-workshop/
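
The first learning objective, that most bioinformatics files are simple text files with a specific format, can be illustrated with FASTA: a header line starting with ">" followed by one or more sequence lines. The sketch below is a hypothetical helper, not part of the workshop material (which taught the equivalent with command-line tools such as grep); it reads such a file line by line and reports the length of each record.

```python
def fasta_lengths(lines):
    """Map each FASTA record identifier to the length of its sequence.

    The identifier is taken as the first whitespace-delimited word after ">".
    Sequence lines that appear before any header are ignored.
    """
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths
```

Because a file object iterates over its lines, `fasta_lengths(open("contigs.fasta"))` works directly on a file, where `contigs.fasta` stands in for any FASTA file such as an assembler's output.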


Description	Phage Genome Annotation Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	A two-day course covering phage genome assembly, annotation and an introduction to comparative phage genome analysis. Solutions to common phage genome assembly problems were presented through worked examples.
DAY 1 - TOPICS: Quality control of read data. Subsampling of reads. Assembly of phage genome. Preliminary identification of closest relatives. Identification of genomic termini and re-ordering of genome.
DAY 2 - TOPICS: Error correction of genome assembly. First pass automated annotation. Checking annotations. Preparation of files for submission to ENA. Comparative genomics. Visualization of genomes.
Year(s) Of Engagement Activity	2022