Cloud-SPAN: Specialised analyses for environmental 'omics with Cloud-based High Performance Computing

Lead Research Organisation: University of York
Department Name: Biology

Abstract

Environmental Biotechnology (EB) addresses global challenges using engineered microbial systems for environmental protection, bio-remediation and resource recovery. It is a critical and expanding area for the UK and underpins some of the world's most important industries. This is acknowledged by the funding invested in the creation of BBSRC's Networks in Industrial Biotechnology and Bioenergy (NIBBs).
A deep mechanistic understanding of the complex microbial communities involved in the biological cycling of global resources is essential to meet global challenges such as Net Zero, waste management and increased demand. The complexity of these microbiomes can be orders of magnitude greater than that of the human gut microbiome, requiring different approaches to experimental design and to analysis with High Performance Computing (HPC). However, EB is an interdisciplinary field that attracts researchers from a broad range of disciplines, including Mathematics, Engineering, Biology, Social Sciences, Management, Physics and Chemistry, and big data 'omics analyses on HPC systems are often not core skills for such researchers.
This proposal aims to develop and deliver highly accessible resources that will upskill these interdisciplinary researchers so that they are able to generate and analyse big data relating to EB using Cloud HPC. Although infrastructure and resources exist for microbial 'omics (e.g. JGI's IMG/M, MG-RAST, Galaxy, CLIMB, EBI), there is a lack of systematic training tightly linked to the EB domain, and documentation often focuses on technical proficiency rather than being contextualised within a strong understanding of experimental design. We will provide foundational training and develop and deliver new advanced modules covering the specialised skills required to generate and analyse 'omics data using Cloud HPC resources. These will include experimental design and statistics modules to ensure researchers can generate data appropriate to their research questions. Modules will deploy cloud-based containerised instances, provided by Google Education and Amazon Web Services (AWS), for exemplar workflows at no cost to the learner. Together they will form a complete training resource with fully articulated prerequisites and learning objectives that can be used for in-person or online tutor-led workshops or for self-paced learning. Our proposal offers structured Learning Paths, from the statistical skills required for robust experimental design through to the reproducible execution and interpretation of 'omics analyses with HPC; these cater to researchers with differing levels of previous experience and allow them to self-assess their training needs. We also provide Diversity Scholarships to enable members of underrepresented groups to participate in online or in-person training.
The collaboration between the University of York and the Software Sustainability Institute (SSI) brings together excellence in data science pedagogy and environmental 'omics research with the SSI's UK-leading expertise in research computing and community building. This will ensure the training developed genuinely complements, and aligns with, existing materials to enhance national provision.
Sustainability will be fostered by making the resources Findable, Accessible, Interoperable and Reusable (FAIR), providing cross-platform images for deployment, developing and proactively engaging with a Community of Practice, and providing Code Retreats for the supported practice of methods on participants' own data. In addition, "Cloud Administration Guides" will be developed for institutional HPC Teams to run specialised modules with their own resources. These Guides will be supported by 1-to-2-day training sessions delivered by Cloud-SPAN systems administrators.
The project will be promoted by our partners, the SSI, Google Education, AWS and the N8 Centre of Excellence in Computationally Intensive Research, and through conference talks and seminars.

Technical Summary

Environmental Biotechnology (EB) is an interdisciplinary field involving advanced molecular and applied microbiologists, environmental chemists and engineers that addresses global challenges using engineered microbial systems. This proposal aims to train these researchers to generate, analyse and mine big data relating to EB microbiomes, which are larger than those found in the human gut and require different approaches to both measurement and analysis in order to manage reagent costs and make effective use of available HPC resources.
Easily accessible and scalable HPC-based training is required to provide researchers with the skills and self-confidence to manipulate and analyse big data generated from 'omics technologies and to generate biological insights from these highly interconnected systems. This area of bioinformatics involves a steep learning curve, which can be compounded by the need to install packages with multiple dependencies onto different HPC architectures, depending on what is available at a researcher's home institution, even before the user can engage with writing scripts to manage workflows, manipulate or visualise data, or work with job schedulers. We will deploy Cloud-based containerised instances which (1) are accessible to anyone, anywhere, with an adequate internet connection, (2) have a very low hardware entry requirement and (3) allow easily scalable and replicable installations of software that will not become deprecated as quickly as it might on a local server. The cloud providers we are working with run grant schemes that provide significant resources to researchers, which will support deployment of production instances of the images we generate. We expect that our resources will be easily deployed and used by groups who do not necessarily have dedicated bioinformaticians or expertise in HPC, providing a cost-effective route to useful analyses for researchers in a strategically key area of expansion in the biosciences.

Publications

 
Description Getting started with High Performance Computing: FAIR training for environmental scientists
Amount £29,513 (GBP)
Funding ID NE/X006999/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 09/2022 
End 06/2023
 
Description Metagenomics with High Performance Computing for environmental science Doctoral Training
Amount £54,449 (GBP)
Funding ID NE/Y003527/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 07/2023 
End 05/2024
 
Description Collaboration with Software Sustainability Institute 
Organisation Software Sustainability Institute
Country United Kingdom 
Sector Public 
PI Contribution The project team participates in quarterly management meetings with Neil Chue Hong, Director of the Software Sustainability Institute.
Collaborator Contribution Neil Chue Hong, Director of the Software Sustainability Institute, participates in quarterly management meetings to provide guidance and strategic advice on different aspects of the project.
Impact During our meetings we are able to discuss best practice regarding creating training materials, managing educational activities and engaging the general public.
Start Year 2021
 
Description Collaboration with the White Rose DTP 
Organisation White Rose University Consortium
Country United Kingdom 
Sector Academic/University 
PI Contribution As of December 2023, Emma Rand has designed an advanced course in Metagenomics as the White Rose DTP contribution to the Inter-DTP Skills programme for four BBSRC DTPs (White Rose, Oxford Interdisciplinary Bioscience DTP, Midlands Integrative Biosciences Training Partnership and South West Biosciences Doctoral Training Partnership). This will be delivered in May 2024.
Collaborator Contribution 1. This collaboration has promoted Cloud-SPAN training opportunities to new training cohorts in four BBSRC-funded DTPs (White Rose, Oxford Interdisciplinary Bioscience DTP, Midlands Integrative Biosciences Training Partnership and South West Biosciences Doctoral Training Partnership). 2. This has widened exposure to the Cloud-SPAN training programmes and resources.
Impact 1. This collaboration has widened national exposure to the Cloud-SPAN training programmes and resources.
Start Year 2023
 
Description EBnet Collaboration 
Organisation UK Environmental Biotechnology Network
Country United Kingdom 
Sector Academic/University 
PI Contribution James Chong and Sarah Forrester are active members of the EBnet Working Group. Through their work with EBnet, they are able to publicise the Cloud-SPAN project by sharing information and delivering talks at webinars.
Collaborator Contribution EBnet supports and promotes the training opportunities created through the Cloud-SPAN project. The collaboration also allows members to exchange expertise in HPC-driven microbial genomics research, which in turn improves the quality of the Cloud-SPAN training resources.
Impact James Chong and Sarah Forrester are active members of the EBnet Working Group. Through their work with EBnet, they are able to publicise the Cloud-SPAN project by sharing information and delivering talks at webinars.
Start Year 2021
 
Title Web-based App - Self-Assessment Quiz 
Description An interactive web-based app was created using Shiny, an R package, to evaluate a participant's level of competence.
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact This online self-assessment tool has been invaluable for determining participants' level of competence. Based on their results, course participants can proceed to register for either an introductory course or an advanced course.
URL https://shiny.york.ac.uk/er13/prenomics-quiz/#section-why
 
Description Blog re Genomics Course November 2021 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A blog post was written evaluating the first Genomics course, delivered in November 2021. It was posted on the Cloud-SPAN forum and included on the SSI's website: https://software.ac.uk/news/review-cloud-spans-genomics-course
Year(s) Of Engagement Activity 2021
URL https://cloudspan.peerboard.com/post/1021906833
 
Description Blog series to publicise the project and activities 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact A series of 22 blog posts was created to achieve the following:
- Promote the Cloud-SPAN project to increase awareness of the project's objectives and achieve recognition within the biology/HPC community and with other universities and similar training providers
- Promote individual training activities and attract registrations for courses
- Attract new followers on our social media account

These objectives were achieved: following the promotion of the blogs, we received enquiries about the project, new registrations for training activities and new followers on Twitter and LinkedIn, and we built relationships with other organisations and universities who support the promotion of our activities.
Year(s) Of Engagement Activity 2021,2022,2023
URL https://cloudspan.peerboard.com/
 
Description Code Retreat - April 2022, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They can also network with individuals from different institutions and workplaces.

Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2022
URL https://cloud-span.york.ac.uk/community#h.ma5203jptwz0
 
Description Code Retreat - December 2023 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow.

This event helps broaden the knowledge and practical experience of the participants. They can also network with individuals from different institutions and workplaces.
Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2023
URL https://cloud-span.york.ac.uk/upcoming/Code%20Retreat/
 
Description Code Retreat - January 2023, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They can also network with individuals from different institutions and workplaces.
Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2023
URL https://cloud-span.york.ac.uk/community#h.ma5203jptwz0
 
Description Code Retreat - January 2024, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They can also network with individuals from different institutions and workplaces.
Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2024
URL https://cloud-span.york.ac.uk/upcoming/Code%20Retreat/
 
Description Code Retreat - May 2023 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They can also network with individuals from different institutions and workplaces.
Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise the course materials
- Get help organising and documenting your own analyses
- Apply tools taught in Meta/Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other researchers.
Year(s) Of Engagement Activity 2023
URL https://cloud-span.york.ac.uk/upcoming/Core%20R/
 
Description Code Retreat - November 2022, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They can also network with individuals from different institutions and workplaces.
Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2022
URL https://cloud-span.york.ac.uk/community#h.ma5203jptwz0
 
Description Core R Workshop - June 2023 - Online 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact This online two-hour workshop is an introduction to R for complete beginners.

It teaches learners how to find their way round RStudio, use the basic data types and structures in R and how to organise their work with scripts and projects.
It also teaches learners how to import data, summarise it and create and format a graph. The workshop assumes no prior experience of coding.

Impact: increased knowledge, practical skills and confidence to apply the new skills in learners' own projects.
Year(s) Of Engagement Activity 2023
URL https://github.com/Cloud-SPAN/core-r
 
Description Creation of the Cloud-SPAN Handbook 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact In our Cloud-SPAN community we encourage everyone to come together to find solutions to problems and exchange experiences and knowledge. Our aim is to build a friendly and involved community of people who have used our resources, are interested in our resources, or who have expertise in the areas we cover. Ways to contribute include attending one of our courses, asking/answering questions on our community forum and making suggestions for improvements to our courses.

Handbook
This handbook is intended as a reference for both the core Cloud-SPAN team and for our wider community of learners. It's where you'll find our Code of Conduct, contributing guidelines and other practical information which will help you make the most of our resources in a friendly, understanding environment.
Year(s) Of Engagement Activity 2021
URL https://cloud-span.github.io/CloudSPAN-handbook/index.html
 
Description Creation of the Cloud-SPAN LinkedIn Account 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A Cloud-SPAN LinkedIn account was created to help provide an online presence for the Cloud-SPAN project. It is used to promote the different training activities and allows the general public to ask any questions they may have.

Impact: enables the dissemination of the project details to a wider audience and generates registrations for current activities
Year(s) Of Engagement Activity 2021
URL https://www.linkedin.com/company/cloud-span
 
Description Creation of the Cloud-SPAN Twitter Account 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A Cloud-SPAN Twitter Account was created to help provide an online presence for the Cloud-SPAN project. It also enables the following:
1. Promote the Cloud-SPAN project to an international online audience
2. Promote the registration of various activities
3. Host News stories and blogs
4. Promote information regarding scholarships

Impact: enables the dissemination of the project details to a wider audience and generates registrations for current activities
Year(s) Of Engagement Activity 2021
URL https://twitter.com/SpanCloud
 
Description Creation of the Cloud-SPAN online Forum 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact In our Cloud-SPAN community we encourage everyone to come together to find solutions to problems and exchange experiences and knowledge. Our aim is to build a friendly and involved community of people who have used our resources, are interested in our resources, or who have expertise in the areas we cover. Ways to contribute include attending one of our courses, asking/answering questions on our community forum and making suggestions for improvements to our courses.

The Cloud-SPAN forum is a place to ask questions, pick people's brains and share any insights you've gained during or after one of our courses. It will be the main hub of the Cloud-SPAN community of practice. We strongly encourage you to engage with the Cloud-SPAN community to enhance your learning and understanding.

Impact: enables the dissemination of the project details to a wider audience and generates registrations for current activities. It also provides an opportunity for the audience to learn and develop their knowledge and skills.
Year(s) Of Engagement Activity 2021
URL https://cloudspan.peerboard.com/
 
Description Creation of the Cloud-SPAN website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A Cloud-SPAN website was created to help provide an online presence for the Cloud-SPAN project. It also enables the following:
1. Promote the Cloud-SPAN project to an international online audience
2. Promote and organise registration of various activities
3. Host News stories and blogs
4. Promote information regarding scholarships
5. Provide a platform for individuals to request further information or ask questions

Impact: enables the dissemination of the project details to a wider audience and generates registrations for current activities
Year(s) Of Engagement Activity 2021
URL https://cloud-span.york.ac.uk/
 
Description EBNet Webinar: Using Big Data Approaches to Understand Microbial Communities 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact EBNet Webinar: Using Big Data Approaches to Understand Microbial Communities
Thursday, 10th February 2022 at 13.00 - 14.15.
The SESSION RECORDING is available here https://www.youtube.com/watch?v=1QH0JK0X0Xw

EBNet are hosting a series of specialist webinars to support knowledge exchange amongst members. "Using Big Data Approaches to Understand Microbial Communities". Hear the latest developments from top speakers and participate in the online chat to engage with questions.

This fascinating session is brought to you by the Chairs: Dr Sarah Forrester, the Chong Group, Dept. of Biology, University of York & Dr Bing Guo.

Dr Sarah Forrester is a PDRA in James Chong's group in the Department of Biology at the University of York. She gained her PhD at the University of Liverpool in 2016 using multi-'omic approaches to analyse parasite genomic data, and has since worked on a range of microbial systems using a variety of bioinformatic methods. She performs HPC-driven microbial genomics research and delivers bioinformatics training. As a 2022 Software Sustainability Institute Fellow and a certified Software Carpentry instructor, she is passionate about instilling good bioinformatic practices into her training. She is also involved in preparing and delivering the material for Cloud-SPAN: Specialised analyses for environmental 'omics with Cloud-based High Performance Computing; see https://cloud-span.york.ac.uk/.

TALK TITLE: INTRODUCTION TO THE EBNET BIOINFORMATICS WORKING GROUP
Prof James Chong is a Royal Society Industry Fellow and Professor in the Department of Biology at the University of York, where he runs a research group exploiting a range of 'omics techniques to understand microbial community dynamics, as well as leading the EBNet Working Group "Bioinformatics Training for Microbial Environmental Biotechnologies". His group generates microbial community metagenomics, meta-transcriptomics and metabolomics datasets. The group uses established analytical pipelines but also develops its own bespoke scripts for data analysis. Insight into how 'omics techniques can be applied to environmental biotechnology use cases to better understand microbial community dynamics has driven his desire to develop bioinformatic training resources. This work is currently supported by the UKRI grant Cloud-SPAN: Specialised analyses for environmental 'omics with Cloud-based High Performance Computing, which James co-leads; see https://cloud-span.york.ac.uk/.

Impact: enables the dissemination of the project details to a wider audience and generates registrations for current activities
Year(s) Of Engagement Activity 2022
URL https://ebnet.ac.uk/ebnet-rc22-bigdata/
 
Description EBNet Webinar: Why Bioinformatics Training is Important Emma Rand presented a talk on Cloud-SPAN - 11 May 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This seminar focused on why bioinformatics training is important. A general introduction was delivered by Prof. James Chong:
Introduction "Why bioinformatics training is important": Prof. James Chong, University of York (Head of the Bioinformatics Working Group)

Emma Rand is a Senior Lecturer in the Department of Biology at the University of York, where she specialises in teaching data science and reproducibility, particularly to those who do not see themselves as programmers. She delivered a presentation on the training opportunities and resources provided by the Cloud-SPAN project, promoting all activities open for registration along with the website, LinkedIn and Twitter accounts.

Link to event https://ebnet.ac.uk/wgbioinf-110522/
Year(s) Of Engagement Activity 2022
 
Description EBNet Working Group Coordinator 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact James Chong is the Working Group Chair for EBnet. This WG aims to create Bioinformatics training for microbial Environmental Biotechnologies. In this role James is able to make new connections and publicise the work of Cloud-SPAN.
Year(s) Of Engagement Activity 2021
URL https://ebnet.ac.uk/about/wg-details/wg-bioinformatics/
 
Description European Biosolids and Bioresource Conference 2022. 22nd and 23rd of November in Birmingham James Chong and Sarah Forrester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact James Chong and Sarah Forrester delivered a bioinformatics session and a 40-minute Q&A on metagenomics, bioinformatics and access to training.

European Biosolids and Bioresource Conference 2022, 22nd and 23rd of November, Birmingham; ~40 people attended.

13:35 - 13:50 Bioinformatics-based diagnostics for monitoring AD - Professor James Chong, University of York, UK
13:50 - 14:05 Using multi-omic approaches to understand the co-digestion of wheat straw and sewage sludge - Dr. Sarah Forrester, Senior Post Doc, University of York, UK
Year(s) Of Engagement Activity 2022
URL https://european-biosolids.com/wp-content/uploads/2022/11/European-Biosolids-Bioresources-Conference...
 
Description Metagenomics online training course April 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact An online training course was organised for 30 participants, comprising online lectures, drop-in help sessions and a Slack channel for support.

Following completion of this course, learners will be trained to:
- explain the hierarchical structure of a file system and describe the files and file structure used in the course
- explain what is meant by a working directory, a path and a relative path and write down paths that they will need for the course
- start a Terminal (Mac) or Git Bash Terminal (Windows)
- navigate a file system using the command line
- log in to and exit their AWS instance (the cloud)
- use common commands such as ls, pwd and cd on the command line
- know the difference between genomics and metagenomics
- describe the steps in a metagenomic workflow
- perform quality control on reads and assemble them into a metagenome
- perform polishing to improve an assembly
- use binning to separate the metagenome into different species or MAGs (Metagenome-Assembled Genomes)
- use Kraken 2 to assign taxonomy to reads and contigs and phyloseq in R to analyse taxonomic diversity
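As an illustration of the read quality-control step listed above, the sketch below computes the mean Phred quality of each read in a FASTQ file and keeps the IDs of reads that pass a threshold. It is a minimal sketch, not the course's own pipeline (the course materials name the actual tools used). It assumes Phred+33 quality encoding, the standard four-line FASTQ record layout and a hypothetical threshold of Q20; the function names are illustrative only. Averaging Phred scores directly is a simplification, and dedicated QC tools may instead average error probabilities.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33: each character encodes Q = ord(char) - offset."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passing_read_ids(handle: TextIO, min_mean_q: float = 20.0) -> List[str]:
    """IDs of reads whose mean quality meets a hypothetical threshold (default Q20)."""
    return [rid for rid, _, q in read_fastq(handle) if mean_phred(q) >= min_mean_q]


if __name__ == "__main__":
    example = io.StringIO(
        "@read1\nACGT\n+\nIIII\n"  # 'I' encodes Q40
        "@read2\nACGT\n+\n####\n"  # '#' encodes Q2
    )
    print(passing_read_ids(example))  # ['read1']
```

Reading the file record by record keeps memory use flat, which matters for the large metagenomic read sets the course works with.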
Year(s) Of Engagement Activity 2023
URL https://sites.google.com/york.ac.uk/cloud-span/train-with-us/specialised-skills#h.jqgzsdc8hbla
 
Description Metagenomics online training course February 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact An online training course was organised for 30 participants, comprising online lectures, drop-in help sessions and a Slack channel for support.

Following completion of this course, learners will be trained to:
- explain the hierarchical structure of a file system and describe the files and file structure used in the course
- explain what is meant by a working directory, a path and a relative path and write down paths that they will need for the course
- start a Terminal (Mac) or Git Bash Terminal (Windows)
- navigate a file system using the command line
- log in to and exit their AWS instance (the cloud)
- use common commands such as ls, pwd and cd on the command line
- know the difference between genomics and metagenomics
- describe the steps in a metagenomic workflow
- perform quality control on reads and assemble them into a metagenome
- perform polishing to improve an assembly
- use binning to separate the metagenome into different species or MAGs (Metagenome-Assembled Genomes)
- use Kraken 2 to assign taxonomy to reads and contigs and phyloseq in R to analyse taxonomic diversity
Year(s) Of Engagement Activity 2024
URL https://cloud-span.github.io/nerc-metagenomics00-overview/
 
Description Metagenomics self-study cohort - April 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact A self-study option was created for a small cohort of 20 learners to study the Metagenomics module; their learning was supported with drop-in help sessions and a Slack channel.

Following completion of this course, learners will be trained to:
• explain the hierarchical structure of a file system and describe the files and file structure used in the course
• explain what is meant by a working directory, a path and a relative path, and write down the paths they will need for the course
• start a Terminal (Mac) or Git Bash Terminal (Windows)
• navigate a file system using the command line
• log in to and exit their AWS instance (the cloud)
• use common commands such as ls, pwd and cd on the command line
• know the difference between genomics and metagenomics
• describe the steps in a metagenomic workflow
• perform quality control on reads and assemble them into a metagenome
• perform polishing to improve an assembly
• use binning to separate the metagenome into different species or MAGs (Metagenome-Assembled Genomes)
• use Kraken 2 to assign taxonomy to reads and contigs, and phyloseq in R to analyse taxonomic diversity
Year(s) Of Engagement Activity 2023
URL https://cloud-span.github.io/nerc-metagenomics00-overview/
 
Description Online Training Course: Genomics November 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The online training course on Genomics was delivered by the following members of the project team: Emma Rand, Jorge Buenabad-Chavez, Sarah Forrester, Evelyn Greeves and Annabel Cansdale. The course was delivered over 4 half days to 26 UK-based participants.

Expected learning outcomes - by the end of the training course participants were able to:
• structure their data and metadata and plan for an NGS project
• organise and document genomics data and bioinformatics workflows
• understand what information is needed by a sequencing facility
• gain practice navigating file systems, creating, copying, moving, and removing files and directories
• use command-line tools to assess read quality and perform quality control
• align reads to a reference genome, and identify and visualise sequence variants
• work with Amazon AWS cloud computing and transfer data between a local computer and cloud resources

Feedback from participants was very positive and many stated that they felt their abilities had improved after attending the course, as highlighted in this blog post.

Impact: provides an opportunity for the learner to develop their knowledge and skills.
Year(s) Of Engagement Activity 2021
URL https://cloud-span.github.io/genomics01-intro/
 
Description Online Training Course: Prenomics March 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The online training course on Prenomics was delivered by the following members of the project team: Emma Rand, Jorge Buenabad-Chavez and Evelyn Greeves. The course was delivered over 2 half days to 28 UK-based participants.

The Prenomics module is designed to prepare people for the Cloud-SPAN Genomics module. We have found that people taking the Genomics module vary in how much experience they have of navigating file systems and using the command line. We designed the Prenomics module to give those with less experience more time to cover some foundational concepts. We have a Self-assessment Quiz to help you decide whether you would benefit from attending Prenomics before the Genomics module. The Prenomics and Genomics modules are based on Data Carpentry's Genomics Workshop. Prenomics teaches the basics of command-line programming, including: (1) file directory structure, (2) use of command-line utilities to connect to and use cloud computing and storage resources and (3) basic shell commands for file navigation and basic script writing.

Impact: allows participants to develop their skills and knowledge in this area.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/prenomics00-intro/
 
Description Participant on Open Life Science Mentorship Programme 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Team member Evelyn Greeves is a participant in the Open Life Science (OLS) mentorship programme. This allows her to widen her expertise in establishing and maintaining an online community.

Impact: via the OLS network the work of Cloud-SPAN can be publicised.
Year(s) Of Engagement Activity 2022
URL https://openlifesci.org/ols-5/projects-participants/
 
Description Participation in an activity, workshop or similar - Online Training Course: Metagenomics - 31 October 2022 - 25 November 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Participation in an activity, workshop or similar - Online Training Course: Metagenomics - 31 October 2022 - 25 November 2022
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/metagenomics00-overview/
 
Description Participation in an activity, workshop or similar - Training Course: Genomics - 6-7 December 2022 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Training course for 14 participants from 6 different institutions. Participants completed the interactive workshop, developed practical skills and increased their knowledge of data management and analysis for genomic research. All participants connected to the Cloud-SPAN community via the Slack channel and in person.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/00genomics/
 
Description Prenomics online workshop - 22-23 November 2022 - Evelyn Greeves 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Cloud-SPAN is a collaboration between the Department of Biology at the University of York and The Software Sustainability Institute, funded by the UKRI Innovation Scholars award. It aims to train researchers to effectively generate and analyse a range of 'omics data using Cloud computing resources.

This Prenomics module is designed to prepare people for the Cloud-SPAN Genomics module. We have found that people taking the Genomics module vary in how much experience they have of navigating file systems and using the command line. We designed the Prenomics module to give those with less experience more time to cover some foundational concepts. We have a Self-assessment Quiz to help you decide whether you would benefit from attending Prenomics before the Genomics module.

Prenomics teaches the basics of command-line programming, including: (1) file directory structure, (2) use of command-line utilities to connect to and use cloud computing and storage resources and (3) basic shell commands for file navigation and basic script writing.

18 participants attended the session.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/prenomics00-intro/
 
Description Prenomics online workshop - December 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact This Prenomics module is designed to prepare people for the Cloud-SPAN Genomics module.

We have found that people taking the Genomics module vary in how much experience they have of navigating file systems and using the command line. We designed the Prenomics module to give those with less experience more time to cover some foundational concepts. We have a Self-assessment Quiz to help you decide whether you would benefit from attending Prenomics before the Genomics module.

Prenomics teaches the basics of command-line programming, including: (1) file directory structure, (2) use of command-line utilities to connect to and use cloud computing and storage resources and (3) basic shell commands for file navigation and basic script writing. 18 participants attended the session.
Year(s) Of Engagement Activity 2023
URL https://cloud-span.github.io/prenomics00-intro/
 
Description Presentation at University of York's Head of Department Meeting 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Project Leads Emma Rand and James Chong delivered an informative presentation giving an overview of the project, including its goals, strategy and training resources. This talk generated new registrations for the Prenomics and Genomics training courses and allowed questions from the general public to be addressed.
Year(s) Of Engagement Activity 2022
URL https://drive.google.com/file/d/1pO-DXIR3p8XncrvGxlf5KLBiRPQYJfxp/view?usp=sharing
 
Description Presentation at University of York's Open Day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact An open day was hosted at the University of York, at which Co-Project Lead Emma Rand delivered an informative presentation on the goals of the Cloud-SPAN project. This allowed attendees to ask questions about the project and helped to promote registrations for Cloud-SPAN activities.
Year(s) Of Engagement Activity 2021
 
Description Presentation: Cloud Span: a use case of FAIR implementation on HPC training materials - Evelyn Greeves, Cloud Span, University of York 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Online workshops: FAIR data in the life sciences: beyond theory, 19/10/2022, 1:00 pm - 3:30 pm.

Presentation: Cloud Span: a use case of FAIR implementation on HPC training materials - Evelyn Greeves, Cloud Span, University of York
Year(s) Of Engagement Activity 2022
URL https://www.ukrn.org/event/fair-data-life-sciences-oct-2022/
 
Description Presenting a Prenomics poster at the Biology Research Away Day 2022 - Wednesday 7 September 2022 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact More than 100 people attended the Research Away Day, as it was a whole-department event.

Aim of the event: Today brings a well-deserved celebration of the excellent research we do and a key opportunity to engage in the great wealth of research diversity within our department. It is this diversity that inspires new collaboration, supports new ideas and promotes exciting initiatives. Each of us deeply appreciates the creative exploration of Bioscience that touches, motivates and drives progress in our fields. Our department heralds its interdisciplinarity with well-established, world-class centres such as CNAP, YSBL, YCCSA and YESI, to newer inspired institutions including YBRI, PoL and LCAB. Our Biosciences Technology Facility is the envy of the Russell Group with award-winning experts championing innovation, training and promoting our scientific potential. Our Research-led Teaching strengthens and builds our department while our dedication to balance and excel at both Research and Teaching consistently ranks us Top 10 in the UK.
Year(s) Of Engagement Activity 2022
URL https://wiki.york.ac.uk/display/DeptBiol/RAD2022
 
Description Self-study Module - Create your own AWS module 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Cloud-SPAN is a project run by the Department of Biology at the University of York with the aim of training researchers in the experimental design and analysis of 'omics data using cloud-based High Performance Computing (HPC) resources.

This course teaches how to create and manage your own Cloud-SPAN Amazon Web Services (AWS) instance, which is a Linux virtual machine configured with 'omics data and software analysis tools. The instance you will create is the same instance that is used in the Cloud-SPAN courses Prenomics and Genomics.

If you attend tutor-led editions of Cloud-SPAN's Prenomics and Genomics courses, you do not need to create your own instance. We will do that for you! But if you would like to practise afterwards, or study the courses in your own time, you will need to create an instance first.

You will learn (1) how to open and configure your AWS account, which will enable you to use any AWS service; (2) how to create and manage (start, stop and terminate) your instance; and (3) the cost of using your instance.

The course is designed for 2-3 hours of self-study.
Year(s) Of Engagement Activity 2022,2023
URL https://cloud-span.github.io/create-aws-instance-0-overview/
 
Description Statistically useful experimental design - April 2023 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Experimental design is critical for 'omics experiments, both to generate data capable of addressing your research questions and to control your reagent costs. There are choices to be made about sample preparation and storage, sequencing technologies, the numbers of technical and biological replicates, and sequencing depth. The most appropriate choices depend on the type of research question you have, the strengths and weaknesses of the platform and the biological variability.

In this half-day workshop, different case studies were examined in detail to discuss some of the most important aspects to take into account when designing and performing experiments that generate the reproducible, high-quality data you need. Participants also had an opportunity to discuss their own experimental designs and develop their own skills.
Year(s) Of Engagement Activity 2023
URL https://cloud-span.github.io/experimental_design00-overview/
 
Description Training Course: Genomics - January 2024 - Online 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Training course for 17 participants from 12 different institutions. Participants completed the online interactive workshop, developed practical skills and increased their knowledge of data management and analysis for genomics research. All participants connected to the Cloud-SPAN community via the Slack channel.
Year(s) Of Engagement Activity 2024
URL https://cloud-span.github.io/00genomics/
 
Description Training course: Automated Management of AWS Instances 31 Jan 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact 15 staff members from 2 different departments attended.

Learning objective

Cloud-SPAN is a project run by the Biology Department at the University of York with the aim of training researchers in the experimental design and analysis of 'omics data using cloud-based High Performance Computing (HPC) resources.

This course teaches how to automatically manage multiple Amazon Web Services (AWS) instances, each instance being a Linux virtual machine. It shows how to use Bash shell scripts to create, stop, start and delete one or multiple instances with a single invocation of a script.

We use the scripts to manage multiple instances within the Cloud-SPAN project for hands-on training purposes. When running a workshop, a number of instances are created with the 'omics data and software analysis tools relevant to that workshop. Each student is granted exclusive access to one instance through the use of an encrypted login key.

The scripts receive as input only the names of the instances to create, delete, etc. The login keys, IP addresses and domain names used by the instances are created on demand when the instances are created, and deleted when the instances are deleted. Creating over 30 instances takes 10-15 minutes.

The target audience of the course is anyone in charge of, or interested in, deploying and managing cloud resources. While the course is focused on AWS, and particularly Elastic Compute Cloud (EC2) instances, the scripts can be adapted for use with other cloud providers and other types of cloud services.
Year(s) Of Engagement Activity 2023
URL https://cloud-span.github.io/cloud-admin-guide-0-overview/
 
Description Training course: Statistically useful experimental design - 22 September 2022 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Experimental design is critical for 'omics experiments, both to generate data capable of addressing your research questions and to control your reagent costs. There are choices to be made about sample preparation and storage, sequencing technologies, the numbers of technical and biological replicates, and sequencing depth. The most appropriate choices depend on the type of research question you have, the strengths and weaknesses of the platform and the biological variability.

In this half-day workshop, we considered case studies in detail to discuss some of the most important aspects to take into account when designing and performing experiments that generate the reproducible, high-quality data you need. Participants also had an opportunity to discuss their own experimental designs and develop their own skills.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/experimental_design00-overview/
 
Description UK Conference for Bioinformatics and Computational Biology talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The UK Conference of Bioinformatics and Computational Biology 2021 brings together biologists, bioinformaticians, computer scientists, software engineers and data scientists across the life sciences, to share innovations, applications and best practice in their fields.
We took part in a workshop session for UKRI Innovation Scholars - Data Science Training in Health and Bioscience, in which all the projects awarded under this UKRI grant call described the data training they are developing for life scientists. This session was relevant to those working with life science data who wanted to learn more about the future of training, and especially to people who already run data science training in health and bioscience.
This allowed networking with potential participants in Cloud-SPAN training and with those able to publicise and promote our project.
Year(s) Of Engagement Activity 2021
URL https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21#Programme-5
 
Description UKRI DaSH workshop for all grant holders April 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact We planned a half-day meeting in York for all holders of DaSH-funded grants, with the following objectives:
- Discuss best practice when running training projects funded on short term grants
- Promote Cloud-SPAN activities
- Network with similar projects to understand how activities can be sustained in the long term
Year(s) Of Engagement Activity 2023