Cloud-SPAN: Specialised analyses for environmental 'omics with Cloud-based High Performance Computing

Lead Research Organisation: University of York
Department Name: Biology

Abstract

Environmental Biotechnology (EB) addresses global challenges using engineered microbial systems for environmental protection, bio-remediation and resource recovery. It is a critical and expanding area for the UK and underpins some of the world's most important industries. This is acknowledged by the funding invested in the creation of BBSRC's Networks in Industrial Biotechnology and Bioenergy (NIBBs).
A deep mechanistic understanding of the complex microbial communities involved in the biological cycling of global resources is essential to meet global challenges such as Net Zero, waste management and increased demand. The complexity of these microbiomes can be orders of magnitude larger than those found in the human gut, requiring different approaches to experimental design and analysis with High Performance Computing (HPC). However, EB is an interdisciplinary field that attracts researchers from a broad range of disciplines including Mathematics, Engineering, Biology, Social Sciences, Management, Physics and Chemistry, and big data 'omics analyses on HPC systems are often not core skills for such researchers.
This proposal aims to develop and deliver highly accessible resources that will upskill these interdisciplinary researchers so that they are able to generate and analyze big data relating to EB using Cloud HPC. Although infrastructure and resources exist for microbial 'omics (e.g. JGI's IMG/M, MG-RAST, Galaxy, CLIMB, EBI) there is a lack of systematic training tightly linked to the EB domain and documentation is often focussed on technical proficiency rather than contextualised with a strong understanding of experimental design. We will provide foundational training and develop and deliver new advanced modules covering the specialised skills required to generate and analyse 'omics data using Cloud HPC resources. These will include experimental design and statistical modules to ensure researchers can generate data appropriate to investigate their research question. Modules will deploy cloud-based containerised instances provided by Google Education and Amazon Web Services (AWS) for exemplar workflows free to the learner. They will form a complete training resource with fully articulated prerequisites and learning objectives that can be used for in-person or online tutor-led workshops or self-paced learning. Our proposal offers structured Learning Paths from the statistical skills required for robust experimental design through to the reproducible execution and interpretation of 'omics analyses with HPC to cater to researchers with differing levels of previous experience and which allow self-assessment of training needs. We also provide Diversity Scholarships to enable members of underrepresented groups to participate in online or in-person training.
The collaboration between the University of York and the Software Sustainability Institute (SSI) brings together excellence in data science pedagogy and environmental 'omics research with the SSI's UK-leading expertise in research computing and community building. This will ensure the training developed genuinely complements, and aligns with, existing materials to enhance national provision.
Sustainability will be fostered by making the resources Findable, Accessible, Interoperable and Reusable (FAIR), providing cross-platform images for deployment and by developing and proactively engaging with a Community of Practice and providing Code Retreats for the supported practice of methods to participants' own data. In addition, "Cloud Administration Guides" will be developed for institutional HPC Teams to run specialised modules with their own resources. These Guides will be supported by 1-to-2 day training delivered by Cloud-SPAN systems administrators.
The project will be promoted by our partners, the SSI, Google Education, AWS and the N8 Centre of Excellence in Computationally Intensive Research, and through delivery of conference talks and seminars.

Technical Summary

Environmental Biotechnology (EB) is an interdisciplinary field involving advanced molecular and applied microbiologists, environmental chemists and engineers that addresses global challenges using engineered microbial systems. This proposal aims to train these researchers to generate, analyse and mine big data relating to EB microbiomes, which are larger than those found in the human gut and require different approaches to both measurement and analysis in order to manage reagent costs and effectively leverage available HPC resources.
Easily accessible and scalable HPC-based training is required to provide researchers with the skills and self-confidence to manipulate and analyse big data generated from 'omics technologies and generate biological insights from these highly interconnected systems. This area of bioinformatics involves a steep learning curve which can be confounded by the need to install packages with multiple dependencies onto different HPC architectures based on what is available at a researcher's home institution, even before the user can engage with writing scripts to manage workflows, manipulate or visualise data, or manage job schedulers. We will deploy Cloud-based containerised instances which are (1) accessible to anyone anywhere as long as they have an adequate internet connection, (2) have a very low hardware entry requirement and (3) allow for easily scalable and replicable installations of software that will not become deprecated as quickly as might occur on a local server. The cloud providers we are working with run grant schemes that provide significant resources to researchers that will support deployment of production instances of the images we will generate. We expect that our resources will be easily deployed and used by groups who do not necessarily have devoted bioinformaticians or expertise in HPC, providing a cost-effective route to useful analyses for researchers in a strategically key area of expansion in the biosciences.

Publications

 
Description Collaboration with Software Sustainability Institute 
Organisation Software Sustainability Institute
Country United Kingdom 
Sector Public 
PI Contribution The project team participates in a quarterly management meeting with Neil Chue Hong, Director of the Software Sustainability Institute.
Collaborator Contribution Neil Chue Hong, Director of the Software Sustainability Institute participates in a quarterly management meeting to provide guidance and strategic advice on different aspects of the project.
Impact During our meetings we are able to discuss best practice with regard to creating training materials, managing educational activities and engaging the general public.
Start Year 2021
 
Description EBnet Collaboration 
Organisation UK Environmental Biotechnology Network
Country United Kingdom 
Sector Academic/University 
PI Contribution James Chong and Sarah Forrester are active members within the EBnet Working Group. Through their work with EBnet they are able to publicise the Cloud-SPAN project, via sharing information and delivering talks at webinars.
Collaborator Contribution EBnet support and promote the training opportunities created through the Cloud-SPAN project. The collaboration also allows members to exchange expertise in the field of HPC driven microbial genomics research, which in turn improves the quality of the Cloud-SPAN training resources.
Impact James Chong and Sarah Forrester are active members within the EBnet Working Group. Through their work with EBnet they are able to publicise the Cloud-SPAN project, via sharing information and delivering talks at webinars.
Start Year 2021
 
Title Web-based App - Self-Assessment Quiz 
Description Using Shiny, an R package, an online interactive web-based app was created to evaluate the level of competence of a participant. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact This online self-assessment tool has been invaluable in determining the level of competence of participants; based on their results, course participants can continue their registration for either an introductory course or an advanced course. 
URL https://shiny.york.ac.uk/er13/prenomics-quiz/#section-why
 
Description Blog re Genomics Course November 2021 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A blog was written evaluating the first Genomics Course which was delivered in November 2021. The blog was posted on the Cloud-SPAN forum and included on the SSI's website https://software.ac.uk/news/review-cloud-spans-genomics-course
Year(s) Of Engagement Activity 2021
URL https://cloudspan.peerboard.com/post/1021906833
 
Description Code Retreat - April 2022, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They also have the opportunity to network with individuals from different institutions and workplaces.

Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2022
URL https://cloud-span.york.ac.uk/community#h.ma5203jptwz0
 
Description Code Retreat - January 2023, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They also have the opportunity to network with individuals from different institutions and workplaces.

Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2023
URL https://cloud-span.york.ac.uk/community#h.ma5203jptwz0
 
Description Code Retreat - November 2022, University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The Code Retreat brings small communities of practice together to explore, learn and grow. This event helps broaden the knowledge and practical experience of the participants. They also have the opportunity to network with individuals from different institutions and workplaces.

Learning outcomes from the event:
Working with your peers and with help from our instructors, you could:
- Revise our Prenomics or Genomics courses
- Get help organising and documenting your own analysis
- Apply tools taught in Genomics to your own data
- Get help with creating your own Amazon Web Services instance for Genomics
- Network with other genomics researchers
Year(s) Of Engagement Activity 2022
URL https://cloud-span.york.ac.uk/community#h.ma5203jptwz0
 
Description Creation of the Cloud-SPAN Handbook 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact In our Cloud-SPAN community we encourage everyone to come together to find solutions to problems and exchange experiences and knowledge. Our aim is to build a friendly and involved community of people who have used our resources, are interested in our resources, or who have expertise in the areas we cover. Ways to contribute include attending one of our courses, asking/answering questions on our community forum and making suggestions for improvements to our courses.

Handbook
This handbook is intended as a reference for both the core Cloud-SPAN team and for our wider community of learners. It's where you'll find our Code of Conduct, contributing guidelines and other practical information which will help you make the most of our resources in a friendly, understanding environment.
Year(s) Of Engagement Activity 2021
URL https://cloud-span.github.io/CloudSPAN-handbook/index.html
 
Description Creation of the Cloud-SPAN LinkedIn Account 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A Cloud-SPAN account was created to help provide an online presence for the Cloud-SPAN project. LinkedIn is used to promote the different training activities and allows the general public to ask any questions they may have.

Impact: enables the dissemination of the project details to a wider audience and generates registrations to current activities
Year(s) Of Engagement Activity 2021
URL https://www.linkedin.com/company/cloud-span
 
Description Creation of the Cloud-SPAN Twitter Account 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A Cloud-SPAN Twitter Account was created to help provide an online presence for the Cloud-SPAN project. It also enables the following:
1. Promote the Cloud-SPAN project to an international online audience
2. Promote the registration of various activities
3. Host News stories and blogs
4. Promote information regarding scholarships

Impact: enables the dissemination of the project details to a wider audience and generates registrations to current activities
Year(s) Of Engagement Activity 2021
URL https://twitter.com/SpanCloud
 
Description Creation of the Cloud-SPAN online Forum 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact In our Cloud-SPAN community we encourage everyone to come together to find solutions to problems and exchange experiences and knowledge. Our aim is to build a friendly and involved community of people who have used our resources, are interested in our resources, or who have expertise in the areas we cover. Ways to contribute include attending one of our courses, asking/answering questions on our community forum and making suggestions for improvements to our courses.

The Cloud-SPAN forum is a place to ask questions, pick people's brains and share any insights you've gained during or after one of our courses. It will be the main hub of the Cloud-SPAN community of practice. We strongly encourage you to engage with the Cloud-SPAN community to enhance your learning and understanding.

Impact: enables the dissemination of the project details to a wider audience and generates registrations to current activities. It also provides an opportunity for the audience to learn and develop their knowledge and skills.
Year(s) Of Engagement Activity 2021
URL https://cloudspan.peerboard.com/
 
Description Creation of the Cloud-SPAN website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A Cloud-SPAN website was created to help provide an online presence for the Cloud-SPAN project. It also enables the following:
1. Promote the Cloud-SPAN project to an international online audience
2. Promote and organise registration of various activities
3. Host News stories and blogs
4. Promote information regarding scholarships
5. Provide a platform for individuals to ask for further information or ask any questions

Impact: enables the dissemination of the project details to a wider audience and generates registrations to current activities
Year(s) Of Engagement Activity 2021
URL https://cloud-span.york.ac.uk/
 
Description EBNet Webinar: Using Big Data Approaches to Understand Microbial Communities 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact EBNet Webinar: Using Big Data Approaches to Understand Microbial Communities
Thursday, 10th February 2022 at 13.00 - 14.15.
The SESSION RECORDING is available here https://www.youtube.com/watch?v=1QH0JK0X0Xw

EBNet are hosting a series of specialist webinars to support knowledge exchange amongst members. "Using Big Data Approaches to Understand Microbial Communities". Hear the latest developments from top speakers and participate in the online chat to engage with questions.

This fascinating session is brought to you by the Chairs: Dr Sarah Forrester, the Chong Group, Dept. of Biology, University of York & Dr Bing Guo.

Dr Sarah Forrester is a PDRA in James Chong's group in the Biology department at the University of York. She gained her PhD at the University of Liverpool in 2016 using multi 'omic approaches to analyse parasite genomic data, and has worked since then on a range of microbial systems and used a variety of bioinformatic methods. She performs HPC driven microbial genomics research and delivers bioinformatics training. As a 2022 Software Sustainability fellow and a certified Software Carpentry instructor, she is passionate about instilling good bioinformatic practices into her training. She is also involved in the preparation and delivery of the material for Cloud-SPAN: Specialised analyses for environmental 'omics with Cloud-based High Performance Computing; see https://cloud-span.york.ac.uk/.

TALK TITLE: INTRODUCTION TO THE EBNET BIOINFORMATICS WORKING GROUP
Prof James Chong is a Royal Society Industry Fellow and Professor in the Department of Biology at the University of York, where he runs a research group exploiting a range of 'omics techniques to understand microbial community dynamics, as well as leading the EBNet Working Group "Bioinformatics Training for Microbial Environmental Biotechnologies". His group is involved in generating microbial community metagenomics, meta-transcriptomics and metabolomics datasets. His group uses established analytical pipelines, but also develops its own bespoke scripts for data analysis. Insight into the application of 'omics techniques, and the ways in which they can be applied to environmental biotechnology use cases to better understand microbial community dynamics, has driven his desire to develop bioinformatic training resources. This is currently being supported by the UKRI Grant Cloud-SPAN: Specialised analyses for environmental 'omics with Cloud-based High Performance Computing, which is co-led by James; see https://cloud-span.york.ac.uk/.

Impact: enables the dissemination of the project details to a wider audience and generates registrations to current activities
Year(s) Of Engagement Activity 2022
URL https://ebnet.ac.uk/ebnet-rc22-bigdata/
 
Description EBNet Webinar: Why Bioinformatics Training is Important - Emma Rand presented a talk on Cloud-SPAN - 11 May 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This seminar focused on why Bioinformatics Training is important. A general introduction was delivered by Prof. James Chong:
Introduction "Why bioinformatics training is important": Prof. James Chong, University of York (Head of the Bioinformatics Working Group)

Emma Rand is a Senior Lecturer in the Department of Biology at the University of York where she specialises in teaching data science and reproducibility, particularly to those who do not see themselves as programmers. She delivered a presentation on the training opportunities and resources provided by the Cloud-SPAN project. This promoted all activities which were open for registration, along with the website, LinkedIn and Twitter accounts.

Link to event https://ebnet.ac.uk/wgbioinf-110522/
Year(s) Of Engagement Activity 2022
 
Description EBNet Working Group Coordinator 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact James Chong is the Working Group Chair for EBnet. This WG aims to create Bioinformatics training for microbial Environmental Biotechnologies. In this role James is able to make new connections and publicise the work of Cloud-SPAN.
Year(s) Of Engagement Activity 2021
URL https://ebnet.ac.uk/about/wg-details/wg-bioinformatics/
 
Description European Biosolids and Bioresource Conference 2022. 22nd and 23rd of November in Birmingham James Chong and Sarah Forrester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact James Chong and Sarah Forrester delivered a Bioinformatics session with a 40-minute Q&A on metagenomics, bioinformatics and access to training.

European Biosolids and Bioresource Conference 2022, 22nd and 23rd of November in Birmingham; approximately 40 people attended.

13:35 - 13:50 Bioinformatics-based diagnostics for monitoring AD - Professor James Chong, University of York, UK
13:50 - 14:05 Using multi-omic approaches to understand the co-digestion of wheatstraw and sewage sludge - Dr. Sarah Forrester, Senior Post Doc, University of York, UK
Year(s) Of Engagement Activity 2022
URL https://european-biosolids.com/wp-content/uploads/2022/11/European-Biosolids-Bioresources-Conference...
 
Description Online Training Course: Genomics November 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The online training course on Genomics was delivered by the following members of the project team: Emma Rand, Jorge Buenabad-Chavez, Sarah Forrester, Evelyn Greeves, and Annabel Cansdale. The course was delivered over 4 half days to 26 UK-based participants.

Expected learning outcomes - by the end of the training course participants were able to:
• structure their data and metadata and plan for an NGS project
• organise and document genomics data and bioinformatics workflows
• understand what information is needed by a sequencing facility
• gain practice navigating file systems, creating, copying, moving, and removing files and directories
• use command-line tools to assess read quality and perform quality control
• align reads to a reference genome, and identify and visualise sequence variants
• work with Amazon AWS cloud computing and transfer data between a local computer and cloud resources
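As an illustration of the read quality assessment named in the outcomes above, the following is a minimal Python sketch. It is not taken from the course materials, which use command-line tools. It assumes the common Phred+33 quality encoding and a made-up one-read FASTQ record; the function names `phred_scores` and `mean_quality` are hypothetical.

```python
from statistics import mean


def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(fastq_text: str) -> list[tuple[str, float]]:
    """Return (read id, mean Phred score) for each 4-line FASTQ record."""
    lines = fastq_text.strip().splitlines()
    results = []
    # A FASTQ record is four lines: header, sequence, separator, qualities.
    for i in range(0, len(lines), 4):
        header, _sequence, _separator, qualities = lines[i:i + 4]
        results.append((header[1:], mean(phred_scores(qualities))))
    return results


# Made-up example record: 'I' decodes to Phred 40 and '#' to Phred 2.
EXAMPLE = "@read1\nACGT\n+\nII##\n"

if __name__ == "__main__":
    for read_id, quality in mean_quality(EXAMPLE):
        print(read_id, quality)
```

A low mean score, or a sharp drop towards the end of reads, is the kind of signal that would typically prompt trimming or filtering before alignment.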

Feedback from participants was very positive and many stated that they felt their abilities had improved after attending the course, as highlighted in this blog post.

Impact: provides an opportunity for the learner to develop their knowledge and skills.
Year(s) Of Engagement Activity 2021
URL https://cloud-span.github.io/genomics01-intro/
 
Description Online Training Course: Prenomics March 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The online training course on Prenomics was delivered by the following members of the project team: Emma Rand, Jorge Buenabad-Chavez, and Evelyn Greeves. The course was delivered over 2 half days to 28 UK-based participants.

The Prenomics module is designed to prepare people for the Cloud-SPAN Genomics module. We have found that people taking the Genomics module can vary in the amount of experience they have had in navigating file systems and using the command line. We have designed the Prenomics module to allow more time for those with less experience to cover some foundation concepts. We have a Self-assessment Quiz to help you decide if you would benefit from attending Prenomics before the Genomics module. The Prenomics and Genomics modules are based on Data Carpentry's Genomics Workshop. Prenomics teaches the basics of command-line programming, including: (1) file directory structure, (2) use of command-line utilities to connect to and use cloud computing and storage resources and (3) basic shell commands for file navigation and basic script writing.
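To give a flavour of the directory structure and file handling topics listed above, here is a small Python sketch of the equivalent operations. The module itself teaches these with shell commands, so this is only an analogue, and the directory and file names (`project`, `data`, `results`, `sample.txt`) are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def demo_file_operations(root: Path) -> list[str]:
    """Create, copy, move and remove files under root; return what remains."""
    data = root / "project" / "data"
    data.mkdir(parents=True)              # similar to: mkdir -p project/data
    raw = data / "sample.txt"
    raw.write_text("ACGT\n")              # create a small file
    backup = data / "sample_backup.txt"
    shutil.copy(raw, backup)              # similar to: cp
    results = root / "project" / "results"
    results.mkdir()                       # similar to: mkdir
    shutil.move(str(backup), str(results / backup.name))  # similar to: mv
    raw.unlink()                          # similar to: rm
    # List the remaining files relative to root, sorted for a stable order.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Work in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_file_operations(Path(tmp)))
```

Running the operations inside a temporary directory keeps the sketch safe to try on any machine.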

Impact: allows participants to develop their skills and knowledge in this area.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/prenomics00-intro/
 
Description Participant on Open Life Science Mentorship Programme 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Team member Evelyn Greeves is a participant on the OLS mentorship programme. This allows her to widen her expertise in the area of establishing and maintaining an online community.

Impact: via the OLS network the work of Cloud-SPAN can be publicised.
Year(s) Of Engagement Activity 2022
URL https://openlifesci.org/ols-5/projects-participants/
 
Description Participation in an activity, workshop or similar - Online Training Course: Metagenomics - 31 October 2022 - 25 November 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Participation in an activity, workshop or similar - Online Training Course: Metagenomics - 31 October 2022 - 25 November 2022
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/metagenomics00-overview/
 
Description Participation in an activity, workshop or similar - Training Course: Genomics - 6-7 December 2022 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Training course for 14 participants from 6 different institutions. Participants completed the interactive workshop, developed practical skills and increased their knowledge of data management and analytical skills for genomic research. All participants connected to the Cloud-SPAN community via the Slack channel and in person.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/00genomics/
 
Description Prenomics online workshop - 22-23 November 2022 - Evelyn Greeves 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Cloud-SPAN is a collaboration between the Department of Biology at the University of York and The Software Sustainability Institute funded by the UKRI innovation scholars award. It aims to train researchers to effectively generate and analyse a range of 'omics data using Cloud computing resources.

This Prenomics module is designed to prepare people for the Cloud-SPAN Genomics module. We have found that people taking the Genomics module can vary in the amount of experience they have had in navigating file systems and using the command line. We have designed the Prenomics module to allow more time for those with less experience to cover some foundation concepts. We have a Self-assessment Quiz to help you decide if you would benefit from attending Prenomics before the Genomics module.

Prenomics teaches the basics of command-line programming, including: (1) file directory structure, (2) use of command-line utilities to connect to and use cloud computing and storage resources and (3) basic shell commands for file navigation and basic script writing.

18 participants attended the session.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/prenomics00-intro/
 
Description Presentation at University of York's Head of Department Meeting 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Project Leads Emma Rand and James Chong delivered an informative presentation giving an overview of the project, including goals, strategy and training resources. This talk generated new registrations for the Prenomics and Genomics training courses and allowed questions from the general public to be addressed.
Year(s) Of Engagement Activity 2022
URL https://drive.google.com/file/d/1pO-DXIR3p8XncrvGxlf5KLBiRPQYJfxp/view?usp=sharing
 
Description Presentation at University of York's Open Day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact An open day was hosted at the University of York, at which Emma Rand, the Co-Project Lead, delivered an informative presentation on the goals of the Cloud-SPAN project. This allowed individuals to ask any questions regarding the project and helped to promote registrations for the Cloud-SPAN activities.
Year(s) Of Engagement Activity 2021
 
Description Presentation: Cloud Span: a use case of FAIR implementation on HPC training materials - Evelyn Greeves, Cloud Span, University of York 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Online workshops: FAIR data in the life sciences: beyond theory 19/10/2022 @ 1:00 pm - 3:30 pm.

Presentation: Cloud Span: a use case of FAIR implementation on HPC training materials - Evelyn Greeves, Cloud Span, University of York
Year(s) Of Engagement Activity 2022
URL https://www.ukrn.org/event/fair-data-life-sciences-oct-2022/
 
Description Presenting a Prenomics poster at the Biology Research Away Day 2022 - Wednesday 7 September 2022 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact More than 100 people attended the Research Away Day, as it was a whole-department event.

Aim of the event: Today brings a well-deserved celebration of the excellent research we do and a key opportunity to engage in the great wealth of research diversity within our department. It is this diversity that inspires new collaboration, supports new ideas and promotes exciting initiatives. Each of us deeply appreciates the creative exploration of Bioscience that touches, motivates and drives progress in our fields. Our department heralds its interdisciplinarity with well-established, world-class centres such as CNAP, YSBL, YCCSA and YESI, to newer inspired institutions including YBRI, PoL and LCAB. Our Biosciences Technology Facility is the envy of the Russell Group with award-winning experts championing innovation, training and promoting our scientific potential. Our Research-led Teaching strengthens and builds our department while our dedication to balance and excel at both Research and Teaching consistently ranks us Top 10 in the UK.
Year(s) Of Engagement Activity 2022
URL https://wiki.york.ac.uk/display/DeptBiol/RAD2022
 
Description Self-study Module - Create your own AWS module 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Cloud-SPAN is a project run by the Department of Biology at the University of York with the aim of training researchers in the experimental design and analysis of 'omics data using cloud-based High Performance Computing (HPC) resources.

This course teaches how to create and manage your own Cloud-SPAN Amazon Web Services (AWS) instance, which is a Linux virtual machine configured with 'omics data and software analysis tools. The instance you will create is the same instance that is used in the Cloud-SPAN courses Prenomics and Genomics.

If you attend tutor-led editions of Cloud-SPAN's Prenomics and Genomics courses you do not need to create your own instance. We will do that for you! But if you would like to practice afterwards, or study the courses in your own time, you will need to create an instance first.

You will learn (1) how to open and configure your AWS account, which will enable you to use any AWS service; (2) how to create and manage (start, stop and terminate) your instance; and (3) the cost of using your instance.

The course is designed for 2-3 hours of self-study.
Year(s) Of Engagement Activity 2022,2023
URL https://cloud-span.github.io/create-aws-instance-0-overview/
 
Description Training course: Automated Management of AWS Instances 31 Jan 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact 15 members of staff from 2 different departments attended.

Cloud-SPAN is a project run by the Biology Department at the University of York with the aim of training researchers in the experimental design and analysis of 'omics data using cloud-based High Performance Computing (HPC) resources.

This course teaches how to automatically manage multiple Amazon Web Services (AWS) instances, each instance being a Linux virtual machine. Using Bash shell scripts, it shows how to create, stop, start and delete one or multiple instances with a single invocation of a script.

We use the scripts to manage multiple instances within the Cloud-SPAN project for hands-on training purposes. When running a workshop, a number of instances are created with the 'omics data and software analysis tools relevant to the workshop. Each student is granted exclusive access to one instance through the use of an encrypted login key.

The scripts receive as input only the names of the instances to create, delete, etc. Login keys, IP addresses, and domain names used by instances are created on demand when the instances are created, and deleted when the instances are deleted. Creating over 30 instances takes 10-15 minutes.
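
To illustrate the idea of driving several instances from a list of names, here is a minimal sketch. It is not the Cloud-SPAN code, which is written in Bash; it is a Python dry run that only builds AWS CLI command lines and does not contact AWS. The function names, the action table and the name-to-ID mapping are hypothetical, as are the example instance names and IDs. Instance creation, together with the login keys, IP addresses and domain names it involves, is left out.

```python
import shlex

# Hypothetical table mapping a script-level action to an AWS CLI EC2 subcommand.
ACTIONS = {
    "start": "start-instances",
    "stop": "stop-instances",
    "delete": "terminate-instances",
}


def build_commands(action, instance_names, name_to_id):
    """Return one AWS CLI command (as an argument list) per named instance.

    name_to_id is an assumed lookup from instance name to EC2 instance ID,
    for example one recorded when the instances were created.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    missing = [name for name in instance_names if name not in name_to_id]
    if missing:
        raise KeyError(f"no instance ID recorded for: {', '.join(missing)}")
    return [
        ["aws", "ec2", ACTIONS[action], "--instance-ids", name_to_id[name]]
        for name in instance_names
    ]


def render(commands):
    """Turn argument lists into shell-quoted command strings for review."""
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    # Made-up names and IDs, used only to show the shape of the output.
    ids = {
        "instance01": "i-0123456789abcdef0",
        "instance02": "i-0fedcba9876543210",
    }
    for line in render(build_commands("stop", ["instance01", "instance02"], ids)):
        print(line)
```

Printing the commands rather than executing them keeps the sketch safe to run without AWS credentials; a real script would run each command and check its result.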

The target audience of the course is anyone in charge of, or interested in, deploying and managing cloud resources. While the course is focused on AWS, and particularly Elastic Compute Cloud (EC2) instances, the scripts can be adapted for use with other cloud providers and other types of cloud services.
Year(s) Of Engagement Activity 2023
URL https://cloud-span.github.io/cloud-admin-guide-0-overview/
 
Description Training course: Statistically useful experimental design - 22 September 2022 - University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Experimental design is critical for 'omics experiments in order to generate data capable of addressing your research questions and to control your reagent costs. There are choices to be made about sample preparation and storage, sequencing technologies, the numbers of technical and biological replicates and sequencing depth. The most appropriate choices depend on the type of research question you have, the strengths and weaknesses of the platform and the biological variability.

In this half-day workshop we considered case-studies in detail to discuss some of the most important aspects that need to be taken into account to design and perform experiments that generate the reproducible, high-quality data you need. Participants also had an opportunity to discuss their own experimental designs and develop their own skills.
Year(s) Of Engagement Activity 2022
URL https://cloud-span.github.io/experimental_design00-overview/
 
Description UK Conference for Bioinformatics and Computational Biology talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The UK Conference of Bioinformatics and Computational Biology 2021 brings together biologists, bioinformaticians, computer scientists, software engineers and data scientists across the life sciences, to share innovations, applications and best practice in their fields.
We took part in a workshop session for UKRI Innovation Scholars - Data Science Training in Health and Bioscience, for all the projects awarded as part of this UKRI grant call to hear about the training they are developing in data for life scientists. This session was relevant to those working in life science data who wanted to learn more about the future of training, and was especially relevant to people who already run training in data science in the areas of health and bioscience.
This allowed networking with potential participants of Cloud-SPAN training and with those able to publicise and promote our project.
Year(s) Of Engagement Activity 2021
URL https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21#Programme-5