Data driven life science skills development - equipping society for the future

Lead Research Organisation: University of Edinburgh
Department Name: MRC Human Genetics Unit

Abstract

Executive summary

We aim to develop and deliver workshops that help health and bioscience researchers - in academia, industry and society as a whole - to be both competent and confident in working with their data.

Why health and bioscience researchers need data science training

Biological and medical research has changed radically in the last 30 years due to new technologies that measure thousands of different molecular components in cells at the same time. For example, it is now routine to measure entire human genomes, all the proteins making up living cells, or DNA from all the microbes in a sample of soil. All this generates huge amounts of data that come in different formats and often at different times and places. As a result, all biological researchers now need to be good at managing and analyzing data; it is no longer the remit of specialists alone. This is not just a UK challenge: international studies show the same trend. The demand for data science training far outstrips supply.

Life science industries are equally dependent on bioscience and health data, across pharmaceutical manufacturers, diagnostic providers, vaccine developers, and agricultural and environmental service providers. New moves towards precision medicine (drugs tailored to the patient) and precision agriculture (tailoring crop management) depend on access to, and accurate interpretation of, high quality data. Many careers in industry, government and society need good data management and analysis skills.

Importantly, the public needs confidence in data and data intensive research - for trust in the scientific process and for harnessing the benefits of data for science and society. Data sharing (Open Access Data) between researchers is important for scientific progress, for all-important reproducibility, and to derive best value for investment in publicly funded research.

In such a data-intensive environment for bioscience and health, it's important that everyone - whatever their career stage or role - can manage, analyse, store and share their data. This is what we hope to achieve through this project.

What we plan to do

We have focused training on areas where we know there is a particular need among health and bioscience researchers.
- Analyzing data - A good grounding in statistics is needed to analyze large and complex data sets, using modern methods such as machine learning.
- Managing data - Understanding how to move data securely between virtual 'storage' spaces in ways that allow the information to be retrieved later.
- Sharing data - Understanding and adhering to the FAIR principles (Findable-Accessible-Interoperable-Reusable) ensures open access to data.
- Designing portable analysis - Writing complex analysis workflows in a manner that is easily transferred between different computing systems, so other researchers can use them too.

We will deliver these training workshops (online) using a well-established community platform called The Carpentries. This is an inclusive open-access platform that trains people in data and coding skills and encourages learners to become first helpers, and then trainers, as their own expertise develops. Open-access teaching materials mean that small improvements can be suggested every time a workshop is delivered, leading to a constant improvement in quality. It also means that anyone with an internet connection can use the materials for self-study, so work put into developing materials has wider impact. Edinburgh has the largest Carpentries affiliate in the UK, which is keen to extend the reach of its training.

Our programme will help level-up data skills across the UK and develop a growing cohort of confident practitioners across all career stages and industries. This will help meet the growing demand for data-savvy health and bioscience researchers in academia and industry.

Technical Summary

Our aim is to enable researchers at all career stages to harness the power of data-driven research and innovation for health and bioscience, by providing training in the management and analysis of biological data. We will address this by scaling our successful training programs, adding new curricula, and training new instructors. We will use open-source platforms developed by The Carpentries, a community-based project that is a global leader in teaching data and coding skills. The Carpentries paradigm of open access training leads participants on a skills development path, first as a learner, then helping an experienced instructor, finally leading instruction themselves. Open platforms such as The Carpentries add value by enabling open peer review, re-use, and long-term archiving.

Our objectives are to:
1. Develop peer-reviewed training modules for biological data science that address identified skill gaps in health and bioscience researchers (Statistics; Open science, FAIR principles & data management; Data science computing with workflows) using The Carpentries platform.
2. Deliver 98 days of remote training, comprising our new workshops plus established introductory material, to a diverse community of researchers in academia and industry across all career stages.
3. Offer a clinic after every workshop for learners to ask advice regarding their own projects.
4. Train 30 new instructors to deliver these workshops, building a scalable training community.

Our remotely delivered training programme will be piloted locally and made available nationally, publicized via our strong links to organizations involved in data science UK-wide. Our team will bring together subject-matter expertise in 'omics, statistics, and computation, with strengths in research data management and the UK's largest Carpentries chapter, to build an extensive cohort of confident practitioners and a scalable and sustainable network of health and bioscience data science training for the UK.

Publications

Description Article on data science training programmes for national website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact An article was written for the Times Higher Education "The Campus" website by co-investigators Alison Meynert and Edward Wallace. It describes our experience and suggestions for planning a data science skills training programme including community developed workshops.
Year(s) Of Engagement Activity 2022
URL https://www.timeshighereducation.com/campus/core-data-science-skills-filling-gaps-community-develope...
 
Description Poster at Edinburgh Learning and Teaching Conference 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact A poster was presented to introduce the Carpentries teaching model used in the Ed-DaSH programme to the University of Edinburgh teaching community. The audience was primarily lecturers and other teaching professionals.
Year(s) Of Engagement Activity 2021
URL https://blogs.ed.ac.uk/learning-teaching-conference/poster-ed-dash-a-new-innovative-programme-for-da...
 
Description Twitter social media feed for Ed-DaSH publicity 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact The Twitter handle @EdDaSH_Training was created to publicize the workshops delivered by the Ed-DaSH training programme. Since launching on 27 September 2021 it has acquired 117 followers, including multiple organizations that have amplified social media posts advertising our workshops. From launch until 14 March 2022, the account earned on average 129 impressions per day, a total of 22K impressions over a 169-day period, with a 2.2% engagement ratio. This resulted in 268 link clicks through to our programme registration webpages.
Year(s) Of Engagement Activity 2021,2022
URL https://twitter.com/EdDaSH_Training
 
Description Workshop - FAIR in Biological Practice 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 19-22 October 2021
15-18 February 2022
14-17 June 2022
22-25 November 2022
A new Carpentries style workshop on the topic of FAIR principles in data management was developed by the Ed-DaSH team and remotely delivered four times. Overall feedback was positive.
Year(s) Of Engagement Activity 2021,2022
URL https://carpentries-incubator.github.io/fair-bio-practice/
 
Description Workshop - Good enough practices for scientific computing 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact A half-day workshop based on Wilson et al. (https://doi.org/10.1371/journal.pcbi.1005510) was developed and taught 10 times by different course leaders. It has become a standard part of introductory data science training in PhD programmes in multiple departments at the University of Edinburgh and has been incorporated into a credit-bearing course for postgraduate students in the Precision Medicine Doctoral Training Programme.
Year(s) Of Engagement Activity 2021,2022,2023
URL https://carpentries-incubator.github.io/good-enough-practices/
 
Description Workshop - Introduction to Conda 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 1 March 2022
11 May 2022
31 August 2022
An existing Carpentries workshop on the Conda package manager was expanded by the Ed-DaSH team to include material relevant to health and bioscience research. It was remotely delivered twice in October and November 2021 alongside the Unix Shell material, once with Nextflow and once with Snakemake. Following feedback from those workshops, the material was delivered as a standalone workshop in March 2022, and feedback from learners became more positive. The workshop was then delivered as a standalone twice more.
Year(s) Of Engagement Activity 2022
URL https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/
 
Description Workshop - Introduction to Machine Learning with Python 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 24-27 May 2022
23-26 August 2022
11-14 October 2022
7-10 February 2023
Ed-DaSH delivered Carpentries workshops on Introduction to Machine Learning with Python, using material whose development was funded by the Software Sustainability Institute. Positive feedback was received.
Year(s) Of Engagement Activity 2022,2023
URL https://carpentries-incubator.github.io/machine-learning-novice-python/
 
Description Workshop - Introduction to Statistics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 20-23 September 2021
17-20 January 2022
3-6 May 2022
5-8 July 2022
20-23 September 2022
15-18 November 2022
Ed-DaSH delivered Carpentries workshops on Introduction to Statistics with R, using material developed by the University of York and funded by the Software Sustainability Institute. Positive feedback was received.
Year(s) Of Engagement Activity 2021,2022
URL https://carpentries-incubator.github.io/statistical-thinking-public-health/
 
Description Workshop - Workflows with Nextflow 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 22-25 November 2021
15-16 March 2022
31 May - 2 June 2022
4-6 October 2022
24-26 January 2023
A new Carpentries-style workshop on Workflows with Nextflow was developed by the Ed-DaSH team and remotely delivered. The first instance also included an introduction to the Unix Shell (https://swcarpentry.github.io/shell-novice/) and Conda (https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/). It was delivered to 10 students as a beta teaching experience and positive feedback was received. Based on that feedback, it was decided to drop Unix Shell from future instances of the workshop, and to deliver the Conda material separately. The Nextflow workshop was then taught twice as a standalone two-day workshop, and subsequently expanded to three days for a further two deliveries.

The open source workshop material is being adapted into a course by the Oxford University Department for Continuing Education.
Year(s) Of Engagement Activity 2021,2022,2023
URL https://carpentries-incubator.github.io/workflows-nextflow/
 
Description Workshop - Workflows with Snakemake 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 11-14 October 2021
8-10 February 2022
7-9 June 2022
8-10 November 2022
A Carpentries-style workshop on Workflows with Snakemake was developed by the Ed-DaSH team and delivered remotely. The October 2021 dates included teaching of Unix Shell (Software Carpentry - https://swcarpentry.github.io/shell-novice/) and Conda (https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/). Positive feedback was received. Based on feedback after the October workshop, it was decided to drop Unix Shell from future instances of the workshop, and to deliver the Conda material separately.
Year(s) Of Engagement Activity 2021,2022
URL https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/
 
Description Workshop materials featured in Nextflow blog 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Nextflow developer Evan Floden featured the Ed-DaSH Carpentries workshop on learning Nextflow in a blog post on learning materials for the workflow management system.
Year(s) Of Engagement Activity 2022
URL https://www.nextflow.io/blog/2022/learn-nextflow-in-2022.html
 
Description Workshop: High dimensional statistics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 26-29 October 2021
15-18 February 2022
17-20 May 2022
26-29 July 2022
27-30 September 2022
17-20 January 2023
A new Carpentries style workshop on High Dimensional Statistics in R was developed by the Ed-DaSH team and delivered remotely. Positive feedback was received.
Year(s) Of Engagement Activity 2021,2022,2023
URL https://carpentries-incubator.github.io/high-dimensional-stats-r/