Learn to Discover (L2D): A Training Platform in Data Sciences and Machine Learning for Biomedicine and Health Researchers.

Lead Research Organisation: University College London
Department Name: Cell and Developmental Biology

Abstract

Digital skills need to improve to optimise competitive potential from accelerating applications of digital technology. Biological and medical science is becoming more automated to tackle bigger and more complex problems in discovery and applied health and medical sciences. The need for routine adoption of radically new ways of working and optimisation of their impact in this critical sector will only increase. We need action to re-skill the UK scientific workforce continuously since the deployment of digital skills offers very strong growth opportunities. The UK Government has committed to a technological future and to sectors that can deliver economic growth, and has learnt lessons from elsewhere: 50% of all growth in the US economy over the last 50 years has come from the 5% of the workforce in STEM disciplines. Fusing the adoption of digital tech skills to the especially high-value STEM disciplines of discovery bioscience, biomedical and health sciences, where the UK already punches well above its weight, has tremendous potential for meaningful and measurable economic and social impact. However, tackling a digital skills crisis is not a trivial undertaking: "Britain's chronic supply issues requires radical action. Working as a data expert requires knowing your maths, coding and computer science as well as problem solving, resilience and communication." (The Royal Society, May 2019).

Since 2011 SysMIC (http://sysmic.ac.uk) - funded initially by the BBSRC - has taken advantage of the digital technology, internet access, and distance communication infrastructure widely available to most professionals to address half of this skills problem amongst active bioscience and health researchers. We delivered high-quality e-learning and training in mathematical, computational and statistical methods. The Learn 2 Discover (L2D) project will combine the expertise of leading health and bioscience-facing computational and data scientists with the remote learning acumen and resources of SysMIC to solve the remaining part of the challenge. L2D will deliver data science, machine learning and AI training in a highly accessible, flexible, modular format, suitable for a very wide range of starting expertise - including beginners - and study regimes. Our modules will draw on established real-world examples yet deliver widely applicable skills and general computational self-confidence for effective application well beyond the course. Delivery through the web offers resilience to disruptions of HE systems and leverages the remote work and study competence developed across the R&D workforce in 2020. Participation of UK bioindustry stakeholders in the design of the programme will promote movement and sharing of talent between academic and commercial sectors. Collaboration and alignment of our modules with the work of UK centres of research excellence, infrastructure and resource networks will promote visibility, confidence and demand.

L2D will squarely address the digital productivity puzzle, promote knowledge exchange and its translation into impact in society and the economy, and offer opportunities for cross-sector spillover of benefits from training. Failure to respond effectively to the digital skills challenge is a major risk to business growth, innovation and broader societal development. A shortage in suitable digital skills persists in the UK labour market - research in biomedical and health sciences is not an isolated case. Demand for workers with specialised data sciences and computational skills has been growing 6.5-fold faster than all other requirements. The best option is to nurture talent in the bio-, biomedical and health research sectors in situ with first-class CPD of the sort proposed in L2D. A persistent digital skills wage differential makes impactful, broad digital skills training attractive and very good value for money since over 75% of job openings at any skill level request digital skills.

Technical Summary

In the bio and health sciences, data complexity and volume are now outstripping those of the compute-intensive physical sciences. Challenges of massive datasets, needs for real-time predictive modelling and computer execution of human intelligence-level tasks are growing rapidly. To address this, research and education leaders in the applications of data science and quantitative approaches to bio-, biomedical and health research will build on an outstanding track record to create a data science, ML and AI e-learning and training platform called Learn 2 Discover (L2D) to meet the needs of their wider peer community. L2D will develop specific, highly relevant skills for participants' immediate needs. Insights into the data challenges of other fields will narrow persistent knowledge gaps - for example, through L2D, drug developers might encounter more data-aware clinicians and, conversely, laboratory scientists might appreciate the priorities of health practitioners and what data can be gathered.

L2D will deliver:
a) Core skills in data handling, supervised and unsupervised ML, and working with complex, high-dimensional data ("live" within 4 months of start-up)
b) A portfolio of "use case" scenarios e.g. image analysis, digital pathology, neuroscience, drug development, molecular structure-function, genotype-phenotype relationships, synbio, cancer diagnostics and functional genomics using validated real-world data
c) Insight into advances in AI and Deep Learning, e.g. various neural networks, reinforcement learning, autoencoders, natural language processing and more
d) Good practice in computing and data handling to address the reproducibility crisis
e) Web publishing for sharing predictors and code
f) Accessing cloud computing to spin-up flexible resource for large compute problems
g) Collaborative coding and project workshops with skilled tutors
h) Support materials and coaching for "local" trainers in universities and businesses to facilitate the uptake and adoption.
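
As an illustration of the kind of core skill listed under (a), the following is a minimal sketch of one supervised and one unsupervised method applied to the same small, multi-feature dataset. It is not taken from the L2D materials: the toy data, the class labels ("control", "treated") and all function names are hypothetical, and a nearest-centroid classifier and k-means clustering are used only because they are simple to write with the Python standard library.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Mean position of a list of equal-length feature vectors."""
    return [sum(column) / len(column) for column in zip(*points)]


def fit_nearest_centroid(samples, labels):
    """Supervised step: learn one centroid per class label."""
    grouped = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample)
    return {label: centroid(members) for label, members in grouped.items()}


def predict(model, sample):
    """Assign the label whose centroid is closest to the sample."""
    return min(model, key=lambda label: euclidean(model[label], sample))


def k_means(samples, k, iterations=20, seed=0):
    """Unsupervised step: group samples into k clusters without labels."""
    rng = random.Random(seed)
    centres = rng.sample(samples, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for sample in samples:
            nearest = min(range(k), key=lambda i: euclidean(centres[i], sample))
            clusters[nearest].append(sample)
        # Keep the previous centre if a cluster ends up empty.
        centres = [centroid(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


def make_toy_data(n_per_class=30, n_features=5, seed=1):
    """Hypothetical data: two Gaussian groups in n_features dimensions."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, mean in (("control", 0.0), ("treated", 5.0)):
        for _ in range(n_per_class):
            samples.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
            labels.append(label)
    return samples, labels


samples, labels = make_toy_data()
model = fit_nearest_centroid(samples, labels)
hits = [predict(model, s) == y for s, y in zip(samples, labels)]
accuracy = sum(hits) / len(hits)
centres, clusters = k_means(samples, k=2)
print(f"training accuracy: {accuracy:.2f}")
print("cluster sizes:", sorted(len(c) for c in clusters))
```

The supervised model uses the labels to learn where each class sits, whereas k-means sees only the feature values; comparing the two on the same data is one way to show learners what labels add.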

Publications

 
Title Core L2D - Data Handling and Machine Learning package (2021-22) 
Description An expanded package of training materials in data handling and elements of machine learning applied to biosciences was created for live e-learning. After a successful small-scale field test of e-learning packages with 10 participants in London, Newcastle and Edinburgh, and the later participation of 50 users in May 2021, engagement was scaled to 100 users, mostly based in London and Cambridge, from November 2021 to the end of May 2022. This was the first use at scale of materials developed in the SysMIC programme, now supported by a rolling programme of continual improvements in the content and delivery of high-quality materials. This continuing process was made possible by new staff and the expertise of an expanded project team, supported by the Innovation Scholars: Data Science Training in Health and Bioscience (DaSH) initiative and award.
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact These materials have been used by over 100 participants in L2D training in this reporting year. There is an enhanced level of trainee engagement, progression and completion. 
URL http://www.learntodiscover.ai
 
Title L2D GitHub Classroom Implementation - Written material, forums and portals for assignment submission. 
Description To migrate from the Moodle VLE, we have developed infrastructure to support student learning via GitHub Classroom. The platform provides an easily navigable interface through which students, instructors and tutors can manage all aspects of the course. Students log in to the portal with their GitHub profiles and gain access to all learning materials, together with topic-specific forums where they can interact and post questions that we review and answer. The system also generates a repository for each student, accompanied by clear instructions and a tutorial video explaining how to submit the assignments required throughout the course, so that we can easily access, manage and grade them and return them to the students.
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact The migration from Moodle to GitHub Classroom has made the L2D training programme easily navigable for students, instructors and tutors. It has also provided an easy-to-use forum where students and instructors can post and interact, together with an easy-to-manage assignment submission system that lets students submit their work and tutors grade and return it. This has created a step change in usability, learner engagement, progression and learning.
URL https://classroom.github.com/classrooms/122978764-l2d-oct2022-machine-learning
 
Title L2D Training webinars (live-streamed interactive webinars) 
Description We provide live-streamed, interactive training webinars weekly or bi-weekly, in which we deliver our training materials, and extensions of them, live to remotely attending students. The webinars comprise live, step-by-step programming training, with a team of tutors on standby to answer student queries on an individual, one-to-one basis. The webinars also include interactive exercises and problems that the students work on collaboratively in small groups. Once an exercise has been completed, the solutions are shared and worked through with the entire group.
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact These activities allow us to deliver both the core training content of the course and additional material that isn't included in the written materials. They also offer students the opportunity to work on exercises together, and to engage the help of an instructor or tutor directly, on a one-to-one basis, for problem solving and detailed explanations of concepts taught on the course.
URL https://www.youtube.com/watch?v=JhnVTjgOfEg
 
Title L2D Video training materials library 
Description We filmed and edited more than 88 tutorial and webinar videos to a very high professional standard. We established a green screen filming studio to record video materials specifically designed to complement the written course materials that we provide for our students. The videos are filmed in 4K resolution with binaural audio. Code and figures are animated to make the videos easy to follow, and the pacing allows students to learn and code while watching.
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact Greatly improved learner engagement and learning, giving our materials an interactive component in which students are able to watch and code simultaneously. Videos are archived, condensed to reflect the sub-topics within each major lesson topic, and embedded into our Sandpaper web materials.
URL https://youtu.be/3f627wXK6z0
 
Title Online teaching materials developed using Sandpaper (Software Carpentries Workbench), hosted on GitHub. 
Description We developed our written learning and training materials and transitioned them into Sandpaper. The lesson template and infrastructure make use of three R packages, namely Sandpaper, Varnish and Pegboard. These packages and the materials have been tailored to fit the aesthetics and teaching requirements of the L2D course, and present our materials in a way that is unique to our project. The materials are hosted online, on GitHub.
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact Increased engagement, ease of learning and improved accessibility for our students. A pivotal effort and move in improving the delivery of our course and training programme. 
URL http://learntodiscover.github.io/Basic_Python/
 
Description Data Analysis, Machine Learning and Predictive Modelling of human EEG (4-15/07/22) for University of Science and Technology, Beijing 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact An intensive training event over two consecutive days with a total of ten 2-hour teaching sessions. Students learned the basics of Data Analysis, Machine Learning and Dynamical Systems modelling in the context of human brain activity. The primary audience was students of mathematics, engineering and computer science from various universities in Beijing, along with Lecturers and Professors from other Chinese universities.

The course included theoretical concepts, programming tutorials and exercises. Students participated in live coding in Python based on L2D material from the Data Handling, Machine Learning and Network modules, creating and importing data, displaying data, and analysing data.
Every unit dealing with data was complemented by a dynamical modelling session which tried to recreate certain aspects of brain activity and do predictive simulations.

The most significant impact was making students in the theoretical and engineering domains aware of the need to engage with life science and clinical data. The course helped to introduce L2D to a relevant academic community with strong industrial ties in China.
Year(s) Of Engagement Activity 2022
URL http://www.learntodiscover.ai
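
The live-coding workflow described for this course (creating data, displaying it and analysing it) could look something like the sketch below. It is an illustration only and not the course material: the signal is synthetic, a 10 Hz sine wave with added noise standing in for one EEG channel, and the function names, sampling rate and noise level are all assumptions. Only the Python standard library is used, so the Fourier transform is computed directly rather than with a numerical package.

```python
import cmath
import math
import random


def make_signal(freq_hz=10.0, sample_rate_hz=100, duration_s=2.0,
                noise_sd=0.5, seed=0):
    """Create a noisy sine wave as a stand-in for one recorded channel."""
    rng = random.Random(seed)
    n_samples = int(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        + rng.gauss(0.0, noise_sd)
        for i in range(n_samples)
    ]


def power_spectrum(signal, sample_rate_hz):
    """Return (frequency in Hz, power) pairs from a direct Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2):  # skip k = 0, the constant offset
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        spectrum.append((k * sample_rate_hz / n, abs(coefficient) ** 2))
    return spectrum


sample_rate_hz = 100
signal = make_signal(sample_rate_hz=sample_rate_hz)

# "Display" the data as text: the first few samples and the overall mean.
print("first samples:", [round(x, 2) for x in signal[:5]])
print("mean amplitude:", round(sum(signal) / len(signal), 3))

# Analyse: find the frequency that carries the most power.
peak_freq, peak_power = max(
    power_spectrum(signal, sample_rate_hz), key=lambda pair: pair[1]
)
print("dominant frequency (Hz):", peak_freq)
```

In a teaching session the text output would normally be replaced by a plot, and the synthetic signal by an imported recording.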
 
Description Induction Event for UCL-based UKRI funded doctoral programmes 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A live event, hosted from UCL's Bloomsbury Studio on our Gower Street campus, for a number of PhD students who enrolled to take the L2D course, starting in October 2022.

Rather like our launch event in May 2020, the LIDo induction served to introduce the course, its aims and ambitions to a cohort of students who would be actively taking the course, shortly after. We also incorporated a combination of keynote speakers, videos and interactive experiments, demonstrating some of the techniques that we teach.

The most significant outcomes were increased awareness, additional interest, and the introduction of the students to their instructors and the L2D team. The event also served as an induction to the course, helping to familiarise students with how the course is structured, what to expect and what they will learn over their time studying with us.
Year(s) Of Engagement Activity 2022
URL http://www.learntodiscover.ai
 
Description L2D Induction for University of Birmingham Life Science DTP/ECR 26.10.22 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A talk for a number of PhD students from a DTP interested in training in programming and data science.

The induction served to introduce the course, its aims and ambitions to a cohort of students who could register for the course. It comprised an overview of the course, its motivation, ways of working and the types of support available throughout the training.

The most significant outcome was increased awareness, additional interest and the introduction of the course to students from UK DTPs. The event also helped students understand how the course is structured, what to expect and what they will learn over their time studying with us.
Year(s) Of Engagement Activity 2022
URL http://www.learntodiscover.ai
 
Description L2D Induction for University of Edinburgh, Dundee, St Andrews and Aberdeen Life Science EastBio DTP 03.10.22 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A talk for a number of PhD students from a DTP interested in training in programming and data science.

The induction served to introduce the course, its aims and ambitions to a cohort of students who could register for the course. It comprised an overview of the course, its motivation, ways of working and the types of support available throughout the training.

The most significant outcome was increased awareness, additional interest and the introduction of the course to students from UK DTPs. The event also helped students understand how the course is structured, what to expect and what they will learn over their time studying with us.
Year(s) Of Engagement Activity 2022
URL http://www.learntodiscover.ai
 
Description L2D Induction for University of Leicester Life Science DTP 14.02.23 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A talk for a number of PhD students from a DTP interested in training in programming and data science.

The induction served to introduce the course, its aims and ambitions to a cohort of students who could register for the course. It comprised an overview of the course, its motivation, ways of working and the types of support available throughout the training.

The most significant impact was increased awareness, additional interest and the introduction of the course to students from UK DTPs. The event also helped students understand how the course is structured, what to expect and what they will learn over their time studying with us.
Year(s) Of Engagement Activity 2023
URL http://www.learntodiscover.ai
 
Description L2D Social Media Activities and Student Blog 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We have constructed and begun implementing a social media strategy, revolving around posting content daily on LinkedIn. We have also enlisted several L2D students to give a student's account and perspective of their experiences while taking our courses, and these are published as blog entries on LinkedIn.

Our strategy relies on posting content from the L2D project to our LinkedIn page every day. Content ranges from extracurricular items to programming, machine learning and data science tips and tricks. The student blog posts are written by volunteers and proofread by our instructors, and give students taking the course a means of writing engaging accounts of their experiences while actively involved in our L2D training.

The most significant outcome is increased brand awareness, along with increased engagement with L2D among audiences in tangentially related fields and sectors. There is also increased engagement within student communities, and between individuals enrolled in the course and its alumni.
Year(s) Of Engagement Activity 2022
URL https://www.linkedin.com/company/76169605
 
Description Learn To Discover (L2D) Project Launch Event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A hybrid event, hosted at UCL's Bloomsbury Studio, and simultaneously live-streamed on UCL's YouTube channel, for remote attendees.

To mark the launch of the Learn To Discover project, we organised a hybrid event, hosted and streamed from UCL's Bloomsbury Studio, on our Gower Street campus. It was simultaneously streamed to EventBrite ticket holders, on UCL's YouTube channel. We hosted a day showcasing our course, future aims and ambitions, paired with interactive live experiments, and a suite of esteemed keynote speakers working within bioscience, computer science, industry and medicine.

The event raised awareness of our courses in academia, research institutes and industry, and amongst funders. Amongst other things, it resulted in a number of additional enrolments for the October 2022 course cohort.
Year(s) Of Engagement Activity 2022
URL http://www.learntodiscover.ai
 
Description Scientific Working - Statistics and Mathematical Computing Course at Duale Hochschule Baden-Württemberg (Karlsruhe, Germany). (28-31/03/22) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact An intensive training event over four consecutive days, giving students the opportunity to learn the basics of Data Science and concepts of scientific working with data. The final part was an introduction to concepts of Machine Learning with demonstrations.

The face-to-face delivery included theoretical concepts, programming tutorials and exercises. Students participated in live coding in MATLAB, creating and importing data, displaying data, and analysing data.

The most significant impact was increased awareness, amongst a diverse group of students in the healthcare sector, of the need to understand Data Science in the age of digital data. For L2D this was an outstanding opportunity to test our resources and approaches with learners outside the UK who were trained in different educational systems.
Year(s) Of Engagement Activity 2023
 
Description SysMIC/L2D Data Science hybrid course. Content support webinar series 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Core content support webinars and problem-solving meetings.
Roughly thirteen sessions in each course. The following twenty-one have already been delivered over two runs of the course.
All content is available on-line via YouTube.
The estimated audience of 51-100 given above applies to each session individually.
29th April 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 1 Basic Python 1 - getting started. Via Zoom. 70 participants
13th May 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 2 Basic Python 2 - Arrays. Via Zoom. 70 participants
27th May 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 3 Iterations. Via Zoom. 70 participants
10th June 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 4 Data handling 1 - Dataframes univariate. Via Zoom. 70 participants
24th June 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 5 Data handling 1 - Dataframes multivariate. Via Zoom. 70 participants
8th July 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 6 Data handling 1 - Image handling. Via Zoom. 70 participants
22nd July 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 7 Time series. Via Zoom. 70 participants
16th September 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 8 Machine learning - Classification 1. Via Zoom. 70 participants
30th September 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 9 Machine learning - Classification 2. Via Zoom. 70 participants
14th October 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 10 Machine learning - Clustering 1. Via Zoom. 70 participants
28th October 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 10 Machine learning - Clustering 2. Via Zoom. 70 participants
11th November 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 11 Machine learning - Dimensionality reduction. Via Zoom. 70 participants
2nd December 2021. SysMIC/L2D hybrid Prototype 2 - Lesson 11 Machine learning - Dimensionality reduction. Via Zoom. 70 participants
7th October 2021. SysMIC/L2D hybrid - Lesson 1 Introduction to Python. Via Zoom. 70 participants
14th October 2021. SysMIC/L2D hybrid - Lesson 2 Dataframes 1. Via Zoom. 70 participants
4th November 2021. SysMIC/L2D hybrid - Lesson 3 For Loop and Iteration. Via Zoom. 70 participants
2nd December 2021. SysMIC/L2D hybrid - Lesson 4 Dataframes 2. Via Zoom. 70 participants
20th January 2022. SysMIC/L2D hybrid - Lesson 5 Introduction to Networks. Via Zoom. 70 participants
27th January 2022. SysMIC/L2D hybrid - Lesson 6 Network quantification. Via Zoom. 70 participants
10th February 2022. SysMIC/L2D hybrid - Lesson 7 Network applications. Via Zoom. 70 participants
10th March 2022. SysMIC/L2D hybrid - Lesson 8 Classification introduction. Via Zoom. 70 participants
Year(s) Of Engagement Activity 2021,2022
URL http://www.learntodiscover.ai
 
Description UCL DaSH Projects Workshop (14/03/23) - IDEAS and L2D 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact An intensive training day, as part of a collaborative 4-day intensive school, giving students the opportunity to learn how to perform a series of analyses, and how to handle real-world micrograph image data, derived from a human cancer database. Students participated in live coding with our team of instructors, asked questions and worked alongside each other.

This provided students with a densely packed day of activities, training and live coding with our instructors, and the opportunity to learn collaboratively techniques in Python programming and machine/deep learning for analysing real-world scientific data, all within a collaborative, open, physical space, as an alternative to learning these techniques remotely.

The most significant impacts and outcomes were further generation of awareness of our course and brand, as well as interest in the L2D course for prospective students who were not previously aware of the training that we offer. The event also allowed attendees to network with each other, and our team.
Year(s) Of Engagement Activity 2023
URL https://github-pages.ucl.ac.uk/2023-03-09-UCL-DasH/
 
Description UCL DaSH Projects Workshop (23/09/22) - IDEAS and L2D 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact An intensive training day, as part of a collaborative 3-day intensive school, giving students the opportunity to learn how to perform a series of analyses, and how to handle real-world micrograph image data, derived from a human cancer database. Students participated in live coding with our team of instructors, asked questions and worked alongside each other.

This provided students with a densely packed day of activities, training and live coding with our instructors, and the opportunity to learn collaboratively techniques in Python programming and machine/deep learning for analysing real-world scientific data, all within a collaborative, open, physical space, as an alternative to learning these techniques remotely.

The most significant outcome and impact was generating awareness of our course and brand, as well as interest in the L2D course for prospective students who were not previously aware of the training that we offer. The event also allowed attendees to network with each other, and our team.
Year(s) Of Engagement Activity 2022
URL http://github-pages.ucl.ac.uk/2022-09-19-UCL-DaSH/