Rigorous Training in Longitudinal Data Science (RADIANCE)

Lead Research Organisation: University College London
Department Name: Institute of Child Health

Abstract

We live in a world where data are collected on nearly everything we do. Such information has the potential to be extremely useful if we wish to improve our health. However, using it safely is not easy.
There are many examples where data have been misused or the evidence has been wrongly interpreted. This has been especially apparent this year, as scientists and governments struggle to communicate the uncertainties in their understanding of the current pandemic. Because the available evidence is limited, scientists stress that more data are needed to compare regions and subgroups of people and, crucially, to study the evolution of the epidemic over time. Only with more data will we be able to understand variation in the population and explain health inequalities as well as time trends. For example, to assess whether a local or national lockdown is working, we need to count how many cases arise in different communities over given periods of time, how many of them are hospitalised, and how many die. Ideally, we should follow each individual from diagnosis, to hospitalisation, to recovery or death, and then compare the incidence of each of these events by, for example, region, sex, occupation and ethnicity. To achieve this, we need to account for when each event occurs, which requires linking information on the same individual over time.
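The linkage step described above can be illustrated with a minimal Python sketch. It assumes three hypothetical, pseudonymised sources (diagnoses, hospital admissions and deaths) that share a person identifier `pid`, and it attributes each person to the region recorded at diagnosis. All records are invented; real linkage usually also involves matching on imperfect identifiers and working under information-governance controls.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, invented records from three separate sources.
# "pid" stands in for whatever linkage key a real system would use.
diagnoses = [
    {"pid": 1, "date": date(2020, 3, 2), "region": "North"},
    {"pid": 2, "date": date(2020, 3, 5), "region": "South"},
    {"pid": 3, "date": date(2020, 3, 9), "region": "North"},
]
admissions = [{"pid": 1, "date": date(2020, 3, 8)}, {"pid": 3, "date": date(2020, 3, 12)}]
deaths = [{"pid": 3, "date": date(2020, 3, 20)}]


def link_histories(diagnoses, admissions, deaths):
    """Link events on the same individual into one time-ordered history."""
    histories = defaultdict(list)
    for rec in diagnoses:
        histories[rec["pid"]].append((rec["date"], "diagnosis"))
    for rec in admissions:
        histories[rec["pid"]].append((rec["date"], "hospitalisation"))
    for rec in deaths:
        histories[rec["pid"]].append((rec["date"], "death"))
    # Sort each person's events by date so the sequence reflects timing.
    return {pid: sorted(events) for pid, events in histories.items()}


def count_events_by_region(histories, diagnoses):
    """Count each event type per region, using the region recorded at diagnosis."""
    region_of = {rec["pid"]: rec["region"] for rec in diagnoses}
    counts = defaultdict(lambda: defaultdict(int))
    for pid, events in histories.items():
        region = region_of.get(pid, "unknown")
        for _, event in events:
            counts[region][event] += 1
    return {region: dict(c) for region, c in counts.items()}


histories = link_histories(diagnoses, admissions, deaths)
print(histories[3])
print(count_events_by_region(histories, diagnoses))
```

Keeping each person's events as a sorted list preserves the ordering (diagnosis, then hospitalisation, then death) that the comparison of incidence over time depends on.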
The same principle applies to the study of other diseases. For this reason, access to linked individual medical and administrative records is crucial for biomedical and public health research. Having the data is not sufficient, however. They need to be: (a) safely stored, cleaned, and prepared for analysis; (b) properly analysed; and (c) interpreted together with evidence from other countries and other published research. We label these steps data stewardship, analysis, and context. Our proposal aims to train health and social data scientists in the core skills needed for each of these steps. We will use several formats, all delivered online, to reach the broadest community of data scientists. We will produce short introductory videos (which we call "Appetisers"), followed by online material at intermediate and advanced levels. Some of this will take the form of recorded lectures, and some of live tutorials in which the material covered by the lectures is reinforced with practical computer-based exercises. We will also run courses on specialised topics, which will include live (but online) interaction with members of the training team, and "data clinics" where participants can have one-to-one discussions with us.

In summary, we will endeavour to develop and run an accessible and inclusive training programme for data scientists involved in the management, analysis and interpretation of complex longitudinal biosocial data.

Technical Summary

Researchers in the biomedical and social sciences now have access to increasingly large data resources generated by linkage between administrative, cohort and panel databases, many of them spanning decades. Their longitudinal nature provides huge opportunities for describing and investigating medical and behavioural histories and socio-economic changes, and for studying their relationship with population health outcomes. Evidence informed by complex longitudinal biosocial data must be based on rigorous data stewardship (i.e. data linkage, manipulation, cleaning, and documentation) twinned with appropriate targets of analysis, transparent analytical plans and accurate interpretation of results. However, the high- and multi-dimensional nature of these newly created data resources requires skills that are often compartmentalised within different disciplines.

With this training programme we aim to provide a comprehensive, cohesive and rigorous portfolio targeted at the broad community of quantitative researchers involved in the management, analysis and interpretation of longitudinal biosocial data and the production of rigorous and transparent scientific evidence. The training will be framed around the three core themes of "Data Stewardship", "Analysis" and "Context".
We will use multiple delivery formats, including short videos on basic concepts ("appetizers"), modules, short courses and clinics, all delivered online. The training will be developed and delivered by a team of methodological and applied researchers working in health and social science, each bringing considerable expertise in both research and education.

In summary, this training programme will enhance the knowledge, self-confidence and expertise that are much in demand among researchers who want to use complex longitudinal biosocial data.

Publications

 
Description 9 introductory videos 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We produced nine "appetizers" introducing the training course, on the following topics:
1- Questions, data and research methods: https://youtu.be/OBfMkzAP7fU
2- Causal Questions: https://youtu.be/N0kMRRbuPCE
3- Information Governance for users of Administrative Data: https://youtu.be/Tie4Ih5SJns
4- Trusted Research Environments: https://youtu.be/_mQWDvjAU0M
5- Data Handling: https://youtu.be/T2WE5cY4IRg
6- Ethical Considerations for data scientists: https://youtu.be/mBIYw2W6yFg
7- Reproducible and Open Data Science: https://youtu.be/ENcpbmGRfAk
8- Longitudinal Data Structures: https://youtu.be/-xctS1yNjns
9- Introduction to Missing Data: https://youtu.be/ZFDk13WrdmM
Year(s) Of Engagement Activity 2022
URL https://radiance.org.uk/training/
 
Description Analysis of Electronic Health Records
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Administrative data, sometimes referred to as routinely collected data, provide large and rich datasets for research. However, they require careful cleaning, management and interpretation. This online course is for those who are considering whether to use administrative data for research and would like a short introduction to the topic. The course will use administrative health data (national hospital inpatient data from the Hospital Episode Statistics database) as an example, but the principles apply to all administrative data.
Year(s) Of Engagement Activity 2022
 
Description Causal Diagrams, Jan 23 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This introductory course is for anyone wishing to learn how to represent graphically their assumptions about how an exposure and an outcome may be related, either causally or through common associations with other variables. Learning how to draw such assumptions is useful to guide:

the design of observational studies aiming to investigate the causal relationship between exposure and outcome and
the analysis of such studies.
We will introduce the language of potential outcomes before describing the fundamental rules for drawing and interrogating causal diagrams.
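As an illustration of interrogating a causal diagram, the minimal Python sketch below stores a hypothetical diagram as a mapping from each variable to its direct effects and lists the common causes of an exposure and an outcome: variables with a directed path to the exposure and a directed path to the outcome that avoids the exposure. The variable names (`price`, `age`, `ses` for socio-economic status, `smoking`, `tar`, `lung_disease`) are invented for illustration. This is only one simple query; it is not an implementation of d-separation or the full back-door criterion.

```python
# Hypothetical causal diagram (DAG): each key lists its direct effects.
dag = {
    "price": ["smoking"],
    "age": ["smoking", "lung_disease"],
    "ses": ["smoking", "lung_disease"],
    "smoking": ["tar"],
    "tar": ["lung_disease"],
    "lung_disease": [],
}


def parents(graph, node):
    """Direct causes of `node`: nodes with an arrow pointing into it."""
    return {src for src, children in graph.items() if node in children}


def ancestors(graph, node):
    """All nodes with a directed path into `node`."""
    found, stack = set(), list(parents(graph, node))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents(graph, current))
    return found


def without(graph, node):
    """Copy of the graph with `node` and every arrow into or out of it removed."""
    return {src: [c for c in children if c != node]
            for src, children in graph.items() if src != node}


def common_causes(graph, exposure, outcome):
    """Nodes with a directed path to the exposure and one to the outcome
    that does not pass through the exposure."""
    return ancestors(graph, exposure) & ancestors(without(graph, exposure), outcome)


print(sorted(common_causes(dag, "smoking", "lung_disease")))
```

In this diagram `tar` lies on the causal pathway (a mediator) and `price` affects the outcome only through the exposure, so neither is returned; `age` and `ses` are the shared causes that the diagram flags as potential confounders.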
Year(s) Of Engagement Activity 2023
 
Description Causal questions: an introduction Nov 22 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This introductory course is for anyone wishing to understand how causal questions can be investigated using real-world data (RWD), that is, data on the everyday experiences of individuals collected through surveys, cohort studies, and administrative and clinical databases, or accrued for reasons other than research. These data are observational, as opposed to experimental. Because of this, using them to address causal questions raises many concerns and difficulties. In this course we will describe the main sources of bias affecting RWD and possible strategies to deal with them.

The course will start by distinguishing between different types of studies (e.g., RCTs, cross-sectional and longitudinal studies) and data sources (e.g., research-based and administrative databases). It will then describe the sources of bias that are likely to affect observational data, in particular those arising from the non-randomised allocation of exposures (termed confounding bias in epidemiology and selection bias in the social sciences), from non-participation (including missing data), and from measurement error. We will then introduce two main design-based approaches for attempting to deal with (some of) these biases: the framework of target trial emulation and the exploitation of natural experiments.
Year(s) Of Engagement Activity 2022
 
Description Causal questions: an introduction March 22 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This introductory course is for anyone wishing to understand how causal questions can be investigated using real-world data (RWD), that is, data on the everyday experiences of individuals collected through surveys, cohort studies, and administrative and clinical databases, or accrued for reasons other than research. These data are observational, as opposed to experimental. Because of this, using them to address causal questions raises many concerns and difficulties. In this course we will describe the main sources of bias affecting RWD and possible strategies to deal with them.
Year(s) Of Engagement Activity 2022
 
Description Longitudinal Data Preparation & Visualisation For Epidemiological And Social Research, March 23 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This online course is for anyone who needs to prepare longitudinal data for analysis. It will cover the main procedures needed to convert raw longitudinal data into cleaned data that can be readily analysed.

The course will have two sessions, one covering data preparation and the other covering data description and visualisation. Both will focus on longitudinal, real-world data. You can take the module in either R or Stata; each version will have its own videos and practical exercises.
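A typical early step in preparing longitudinal data is reshaping from wide format (one row per person, one column per wave) to long format (one row per person per wave), roughly what Stata's `reshape long` or R's `tidyr::pivot_longer()` do. The minimal stdlib Python sketch below assumes hypothetical wave columns named `bmi_1`, `bmi_2`, `bmi_3`; the course itself uses R or Stata, and the data are invented.

```python
# Hypothetical wide-format extract: one row per person, one column per wave.
wide = [
    {"id": 101, "sex": "F", "bmi_1": 22.1, "bmi_2": 23.0, "bmi_3": None},
    {"id": 102, "sex": "M", "bmi_1": 27.4, "bmi_2": None, "bmi_3": 28.2},
]


def wide_to_long(rows, id_vars, stub, waves):
    """Reshape to long format: one row per person per wave.

    Wave columns are assumed to be named f"{stub}_{wave}"; a missing
    column or a None value becomes an explicit None in the output.
    """
    long_rows = []
    for row in rows:
        for wave in waves:
            # Copy the time-invariant variables onto every wave's row.
            record = {k: row[k] for k in id_vars}
            record["wave"] = wave
            record[stub] = row.get(f"{stub}_{wave}")
            long_rows.append(record)
    return long_rows


long_data = wide_to_long(wide, id_vars=["id", "sex"], stub="bmi", waves=[1, 2, 3])
for r in long_data:
    print(r)
```

Keeping missing measurements as explicit `None` rows, rather than dropping them, makes it easy to count and describe missingness by wave before any analysis.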
Year(s) Of Engagement Activity 2023
 
Description Longitudinal data preparation and visualisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This online course is for anyone who needs to prepare longitudinal data for analysis. It will cover the main procedures needed to convert raw longitudinal data into cleaned data that can be readily analysed.

The course will have two sessions, one covering data preparation and the other covering data description and visualisation. Both will focus on longitudinal, real-world data. You can take the module in either R or Stata; each version will have its own videos and practical exercises.
Year(s) Of Engagement Activity 2022
 
Description Multiple Imputation of Missing data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This online course is for anyone needing to address missing information in their quantitative data. It covers the most important principles of missing data analysis and how to deal with missing data effectively in analyses.
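In multiple imputation, each of the m imputed datasets is analysed separately and the results are then combined. A standard way of combining them is Rubin's rules, sketched below in stdlib Python under the assumption of a single scalar parameter; the course description does not say which pooling details it covers, and the degrees-of-freedom calculation used for confidence intervals is omitted here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation results using Rubin's rules.

    estimates: point estimates of one parameter, one per imputed dataset.
    variances: their squared standard errors, in the same order.
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, t


# Invented results from five imputed datasets.
pooled, total_var = pool_rubin([1.0, 1.2, 1.1, 0.9, 1.3], [0.04] * 5)
print(pooled, total_var)
```

The between-imputation term is what inflates the standard error to reflect uncertainty about the missing values; analysing a single imputed dataset would ignore it.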
Year(s) Of Engagement Activity 2022
 
Description Regression Models 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This online course gives you an overview of commonly used regression methods for examining the relationship between an outcome of interest and an explanatory variable. You will be introduced to classical linear regression and to generalised linear models (e.g. logistic, Poisson, ordinal/multinomial models), chosen according to the distribution of the outcome. The course covers the basic concepts, formulation, interpretation, and validation of the models. Real-world data will be used to demonstrate the practical applications of these models.
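The simplest of these models, classical linear regression with one explanatory variable, can be fitted by ordinary least squares in closed form. The minimal stdlib Python sketch below shows the estimates and the residuals used in basic model checking; generalised linear models such as logistic or Poisson regression would normally be fitted with a statistics package (for example R's `glm()`), and the data here are invented.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = a + b*x for one explanatory variable."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar  # fitted y when x = 0
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values, a basic input to model checking."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


a, b = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, residuals([1, 2, 3, 4], [3, 5, 7, 9], a, b))
```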
Year(s) Of Engagement Activity 2022
 
Description Statistics Clinic 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This was a one-to-one clinic on how to compare the population accessing an alcohol service with the population that should be accessing it because they tested positive on an alcohol test, in terms of age, sex, ethnicity and IMD (Index of Multiple Deprivation).
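One simple descriptive starting point for such a comparison is to set the percentage distribution of a characteristic among service users against that among the eligible population. The minimal stdlib Python sketch below does this for hypothetical IMD quintiles (assuming 1 = most deprived); the data are invented, and it does not describe how the clinic itself approached the question. A fuller analysis would add a measure of uncertainty and might compare age in bands or standardise for it.

```python
from collections import Counter


def distribution(values):
    """Percentage of records falling in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records supplied")
    return {cat: 100 * n / total for cat, n in counts.items()}


def compare_profiles(service_users, eligible, categories):
    """For each category: (% of service users, % of eligible population, difference)."""
    users = distribution(service_users)
    elig = distribution(eligible)
    return {
        cat: (users.get(cat, 0.0), elig.get(cat, 0.0),
              users.get(cat, 0.0) - elig.get(cat, 0.0))
        for cat in categories
    }


# Hypothetical IMD quintile for each person (1 = most deprived, by assumption).
users_imd = [1, 1, 2, 5]
eligible_imd = [1, 1, 1, 2, 3]
profile = compare_profiles(users_imd, eligible_imd, categories=[1, 2, 3, 4, 5])
for quintile, (pct_users, pct_eligible, diff) in profile.items():
    print(quintile, pct_users, pct_eligible, diff)
```

A negative difference marks a group that is under-represented among service users relative to those who tested positive; the same function applies unchanged to sex or ethnicity categories.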
Year(s) Of Engagement Activity 2022