Misclassification in binary and categorical variables: development of methods and software for epidemiology

Lead Research Organisation: University of Bristol

Department Name: Mathematics

Abstract

In most studies there is potential for some of the exposure, confounders, mediators, effect modifiers, or outcome to be measured with error (called misclassification in the case of categorical variables). To examine measurement error, subsamples may have data on the true value of the mismeasured variable (validation), a measure of the variable which only has random error (calibration), or repeat measures of the mismeasured variable (replication). Validation or calibration data are not often available, so we will focus on mitigation methods that use replication data.

Most epidemiological analyses do not attempt to assess measurement error or model its influence on conclusions. Sensitivity analyses are available, but many only apply to specific or simple scenarios. Widely used data resources such as UK Biobank and ALSPAC replicated data collection on a subset of their samples, and yet these replication data are rarely used to examine or adjust for measurement error. In addition, the replication sample is often not a random subset of the study sample. The implications of this for bias and methods to adjust for measurement error have not been examined.

The project will focus on methods to address misclassification rather than the standard classical measurement error model for continuous variables. One important application will be in modelling the causal effect of categorical variables measuring socioeconomic position (e.g., highest education level reached, area-level deprivation score in quintiles, family income in categories, social position category) on health outcomes. Another area of focus will be self-reported variables (e.g., smoking status, depression score). We may also consider differential measurement error (e.g., where those with lower education levels report their smoking status differently to those with higher education levels).

The project will examine the impact of misclassification on causal analyses, using algebra and simulation as appropriate. Methods to examine and adjust for measurement error using replication data only will be compared (examining bias and power) using simulations and applied to replication data from ALSPAC and UK Biobank (UKBB). Methods may include but are not limited to instrumental variable analyses, regression calibration, and Bayesian correction.
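As an illustration of the kind of simulation described above, the following minimal Python sketch shows how non-differential misclassification of a binary exposure can attenuate an estimated effect. It is not the project's method: the function names, the linear outcome model, and all parameter values (effect size, prevalence, sensitivity, specificity) are assumptions chosen for illustration only.

```python
import random


def mean_difference(y, x):
    """Difference in mean outcome between the x == 1 and x == 0 groups."""
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def simulate(n=200_000, beta=1.0, prevalence=0.5,
             sensitivity=0.8, specificity=0.9, seed=1):
    """Return (estimate using true exposure, estimate using misclassified exposure).

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x_true, x_obs, y = [], [], []
    for _ in range(n):
        x = 1 if rng.random() < prevalence else 0
        # Non-differential error: the error depends only on the true exposure.
        if x == 1:
            w = 1 if rng.random() < sensitivity else 0
        else:
            w = 0 if rng.random() < specificity else 1
        x_true.append(x)
        x_obs.append(w)
        # Outcome depends on the true exposure plus standard normal noise.
        y.append(beta * x + rng.gauss(0.0, 1.0))
    return mean_difference(y, x_true), mean_difference(y, x_obs)


def expected_attenuation(prevalence, sensitivity, specificity):
    """Algebraic attenuation factor for the difference in means: PPV + NPV - 1."""
    p_obs1 = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    ppv = prevalence * sensitivity / p_obs1
    npv = (1 - prevalence) * specificity / (1 - p_obs1)
    return ppv + npv - 1


if __name__ == "__main__":
    true_est, naive_est = simulate()
    print("estimate with true exposure:", round(true_est, 3))
    print("estimate with misclassified exposure:", round(naive_est, 3))
    print("algebraic attenuation factor:", round(expected_attenuation(0.5, 0.8, 0.9), 3))
```

Under these assumed settings, the estimate based on the misclassified exposure is biased towards the null, and the algebraic attenuation factor gives a check on the simulated value. Differential error, misclassified confounders or outcomes, and other outcome models would need separate treatment.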

Even if the invited sample is randomly selected, the sample completing the repeat assessments may not be. We will use simulations to examine the impact of non-random repeated samples on correction for measurement error using the above methods. These methods will be applied to the repeated measures available within both ALSPAC and UKBB on a (non-random) subsample of respondents.
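The concern about non-random repeat samples can be sketched with a small Python simulation. This is a hypothetical illustration, not an analysis of ALSPAC or UKBB: the education-dependent error rates, the participation probabilities, and the use of disagreement between two reports as a simple summary of misclassification are all assumptions.

```python
import random


def report(truth, error_rate, rng):
    """Return a binary self-report that flips the truth with probability error_rate."""
    return 1 - truth if rng.random() < error_rate else truth


def disagreement_rates(n=100_000, seed=2):
    """Return (disagreement rate if everyone were re-measured,
               disagreement rate in the self-selected repeat sample).

    All numeric values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    error = {0: 0.20, 1: 0.05}          # reporting error by education (0 = lower, 1 = higher)
    participation = {0: 0.20, 1: 0.60}  # chance of completing the repeat assessment
    full_disagree = full_n = 0
    sub_disagree = sub_n = 0
    for _ in range(n):
        edu = 1 if rng.random() < 0.5 else 0
        truth = 1 if rng.random() < 0.3 else 0
        first = report(truth, error[edu], rng)
        second = report(truth, error[edu], rng)
        disagree = int(first != second)
        full_disagree += disagree
        full_n += 1
        # Repeat data are only observed for those who take part again.
        if rng.random() < participation[edu]:
            sub_disagree += disagree
            sub_n += 1
    return full_disagree / full_n, sub_disagree / sub_n


if __name__ == "__main__":
    full_rate, sub_rate = disagreement_rates()
    print("disagreement if all were re-measured:", round(full_rate, 3))
    print("disagreement in repeat sample:", round(sub_rate, 3))
```

In this assumed scenario, participants with less reporting error are over-represented in the repeat sample, so the disagreement observed there understates the disagreement that would be seen if the whole study sample were re-measured.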

Lastly, we will develop software for sensitivity analysis in the case of measurement error and misclassification, to enable widespread uptake of the methods developed.

This project falls within the EPSRC "Statistics and applied probability" and "Software engineering" research areas. It will involve collaboration with the MRC Integrative Epidemiology Unit (Population Health Sciences, Bristol Medical School), the London School of Hygiene and Tropical Medicine, ALSPAC, and the UK Biobank.

Planned Impact

The COMPASS Centre for Doctoral Training will have the following impact.

Doctoral Students Impact.

I1. Recruit and train over 55 students and provide them with a broad and comprehensive education in contemporary Computational Statistics & Data Science, leading to the award of a PhD. The training environment will be built around a set of multilevel cohorts: a variety of group sizes, within and across year cohort activities, within and across disciplinary boundaries with internal and external partners, where statistics and computation are the common focus, but remaining sensitive to disciplinary needs. Our novel doctoral training environment will have a powerful impact on students, opening their eyes not only to a range of modern technical benefits and opportunities, but also to the power of team-working with people from a range of backgrounds to solve the most important problems of the day. They will learn to apply their skills to achieve impact by collaborative working with internal and external partners, such as via our Rapid Response Teams, Policy Workshops & Statistical Clinics.

I2. As well as advanced training in computational statistics and data science, our students will be impacted by exposure to, and training in, important cognate topics such as ethics, responsible innovation, equality, diversity and inclusion, policy, effective communication and dissemination, enterprise, impact and consultancy skills. It is vital for our students to understand that their training will enable them to have a powerful impact on the wider world: for example, the AI algorithms they develop should not be discriminatory, their statistical methodologies should be reproducible, and their statistical results should be communicated accurately and comprehensibly to the general public and policymakers.

I3. The students will gain experience via collaborations with academic partners within the University in cognate disciplines, and a wide range of external industrial & government partners. The students will be impacted by the structured training programmes of the UK Academy of Postgraduate Training in Statistics, the Bristol Doctoral College, the Jean Golding Institute, the Alan Turing Institute and the Heilbronn Institute for Mathematical Sciences, which will be integrated into our programme.

I4. Having received excellent training, the students will go on to have a powerful impact on the world in their future careers, spreading excellence.

Impact on our Partners & ourselves.

I5. Direct impacts will be achieved by students engaging with, and working on projects with, our academic partners, with discipline-specific problems arising in engineering, education, medicine, economics, earth sciences, life sciences and geographical sciences, and our external partners Adarga, the Atomic Weapons Establishment, CheckRisk, EDF, GCHQ, GSK, the Office for National Statistics, Sciex, Shell UK, Trainline and the UK Space Agency. The students will demonstrate a wide range of innovation with these partners and will attract engagement from new partners, and these collaborations will often provide attractive future employment matches for students and partners alike.

Wider Societal Impact

I6. COMPASS will greatly benefit the UK by providing over 55 highly trained PhD graduates in an area known to be suffering from extreme shortages in the national people pipeline. COMPASS CDT graduates will be equipped for jobs in sectors of high economic value and national priority, including data science, analytics, pharmaceuticals, security, energy, communications, government, and indeed all research labs that deal with data. Through their training, they will enable these organisations to make well-informed and statistically principled decisions that will allow them to maximise their international competitiveness and contribution to societal well-being. COMPASS will also impact positively on the wider student community, both now and sustainably into the future.

Student:

Codie Wood

Period of Study:

Oct 22 - Sep 26

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2740713

Research Topic:

Unclassified

Organisations

University of Bristol (Lead Research Organisation)

People	ORCID iD
Kate Tilling (Primary Supervisor)
Rachael Hughes (Primary Supervisor)
Codie Wood (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S023569/1			01/04/2019	30/09/2027
2740713	Studentship	EP/S023569/1	01/10/2022	18/09/2026	Codie Wood