Respondent-driven sampling: empirical evaluation and methodological development

Lead Research Organisation: London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health


A new method of gathering data on vulnerable groups in society, called Respondent Driven Sampling, has been proposed.

Research is needed to assess how well Respondent Driven Sampling works in collecting high-quality data, and to explore whether the method could be improved.

This research will test whether Respondent Driven Sampling works among university students in London and fishermen in Uganda. It will also evaluate and extend the theory on which Respondent Driven Sampling is based, using state-of-the-art statistical inference and computational modelling.

If Respondent Driven Sampling can be shown to work well, it will allow researchers and public health workers to monitor trends in highly vulnerable groups such as sex workers and injecting drug users. Such data could be used, for example, to see whether HIV prevention programs in these groups are succeeding or failing.

Technical Summary

Respondent Driven Sampling (RDS) has recently been proposed as a method of collecting unbiased, representative data on hidden or hard-to-reach population subgroups that are often key to the maintenance of infectious diseases in human populations. RDS estimation theory is based on a non-standard, indirect procedure that first makes inferences about the sample population's social network and then makes inferences about the target population.

This non-standard approach has hindered the methodological development and validation of RDS. To date, only one study has attempted to evaluate RDS by comparing data collected using RDS with representative contemporaneous data from the same population. This study, an internet-based RDS survey of Cornell University students, failed to estimate the proportion of male students within the method's 95% plausible range, reinforcing earlier concerns that RDS may not generate representative data, even in easily accessible populations. Despite this, RDS has quickly become popular: over 123 RDS studies have been published, and RDS is also being employed to provide data for public health decision making by major funding bodies (Centers for Disease Control and Prevention and Family Health International).

RDS has the potential to address important epidemiological issues, but the theoretical basis of RDS is poorly understood and the key statistical questions remain unanswered. There is an urgent need to empirically validate RDS and develop appropriate statistical methods to analyse data collected using RDS.

The aim of this Fellowship is to evaluate whether RDS can generate representative population-based samples and to further develop the methodology required to analyse data collected in RDS studies.

1. To validate face-to-face RDS among fishermen in rural Uganda using representative population-based data as the gold standard.
Design: Compare a cross-sectional RDS sample of 600 fishermen in rural Uganda with representative population-based data.
2. To validate internet-based RDS among London graduate students using whole-population institutional data as the gold standard, and to explore the impact of temporal filtering.
Design: Two cross-sectional web-based RDS surveys, each collecting data on 500 students in the same London university.
3. To evaluate and extend the theory on which RDS is based using the results of (1) and (2) and state-of-the-art statistical inference and computational modelling by:
a. Assessing the performance of the existing RDS point and interval estimators and seeking to identify alternative estimators that offer improved performance.
Design: Methodological development and statistical modelling using STATA, WinBUGS, R and Matlab.
b. Using network simulation modelling to identify the characteristics of at-risk populations and the RDS survey strategies for which RDS is most able to generate representative data.
Design: Network simulation modelling using C++.
4. To produce practical recommendations for researchers conducting RDS studies, in the form of a 'Toolkit'.
Outputs: Toolkit and documentation on custom functions in standard statistical analysis packages.
5. To initiate further studies to validate RDS using representative data on hidden populations.
Proposed design: Comparison of data from cross-sectional face-to-face RDS studies with representative samples of the same hidden population.
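
To make the simulation approach in objective 3 concrete, the following is a minimal sketch only, not the Fellowship's method. It builds a synthetic network, simulates coupon-based peer recruitment, and compares the raw sample proportion with an inverse-degree weighted proportion, one widely used style of RDS estimator. The summary above does not say which estimators will be evaluated, so the choice of estimator, all function names, and all parameter values here are assumptions for illustration. Python is used for brevity, although the proposal names C++ for the network simulations.

```python
import random


def make_network(n, mean_degree, p_trait, homophily, rng):
    """Build a hypothetical undirected random network with a binary trait.

    `homophily` is the probability that a proposed tie between two people
    with different trait values is rejected (an illustrative assumption).
    """
    trait = [rng.random() < p_trait for _ in range(n)]
    neighbours = [set() for _ in range(n)]
    edges_needed = n * mean_degree // 2
    while edges_needed > 0:
        a, b = rng.randrange(n), rng.randrange(n)
        if a == b or b in neighbours[a]:
            continue  # no self-ties or duplicate ties
        if trait[a] != trait[b] and rng.random() < homophily:
            continue  # reject some ties between unlike people
        neighbours[a].add(b)
        neighbours[b].add(a)
        edges_needed -= 1
    return trait, neighbours


def rds_sample(neighbours, n_seeds, coupons, target_size, rng):
    """Simulate recruitment: each participant recruits up to `coupons`
    not-yet-sampled contacts, until `target_size` is reached."""
    eligible = [i for i, nb in enumerate(neighbours) if nb]
    seeds = rng.sample(eligible, n_seeds)
    sampled, seen, queue = list(seeds), set(seeds), list(seeds)
    while queue and len(sampled) < target_size:
        recruiter = queue.pop(0)
        candidates = [j for j in sorted(neighbours[recruiter]) if j not in seen]
        rng.shuffle(candidates)
        for j in candidates[:coupons]:
            if len(sampled) >= target_size:
                break
            seen.add(j)
            sampled.append(j)
            queue.append(j)
    return sampled


def naive_proportion(sample, trait):
    """Unweighted share of the sample that has the trait."""
    return sum(trait[i] for i in sample) / len(sample)


def inverse_degree_proportion(sample, trait, neighbours):
    """Share with the trait, weighting each person by 1 / network degree."""
    weights = [1.0 / len(neighbours[i]) for i in sample]
    with_trait = sum(w for i, w in zip(sample, weights) if trait[i])
    return with_trait / sum(weights)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    trait, neighbours = make_network(2000, 8, 0.3, 0.5, rng)
    sample = rds_sample(neighbours, n_seeds=5, coupons=3, target_size=500, rng=rng)
    print("true proportion:      ", sum(trait) / len(trait))
    print("naive estimate:       ", naive_proportion(sample, trait))
    print("inverse-degree weight:", inverse_degree_proportion(sample, trait, neighbours))
```

Repeating such a run many times, while varying the number of seeds, the number of coupons, the degree of homophily and the sample size, is one way to explore the conditions under which an estimator recovers the known population value. Because the simulated population is fully known, it plays the role that the gold-standard data play in objectives 1 and 2.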

