Data-Driven Algorithms for Data Acquisition

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Advances in machine learning have transformed our ability to utilize data. But far less progress has been made on intelligently acquiring such data in the first place. Consequently, though data-driven approaches are now ubiquitous across science and industry, hand-crafted and heuristic approaches are typically still the norm for data acquisition itself.

My goal is to address this shortfall by developing principled quantitative methods for data acquisition. In particular, I will construct adaptive algorithms that leverage information from previous data to guide future data acquisition. The basis for doing this will be the framework of Bayesian adaptive design (BAD), which formalizes the utility of data through the information it provides, then exploits this to optimize the controllable aspects of the acquisition process.
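To make "the information it provides" concrete, the following is a minimal sketch of estimating expected information gain (EIG) for one candidate design by nested Monte Carlo. It is not the proposal's method. The toy model (a standard normal prior on a scalar parameter `theta`, and an outcome `y = design * theta + noise` with unit Gaussian noise), the function name `eig_nmc`, and the sample sizes are all assumptions made for illustration. It scores a single design in isolation, so it leaves out the adaptive, sequential aspect described above.

```python
import numpy as np


def eig_nmc(design, n_outer=2000, n_inner=2000, seed=0):
    """Nested Monte Carlo estimate of expected information gain.

    Assumed toy model (illustrative only):
        theta ~ N(0, 1),   y | theta, design ~ N(design * theta, 1)
    """
    rng = np.random.default_rng(seed)

    # Outer samples: draw parameters from the prior and simulate outcomes.
    theta = rng.standard_normal(n_outer)
    y = design * theta + rng.standard_normal(n_outer)

    # log p(y | theta, design), dropping the constant that cancels below.
    log_lik = -0.5 * (y - design * theta) ** 2

    # Inner samples: fresh prior draws to estimate log p(y | design).
    theta_inner = rng.standard_normal(n_inner)
    diff = y[:, None] - design * theta_inner[None, :]
    log_lik_inner = -0.5 * diff ** 2  # shape (n_outer, n_inner)

    # Numerically stable log-mean-exp over the inner samples.
    m = log_lik_inner.max(axis=1, keepdims=True)
    log_marginal = m[:, 0] + np.log(np.mean(np.exp(log_lik_inner - m), axis=1))

    # EIG = E[ log p(y | theta, design) - log p(y | design) ].
    return float(np.mean(log_lik - log_marginal))


# Pick the most informative design from a small, hypothetical candidate set.
candidates = [0.5, 1.0, 2.0]
scores = [eig_nmc(d) for d in candidates]
best_design = candidates[int(np.argmax(scores))]
```

For this toy model the EIG has the closed form 0.5 * log(1 + design^2), which gives a check on the estimator. The nested structure needs `n_outer * n_inner` likelihood evaluations per candidate design, which illustrates the kind of computational cost that arises when such estimates must be repeated at every step of an adaptive experiment.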

Despite its principled foundations, BAD has not yet seen substantial uptake because of some key challenges in its deployment. Most notably, it suffers from crippling computational bottlenecks that undermine its use. By overcoming these with a new policy-based approach, I hope to turn BAD's potential into reality, providing a powerful basis for intelligent data acquisition in domains ranging from interactive surveys and virtual assistants to laboratory experiments and psychology trials.

One area of particular focus will be active learning, wherein one iteratively selects points to label from an unlabelled pool. Here BAD has already had some success, but I believe it is currently fundamentally misapplied. I hope to substantially improve the state of the art in this area through various innovations, such as targeting information gain in predictions rather than parameters, properly utilizing unlabelled data, and developing policy-based approaches. I further propose to revisit the foundations of the Bayesian neural network models often used in such settings, questioning their fundamental assumptions and developing radically new approaches.
