Data-Driven Algorithms for Data Acquisition

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Advances in machine learning have transformed our ability to utilize data. But far less progress has been made on intelligently acquiring such data in the first place. Consequently, though data-driven approaches are now ubiquitous across science and industry, hand-crafted and heuristic approaches are typically still the norm for data acquisition itself.

My goal is to address this shortfall by developing principled quantitative methods for data acquisition. In particular, I will construct adaptive algorithms that leverage information from previous data to guide future data acquisition. The basis for doing this will be the framework of Bayesian adaptive design (BAD), which formalizes the utility of data through the information it provides, then exploits this to optimize the controllable aspects of the acquisition process.
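As a rough illustration of the framework described above, the following Python sketch scores candidate designs by their expected information gain, picks the best one, observes an outcome, and updates the posterior before choosing the next design. The logistic threshold model, the parameter grid, and every function name here are hypothetical choices made for illustration; none of them come from the project. The exact enumeration is only feasible because the example is tiny.

```python
import math
import random


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def likelihood(theta, design, slope=4.0):
    """Assumed toy model: P(y = 1 | theta, design) is a logistic curve centred at theta."""
    return 1.0 / (1.0 + math.exp(-slope * (design - theta)))


def expected_information_gain(belief, thetas, design):
    """EIG(d) = H[p(y | d)] - E_theta[H[p(y | theta, d)]] for a binary outcome."""
    probs = [likelihood(t, design) for t in thetas]
    marginal = sum(w * p for w, p in zip(belief, probs))
    conditional = sum(w * binary_entropy(p) for w, p in zip(belief, probs))
    return binary_entropy(marginal) - conditional


def update(belief, thetas, design, y):
    """Bayes' rule on the discrete parameter grid."""
    weights = []
    for w, t in zip(belief, thetas):
        p = likelihood(t, design)
        weights.append(w * (p if y == 1 else 1.0 - p))
    total = sum(weights)
    return [w / total for w in weights]


def run_adaptive_design(true_theta, thetas, designs, steps, rng):
    """Greedy loop: choose the max-EIG design, simulate an outcome, update the belief."""
    belief = [1.0 / len(thetas)] * len(thetas)  # uniform prior
    history = []
    for _ in range(steps):
        design = max(designs, key=lambda d: expected_information_gain(belief, thetas, d))
        y = 1 if rng.random() < likelihood(true_theta, design) else 0
        belief = update(belief, thetas, design, y)
        history.append((design, y))
    return belief, history


if __name__ == "__main__":
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    candidates = [-2.0 + 0.5 * i for i in range(9)]
    final_belief, chosen = run_adaptive_design(0.5, grid, candidates, 5, random.Random(0))
    print(chosen)
    print(final_belief)
```

Each step of this loop re-evaluates the information gain of every candidate design under the current belief. That is cheap on a five-point grid but becomes expensive for realistic models.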

Despite its principled foundations, BAD has not yet seen substantial uptake due to some key challenges in its deployment. Most notably, it has crippling computational bottlenecks that undermine its usage. By overcoming these with a new policy-based approach, I hope to turn BAD's potential into a reality, providing a powerful basis for intelligent data acquisition in domains ranging from interactive surveys and virtual assistants to laboratory experiments and psychology trials.

One area of particular focus will be active learning, wherein one iteratively selects points to label from an unlabelled pool. Here BAD has already provided some success, but I believe it is currently fundamentally misapplied. I hope to substantially improve the state of the art in the area through various innovations, such as targeting information gain in predictions rather than parameters, properly utilizing unlabelled data, and developing policy-based approaches. I further propose to revisit the foundations of the Bayesian neural network models often used in such settings, questioning their fundamental assumptions and developing radically new approaches.
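To make the distinction between information gain in parameters and in predictions concrete, here is a minimal Python sketch. It assumes class probabilities are available from a handful of posterior parameter samples, stored as nested lists. The parameter-focused score is the standard BALD mutual information between the label and the parameters. The prediction-focused score is the mutual information between the candidate's label and the label of a target input, averaged over target inputs, following the usual definition of expected predictive information gain (EPIG). The function names and data layout are hypothetical and are not the API of any released package.

```python
import math


def entropy(probs):
    """Entropy in nats of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_over_samples(per_sample):
    """Average class probabilities over parameter samples."""
    k = len(per_sample)
    n_classes = len(per_sample[0])
    return [sum(s[c] for s in per_sample) / k for c in range(n_classes)]


def bald_score(pool_preds):
    """Parameter-focused score. pool_preds[k][c] = p(y = c | x, theta_k)."""
    marginal = mean_over_samples(pool_preds)
    expected_conditional = sum(entropy(p) for p in pool_preds) / len(pool_preds)
    return entropy(marginal) - expected_conditional


def epig_score(pool_preds, target_preds):
    """Prediction-focused score. target_preds[j][k][c] = p(y* = c | x*_j, theta_k)."""
    k = len(pool_preds)
    marg_y = mean_over_samples(pool_preds)
    total = 0.0
    for target in target_preds:
        marg_t = mean_over_samples(target)
        for c in range(len(marg_y)):
            for c2 in range(len(marg_t)):
                # Joint predictive: average over samples of the product of per-sample predictions.
                joint = sum(pool_preds[i][c] * target[i][c2] for i in range(k)) / k
                if joint > 0.0:
                    total += joint * math.log(joint / (marg_y[c] * marg_t[c2]))
    return total / len(target_preds)


if __name__ == "__main__":
    # Two parameter samples that disagree about the candidate pool point.
    pool = [[0.9, 0.1], [0.1, 0.9]]
    # A target input on which both samples agree, and one on which they disagree.
    agreeing_target = [[0.8, 0.2], [0.8, 0.2]]
    disagreeing_target = [[0.9, 0.1], [0.1, 0.9]]
    print(bald_score(pool))
    print(epig_score(pool, [agreeing_target]))
    print(epig_score(pool, [disagreeing_target]))
```

In this toy setup the pool point looks informative about the parameters, but labelling it says essentially nothing about a target input on which the parameter samples already agree. A prediction-focused score is designed to capture that case.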

Publications

Dhillon G.S. (2024) On the Expected Size of Conformal Prediction Sets in Proceedings of Machine Learning Research

Kossen J. (2024) In-Context Learning Learns Label Relationships but Is Not Conventional Learning in 12th International Conference on Learning Representations, ICLR 2024

Miao N. (2024) SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning in 12th International Conference on Learning Representations, ICLR 2024

Reichelt T. (2024) Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support in Proceedings of Machine Learning Research

Smith F.B. (2024) Making Better Use of Unlabelled Data in Bayesian Active Learning in Proceedings of Machine Learning Research

 
Description: Microsoft D.Phil Co-supervision
Organisation: Microsoft Research
Country: Global
Sector: Private
PI Contribution: Co-supervision of the D.Phil student Freddie Bickford Smith.
Collaborator Contribution: Co-supervision of the D.Phil student Freddie Bickford Smith by Adam Foster.
Impact: Publication "Making better use of unlabelled data in Bayesian active learning" at AISTATS 2024.
Start Year: 2024
 
Description: Sanger Institute Co-Supervision
Organisation: The Wellcome Trust Sanger Institute
Country: United Kingdom
Sector: Charity/Non Profit
PI Contribution: Co-supervision of the D.Phil student Benjamin Chang.
Collaborator Contribution: Co-supervision of the D.Phil student Benjamin Chang by Mo Lotfollahi.
Impact: Co-supervision of D.Phil student, no publications yet.
Start Year: 2024
 
Title: Bayesian active learning with EPIG acquisition
Description: Code package for performing target-orientated Bayesian active learning using the EPIG acquisition strategy.
Type Of Technology: New/Improved Technique/Technology
Year Produced: 2024
Open Source License: Yes
Impact: Code has been used by other research teams in the production of research papers external to our group.
URL: https://github.com/fbickfordsmith/epig
 
Description: Meeting Minds Public Lecture
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme: No
Geographic Reach: Local
Primary Audience: Public/other audiences
Results and Impact: Presented a public lecture on "Intelligent Data Acquisition" as part of the University's "Meeting Minds" lecture series. The lecture prompted discussions with various audience members, and some attendees followed up to say the talk had helped them in their line of work.
Year(s) Of Engagement Activity: 2024