Safety, robustness, and economic properties of machine learning

Lead Research Organisation: University of Oxford
Department Name: Computer Science


This project falls within the EPSRC Information and communication technologies (ICT) research area. It also interfaces with the Digital Economy and Engineering areas.

Under the supervision of Prof Yarin Gal and Prof Allan Dafoe, I will empirically and theoretically study safety and robustness of machine learning methods as well as the economic properties of the technology.

As machine learning algorithms become more capable, they will be deployed in a growing number of safety-critical environments. For example, a stock-trading algorithm has to conform to rules that are not easy to encode in a training signal. A reinforcement learning algorithm could therefore find winning but illegal strategies that circumvent any ad-hoc objective meant to discourage such behavior.

At a high level, we are interested in fundamental research on robust (deep) learning methods that can be used to act on behalf of humans without costly supervision. Within well-controlled domains such as Atari, current ML techniques already scale far, which suggests that they could also scale far in more realistic domains. We are particularly interested in robustness to changing distributions, a poorly understood problem, and in alignment with human preferences, a critical step towards safe and useful AI systems. A non-robust system, deployed on a novel distribution, may badly misunderstand its situation and may therefore make harmful decisions confidently. An imperfectly aligned system may be exploited with adversarial inputs or may optimize away the correlation between its stated and intended objectives.

My project will involve the application of Bayesian neural networks to such safety problems in machine learning. Prof Gal has significantly advanced the development of these networks, which can help to detect and adapt to distribution shift (robustness) or to actively solicit data about human preferences (alignment).
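As a rough illustration of how a Bayesian neural network's uncertainty could flag distribution shift or pick informative queries, the sketch below computes the mutual information between predictions and model parameters from sampled class probabilities. It is a minimal sketch, not the project's method: the function names `entropy` and `epistemic_uncertainty` are hypothetical, and the probability arrays are hand-written stand-ins for stochastic forward passes through an approximate Bayesian network.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy (in nats) along the class axis."""
    p = np.clip(p, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=axis)


def epistemic_uncertainty(probs):
    """Mutual information between prediction and model parameters.

    probs has shape (T, N, C): T sampled networks (assumed to come from an
    approximate posterior), N inputs, C classes. Returns an array of shape (N,).
    """
    predictive_entropy = entropy(probs.mean(axis=0))  # entropy of the averaged prediction
    expected_entropy = entropy(probs).mean(axis=0)    # average entropy of each sample
    return predictive_entropy - expected_entropy


# Hypothetical samples: two stochastic passes, one input, two classes.
agree = np.array([[[0.9, 0.1]], [[0.9, 0.1]]])
disagree = np.array([[[0.99, 0.01]], [[0.01, 0.99]]])

mi_agree = epistemic_uncertainty(agree)[0]        # samples agree: close to zero
mi_disagree = epistemic_uncertainty(disagree)[0]  # samples disagree: large
print(mi_agree, mi_disagree)
```

Inputs on which the sampled networks disagree score high. Under the assumptions above, such inputs are candidates both for being off-distribution and for being worth querying a human about.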

Furthermore, we will research economic properties of machine learning as a technology. In economics, a technology is defined by its production function, which relates valuable inputs to outputs. In machine learning, those inputs include data, compute, and labor. The outputs depend on the specific task. They might be measures such as test error, but also the value of the resulting product that uses a learned model (e.g. an application or a stock-trading algorithm). Estimating such a production function is one of the oldest empirical problems in economics, but it has not been done explicitly for machine learning. The methodology involves microeconomic modeling, which machine learning researchers have so far not used.
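To make the idea of estimating a production function concrete, the sketch below fits a Cobb-Douglas form, output = A * data^a * compute^b * labor^c, by ordinary least squares on logarithms. The Cobb-Douglas form, the function name `fit_cobb_douglas`, and all numbers are assumptions chosen for illustration; the data are synthetic and the abstract does not specify a functional form or estimator.

```python
import numpy as np


def fit_cobb_douglas(inputs, output):
    """Estimate log(A) and the input elasticities by OLS on logs.

    inputs: (n, k) array of strictly positive input quantities.
    output: (n,) array of strictly positive outputs.
    """
    design = np.column_stack([np.ones(len(output)), np.log(inputs)])
    coef, *_ = np.linalg.lstsq(design, np.log(output), rcond=None)
    return coef[0], coef[1:]


# Synthetic example: columns stand for data, compute, and labor.
rng = np.random.default_rng(0)
n = 500
inputs = rng.uniform(1.0, 100.0, size=(n, 3))
true_elasticities = np.array([0.3, 0.5, 0.2])  # assumed values
noise = rng.normal(0.0, 0.05, size=n)          # multiplicative noise in logs
output = 2.0 * np.prod(inputs ** true_elasticities, axis=1) * np.exp(noise)

log_a, elasticities = fit_cobb_douglas(inputs, output)
print(np.exp(log_a), elasticities)
```

Each fitted elasticity reads as the approximate percentage change in output for a one percent change in that input. The output here should be a quantity where larger is better, such as product value, rather than a raw test error.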



Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509711/1      |              |              | 01/10/2016 | 30/09/2021 |
2219023           | Studentship  | EP/N509711/1 | 01/10/2019 | 30/09/2022 | Soren Mindermann
EP/R513295/1      |              |              | 01/10/2018 | 30/09/2023 |
2219023           | Studentship  | EP/R513295/1 | 01/10/2019 | 30/09/2022 | Soren Mindermann