Advancing progress towards the adoption of reinforcement learning agents in the energy industry

Lead Research Organisation: University of Bristol
Department Name: Computer Science


The drive towards sustainability in the energy industry is being addressed through the continued uptake of renewable energy sources and through reductions in energy consumption achieved by improving efficiency. Introducing intermittent renewables, such as wind and solar power, adds further uncertainty to an already complex, multivariate, nonlinear system. For these reasons, reinforcement learning (RL) has been proposed for the operation, control and optimisation of energy systems including microgrids, energy storage and building energy management. Through continual learning, RL controllers can adapt to changing dynamics whilst maintaining optimal operation without requiring a model of the system. This offers improvements on traditional control strategies such as model predictive control (MPC) and proportional-integral-derivative (PID) controllers.
Reinforcement learning is a mathematical formalism for learning-based control of dynamical systems. By interacting with an environment through a trial-and-error process, autonomous agents learn a behavioural policy that maximises cumulative reward in sequential decision-making problems. The concept of delayed reward is key: an agent chooses actions not only to gain immediate reward but also to secure long-term gain as part of a lookahead planning strategy. However, deployment of autonomous RL agents in the real world is sparse, and research is largely limited to domains in which environments can be simulated. In a typical RL experiment, agents are trained and evaluated using expected reward, but in real-world systems a number of safety constraints need to be satisfied, and the optimisation criteria may also involve a trade-off between several sub-goals.
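To make the idea of cumulative, delayed reward concrete, the following minimal sketch computes a discounted return for a short reward sequence. The discount factor `gamma` and the reward values are illustrative assumptions and are not taken from the project.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards weighted by gamma**t (gamma is an assumed discount factor)."""
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: no reward until the final step.
delayed = [0.0, 0.0, 0.0, 10.0]
# Hypothetical episode: a small immediate reward and nothing afterwards.
immediate = [1.0, 0.0, 0.0, 0.0]

print(discounted_return(delayed), discounted_return(immediate))
```

Under these assumed numbers the delayed episode has the larger return, which is why an agent that maximises cumulative reward may forgo a small immediate gain.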
Traditional RL algorithms are designed to learn in an online setting through interaction with an environment (such as a simulator). However, online learning is an obstacle to the widespread adoption of RL in industry because of time-consuming data collection and safety concerns. One solution is to pre-train agents in a simulator before transferring them to the real world, but the success of this method relies on the development of high-fidelity simulators. Alternatively, RL agents can be pre-trained in an offline setting using previously collected data. By training agents on a static dataset of trajectories, offline RL (also known as batch RL) supports scalable, data-driven RL. Offline RL focuses on answering counterfactual queries: what would have happened if the agent had performed a different set of actions from the observed behaviour? Answering such queries allows an alternative, better policy to be learnt.
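As a rough illustration of learning from a static dataset, the sketch below runs tabular Q-learning over a fixed list of logged transitions, with no further environment interaction. The two-state battery example, its rewards and all names (`offline_q_learning`, `greedy_policy`, `dataset`) are hypothetical and chosen only for illustration. The sketch also assumes every state-action pair appears in the data; practical offline RL methods have to handle actions that the dataset does not cover, which this example ignores.

```python
from collections import defaultdict


def offline_q_learning(dataset, actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Fit a tabular Q-function from fixed (s, a, r, s_next, done) tuples."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            # Bootstrapped target computed purely from logged data.
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_policy(q, states, actions):
    """Pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical logged transitions for a toy battery (assumed values).
actions = ["charge", "discharge"]
dataset = [
    ("low", "charge", -1.0, "high", False),
    ("low", "discharge", 0.0, "low", False),
    ("high", "discharge", 3.0, "low", False),
    ("high", "charge", -1.0, "high", False),
]

q = offline_q_learning(dataset, actions)
policy = greedy_policy(q, ["low", "high"], actions)
print(policy)
```

The learnt policy can differ from whatever behaviour generated the log, which is the sense in which the fitted Q-values answer a counterfactual query about alternative actions.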
In collaboration with EDF R&D Digital Innovation, this PhD project aims to advance the adoption and safe deployment of RL agents in the real world, with a focus on contexts in the energy industry. Specific RL research areas are:
* Utilising human expert knowledge in the design of reward functions, which encapsulate the developer's objectives and enable the autonomous agent to learn. This can be extended to multi-objective reward functions and the evaluation of the inherent trade-offs.
* Training RL agents in an offline setting using previously collected data, and evaluating safety constraints.
* Interaction with, and interpretability of, RL agents to gain the confidence of end users. This could be achieved by providing policy explanations, including answers to counterfactual queries for policies learnt offline.
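The multi-objective trade-off mentioned in the first research area can be sketched with a weighted-sum reward. This is only one possible scalarisation; the objective names (`energy_cost`, `comfort_violation`), the weights and the candidate outcomes are assumptions made for illustration and do not come from the project.

```python
def scalarised_reward(energy_cost, comfort_violation, w_cost, w_comfort):
    """Weighted-sum reward; both objectives are penalties, so the result is <= 0."""
    return -(w_cost * energy_cost + w_comfort * comfort_violation)


# Hypothetical outcomes of two candidate actions in a building controller,
# given as (energy_cost, comfort_violation).
outcomes = {
    "heat_now": (2.0, 0.0),
    "wait": (0.5, 1.0),
}


def preferred(w_cost, w_comfort):
    """Return the candidate with the highest scalarised reward."""
    return max(
        outcomes,
        key=lambda name: scalarised_reward(*outcomes[name], w_cost, w_comfort),
    )


print(preferred(0.8, 0.2), preferred(0.2, 0.8))
```

Changing the weights changes which action is preferred, so the choice of weights is itself a place where expert knowledge enters the reward design.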



Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S022937/1      |              |              | 01/04/2019 | 30/09/2027 |
2276905           | Studentship  | EP/S022937/1 | 23/09/2019 | 22/09/2023 | Stefan Radic Webster