Dopamine-induced hippocampal plasticity: A synaptic model of foraging in mice

Lead Research Organisation: University of Cambridge
Department Name: Physiology Development and Neuroscience

Abstract

Memory is important for normal function in society and for our identity as human beings. Many factors influence how well we remember. For example, intuitively, receiving a reward helps you to learn. Receiving a reward after you have learnt can also help you remember. But how is it that later events can help strengthen memories for what happened earlier? This is the conundrum we will address in this proposal; we will study the mechanisms for how reward influences memory of preceding experience.

We can investigate this in mice by studying how they learn the location of a reward and then use their memory to navigate back to that same place at a later time. To this end, we will use computer models to predict how mice respond to rewards, and how they find rewarded locations. Then, we will test these predictions in real mice. We will give the mice a food reward at specific locations in a simple maze and then study how they navigate to these rewarded locations. If our computer prediction is correct, the mice will use a part of the brain known as the hippocampus to mark the rewarded locations on a mental map of the maze and then use this map to navigate back to this place. To test this idea, we will use a new technique known as optogenetics to activate or silence part of the mouse brain using a laser. We will then test whether the hippocampus is necessary for the animals to find the rewards, and whether a reward signal needs to enter the hippocampus to mark the reward locations on the mental map. The results should give us new insights into how memory works and how animals navigate using their memory of rewarded locations.

Technical Summary

The overall aim of this proposal is to investigate possible behavioural implications of the retroactive modulation of hippocampal plasticity that we recently discovered. The objective is to incorporate this novel learning rule in a computer model of the hippocampal network, in order to predict the behaviour of mice in a simple navigation task, and then to test the predictions during equivalent learning tasks in behaving animals. Specifically, we will design a hippocampal network model consisting of 'place cells' coding for the location of the agent (artificial animal), projecting onto actor neurons that determine the speed and direction of the agent. The new plasticity rule will be implemented at the synapses between place cells and actor neurons. Preliminary results suggest two fundamental advantages of the new learning rule over conventional reinforcement learning rules in such a network: 1) the agent learns from the absence of reward, which makes exploration more efficient, and 2) the agent quickly 'unlearns' reward locations once reward is exhausted or absent. We will test the predictions in behaving mice using optogenetics to stimulate or silence dopaminergic reward input into the hippocampus during the task, using DAT-cre mice to express the optogenetic molecule selectively in dopaminergic neurons of the ventral tegmental area (VTA). We will also silence the plastic hippocampal CA3-CA1 synapses at different stages of task performance, using Grik4-cre mice cross-bred with Ai35 mice to enable temporally restricted optogenetic silencing of CA3 input. Finally, we will directly monitor synaptic weights at CA3-to-CA1 synaptic connections using extracellular multi-site recording of optogenetically evoked field EPSPs during the task. The results of this study will indicate whether the computationally attractive properties of the novel learning rule, discovered in a brain slice preparation, operate in the intact mouse brain.
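The network architecture described above (place cells projecting onto actor neurons that set the agent's heading) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the proposal: the Gaussian place fields, the unit-square arena, the four headings, the fixed speed, the winner-take-all choice of actor, and all names and parameter values.

```python
import math

# Hypothetical parameters: none of these values come from the proposal.
N_SIDE = 5     # place-cell centres per side of a unit-square arena
SIGMA = 0.2    # assumed place-field width
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # one actor neuron per heading

# Place-field centres laid out on a regular grid (an assumption).
CENTRES = [(i / (N_SIDE - 1), j / (N_SIDE - 1))
           for i in range(N_SIDE) for j in range(N_SIDE)]


def place_cell_activity(pos):
    """Gaussian place-field response of every place cell at position pos."""
    x, y = pos
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in CENTRES]


def actor_drive(weights, activity):
    """Weighted sum of place-cell input onto each actor neuron."""
    return [sum(w * a for w, a in zip(row, activity)) for row in weights]


def step(pos, weights, speed=0.1):
    """Move the agent along the heading of the most strongly driven actor."""
    drive = actor_drive(weights, place_cell_activity(pos))
    best = max(range(len(DIRECTIONS)), key=lambda k: drive[k])
    dx, dy = DIRECTIONS[best]
    # Clip the new position to the arena walls.
    new_pos = (min(1.0, max(0.0, pos[0] + speed * dx)),
               min(1.0, max(0.0, pos[1] + speed * dy)))
    return new_pos, best
```

In this sketch `weights[k][i]` stands for the place-cell-to-actor synapse from place cell `i` onto actor `k`, which is where the summary says the new plasticity rule would be implemented. The rule itself is not modelled here.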

Planned Impact

We have identified three areas of possible impact:
1. High-tech industry. The results of our studies are likely to be of significant interest for machine learning. The algorithms used by the brain will be important for biologically inspired reinforcement learning algorithms, and we will collaborate with industry partners at Google DeepMind to explore these algorithms for commercial use. Moreover, algorithms capable of seeking novelty and navigating towards rewarding locations would be useful in robotics software to enable robots and drones to search more efficiently through an environment. We will contact investigators in robotics to explore possible interactions.
2. Education. Better understanding of the mechanisms of reward-based learning could have implications for theories of education, and we will consult researchers in the neuroscience of education to explore potential practical applications.
3. Brain disorder. Better understanding of reward processing and how it influences memory could have consequences for the treatment of normal age-related memory decline as well as memory disorders such as dementia. Moreover, understanding not only how the brain learns, but also how it unlearns behaviours that are no longer appropriate, may be of interest for several types of disorder, including OCD, drug addiction and PTSD. We will share our insights with clinicians to enable them to use this new basic understanding in treating patients.

Who will benefit from the research?
In addition to academic beneficiaries, high-tech industry, the education sector, and parts of the pharmaceutical industry working to develop effective drug therapies for neurological diseases will benefit from the proposed work. School children, teachers and the general public will also benefit from an increase in general knowledge about synaptic plasticity. There is therefore the potential for beneficial impact on both the health and the wealth of the UK.

How will they benefit from this research?
Enabling us to adapt to our environment is one of the most fundamental functions of the brain, a process that synaptic plasticity is believed to underpin. Since memory is so integral to all our lives, gaining knowledge about the mechanisms of synaptic plasticity is of interest not only to the academic community but also to the wider public.
High-tech industry: Our work could benefit developers of smart technologies, since the learning rule explored in this project might well inspire new machine learning algorithms, new learning rules for intelligent robotics and new implementations in neuromorphic engineering. We will make use of CC's contacts at Google DeepMind to encourage the use of our model of plasticity in novel machine learning techniques.
Educational impact: Basic understanding of learning and memory processes may lead to improvements in educational methods. We will contact educational neuroscientists to explore this possibility. Moreover, we will train a new generation of scientists, some of whom will go on to do research, while others will be equipped with a solid skillset for working in pharmaceutical, biotechnological, or engineering companies using machine learning or robotics, and also in banks and insurance companies that use artificial neural network techniques.
Pharmaceutical industry: Research into numerous neurological diseases, including learning disabilities, autism and Alzheimer's disease, has found that changes in synaptic plasticity may contribute to disease symptoms. The social impact and economic costs of these diseases are enormous. Therefore our work might in the longer-term benefit society from better understanding of the mechanisms that underlie such diseases, and could benefit the economy both in terms of costs saved in care for patients suffering from these conditions, and in benefits from drugs developed and sold by UK-based companies. We acknowledge that these indirect benefits may take decades before they are realised.

Publications

Description: We reported that cholinergic neuromodulation in the hippocampus biases spike timing-dependent plasticity (STDP) towards depression, and that this depression could be converted to potentiation by subsequent dopaminergic modulation. Building on these new plasticity rules, reported in Brzosko et al., eLife 2017, we made four main advances:
1. Using acetylcholine as an environmental exploration signal and dopamine as a reward signal, we developed a computational model of spatial navigation towards reward using the above plasticity rules and confirmed that acetylcholine, by allowing learning from negative outcomes, enhances exploration over the action space. Using sequential neuromodulation, the model predicted flexible reversal learning, surpassing the performance of other reward-modulated plasticity rules.
2. We tested the model's prediction on reversal learning in mice by optogenetically inactivating cholinergic neurons during a spatial learning task with changing rewards. We found that reversal learning, but not initial place learning, was impaired, verifying our computational prediction that acetylcholine-modulated plasticity promotes the unlearning of old reward locations. Furthermore, differences in neuromodulator concentrations in the model captured mouse-by-mouse performance variability in the optogenetic experiments.
3. Conversely, we tested the importance of the timing of the cholinergic signal by optogenetic stimulation of septal cholinergic neurons. Stimulation impaired memory formation when applied at the goal location but not when applied during navigation, underscoring the importance of appropriately timed cholinergic input in memory formation. Reward increases sharp wave-ripple incidence, and we showed that cholinergic stimulation inhibited sharp wave-ripple generation, indicating a likely mechanism for the impairment.
4. The model also predicted that changing reward locations should be encoded by different neuronal populations rather than by dedicated reward-coding cells. Behavioural experiments showed that, indeed, the fraction of active place cells increased in anticipation of reward, but the pool of active cells changed with the reward location.
Altogether, the computational model correctly predicted the outcome of the behavioural experiments.
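The sequential neuromodulation described above (depression under acetylcholine, converted to potentiation if dopamine follows) can be illustrated with a minimal sketch. This is not the published model: the single-synapse abstraction, the decaying eligibility trace, the update equations, and every name and constant below are assumptions chosen only to show the qualitative behaviour.

```python
from dataclasses import dataclass

# Hypothetical constants: not taken from the model described above.
ETA_ACH = 0.05      # depression per unit of coincident activity under acetylcholine
ETA_DA = 0.15       # potentiation per unit of trace when dopamine arrives
TRACE_DECAY = 0.9   # per-pairing decay of the eligibility trace


@dataclass
class Synapse:
    weight: float = 0.5
    trace: float = 0.0  # eligibility trace left by recent pairings


def pair(syn, coincidence=1.0):
    """Pre/post pairing during exploration (acetylcholine present).

    The synapse is depressed and tagged with an eligibility trace.
    """
    syn.weight = max(0.0, syn.weight - ETA_ACH * coincidence)
    syn.trace = TRACE_DECAY * syn.trace + coincidence


def reward(syn):
    """Dopamine arriving after pairing converts the tagged depression
    into potentiation, then clears the trace."""
    syn.weight = min(1.0, syn.weight + ETA_DA * syn.trace)
    syn.trace = 0.0


# Unrewarded path: pairings alone leave the synapse weaker than at the start.
unrewarded = Synapse()
for _ in range(3):
    pair(unrewarded)

# Rewarded path: the same pairings followed by dopamine leave it stronger.
rewarded = Synapse()
for _ in range(3):
    pair(rewarded)
reward(rewarded)
```

Under these assumed constants, the unrewarded synapse ends below its starting weight and the rewarded synapse ends above it. The net depression without reward corresponds to the learning from negative outcomes described in point 1.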
Exploitation Route: These experiments could be taken forward by combining experiments at both behavioural and cellular levels with computational modelling. This approach formed the basis for two BBSRC Research Grant applications: in the first, we proposed to investigate the circuit mechanisms of how animals choose between two reward locations; in the second, we proposed to identify the nature of the synaptic eligibility trace and how it is converted into synaptic potentiation. Unfortunately, both proposals were rejected.
Sectors: Digital/Communication/Information Technologies (including Software), Healthcare