# Dirichlet forms and Markov Chain Monte Carlo

Lead Research Organisation:
University of Warwick

Department Name: Statistics

### Abstract

Consider the problem of measuring the amount of material in a heap of sand. In effect one has to calculate the total volume lying under a surface extending over a given region. If the surface is given by a formula then it is often feasible to calculate the volume using integral calculus. However, the calculations may be demanding. Moreover, there may be no formula; it may only be possible to determine the heights experimentally at specified locations. In that case numerical approximations can be made, but this may be tricky if the surface is very irregular. This issue becomes even more problematic for analogous problems in higher dimensions.

In practice it is often preferable to employ a statistical method: one measures the heights at randomly chosen points, drawn uniformly from the region, and then averages the resulting measurements. Multiplied by the area of the region, this gives a statistical estimate of the volume, the accuracy of which increases as one samples more randomly chosen points. This method works well if the region in question is relatively simple, and is colourfully known as the Monte Carlo method.
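The plain Monte Carlo estimator described above can be sketched in a few lines. This is a minimal illustration, not part of the project itself; the paraboloid "heap" height function is an assumption chosen so the exact answer is known:

```python
import random

def mc_volume(h, x_range, y_range, n=100_000):
    """Plain Monte Carlo estimate of the volume under a surface z = h(x, y)
    over a rectangular region: average the heights at uniformly random
    points, then multiply by the area of the region."""
    (x0, x1), (y0, y1) = x_range, y_range
    area = (x1 - x0) * (y1 - y0)
    total = 0.0
    for _ in range(n):
        x = random.uniform(x0, x1)
        y = random.uniform(y0, y1)
        total += h(x, y)
    return area * total / n

# Example: a paraboloid "heap" h(x, y) = max(0, 1 - x^2 - y^2) on [-1, 1]^2;
# the exact volume is pi/2, approximately 1.5708.
random.seed(0)
est = mc_volume(lambda x, y: max(0.0, 1 - x * x - y * y), (-1, 1), (-1, 1))
```

The accuracy improves like one over the square root of the number of sample points, regardless of dimension, which is what makes the method attractive for higher-dimensional analogues.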

Even the task of drawing random points from the region can be challenging, and may be overwhelmingly difficult for important higher-dimensional analogues. So instead one designs a random sequence of points which, in the long run, are individually uniformly distributed. A simple example is to propose each point in turn as a random displacement from the previous point (using the normal distribution), rejecting the proposal if it falls outside the region. The first point is chosen arbitrarily, so the first few locations are biased, but this bias goes away in the long run. The average of measurements, multiplied by the area of the region, now yields a statistical estimate of the volume whose accuracy improves as more points are used. There are many variations on this idea, often using more general random processes such as suitable Markov chains, but the above illustrates the important ideas.
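The random-displacement scheme just described can be sketched as follows (a hypothetical illustration: the unit disc, the starting point, and the proposal scale are all arbitrary choices for demonstration). Because the target is the uniform distribution on the region, a proposed move inside the region is always accepted and a move outside is always rejected, in which case the chain stays put:

```python
import random

def rwm_uniform(in_region, start, scale, n_steps):
    """Random-walk sampler targeting the uniform distribution on a region:
    propose a normal displacement from the current point; reject (stay put)
    if the proposal falls outside the region."""
    x, y = start
    samples = []
    for _ in range(n_steps):
        px = x + random.gauss(0, scale)
        py = y + random.gauss(0, scale)
        if in_region(px, py):   # uniform target: accept iff inside the region
            x, y = px, py
        samples.append((x, y))  # the current point is recorded either way
    return samples

# Uniform points on the unit disc, started (arbitrarily) at the centre.
random.seed(1)
pts = rwm_uniform(lambda x, y: x * x + y * y < 1, (0.0, 0.0), 0.5, 10_000)
```

In practice one would discard an initial segment of `pts` ("burn-in") before averaging, since the early locations are biased by the arbitrary starting point.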

This technique is known as Markov chain Monte Carlo (MCMC), and is now used in a wide variety of different contexts, for example: statistical physics (the original context in which MCMC arose), randomised computer algorithms, study of bio-molecules, numerical integration, and (the focus of this project) applied statistics. In each case a different set of concerns leads to different emphases and different approaches.

In applied statistics one deals with probabilities of events rather than volumes of material, and with calculating long-run means of various functions relating to a probability distribution conditioned on observations. However, in essence the problem is as above. Issues include "burn-in" (how long must a long-run average be, to avoid being biased by the choice of the first point) and "fast mixing" (how quickly does the sequence of sample points move around the sample space once equilibrium is attained). The project focuses on fast mixing: how to choose the right kind of jump proposal to make mixing fast, and specifically the best scale for the distribution of proposed displacements.

In 1997 Roberts et al. produced theory giving clear practical advice on tuning the scale: in high-dimensional situations only about a quarter of the proposals should be accepted. Much subsequent work has explored generalisations. However, the method of proof, using stochastic differential equation (SDE) theory, has limitations. In a nutshell, it uses limits of random walks rapidly making small jumps, leading to severe smoothness restrictions. Recently it has been proposed to use "Dirichlet forms", based on averages of squares of jumps, which are not so restrictive. Additionally, the SDE approach works with a single aspect of the target probability distribution, while Dirichlet forms work with everything at once. The project will develop the Dirichlet form approach.
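The tuning advice can be illustrated by a crude adaptive scheme (a sketch only, not the project's method: the batch sizes, the multiplicative adjustment rule, and the Gaussian test target are all assumptions for demonstration). The sampler runs short batches of random-walk Metropolis steps and nudges the proposal scale up when the empirical acceptance rate exceeds roughly a quarter, and down when it falls below:

```python
import math
import random

def tune_scale(log_target, dim, target_accept=0.234, rounds=30, batch=200):
    """Crude stochastic tuning of the random-walk Metropolis proposal scale:
    after each batch, multiply the scale by exp(rate - target_accept), so the
    scale grows when acceptance is too frequent and shrinks when too rare."""
    x = [0.0] * dim
    lp = log_target(x)
    scale = 1.0
    for _ in range(rounds):
        accepted = 0
        for _ in range(batch):
            prop = [xi + random.gauss(0, scale) for xi in x]
            lp_prop = log_target(prop)
            # Metropolis accept/reject for a general (log) target density
            if random.random() < math.exp(min(0.0, lp_prop - lp)):
                x, lp = prop, lp_prop
                accepted += 1
        rate = accepted / batch
        scale *= math.exp(rate - target_accept)
    return scale

# Standard Gaussian target in 20 dimensions; classical optimal-scaling theory
# suggests the tuned scale should settle near 2.38 / sqrt(20), about 0.53.
random.seed(2)
log_std_normal = lambda x: -0.5 * sum(xi * xi for xi in x)
s = tune_scale(log_std_normal, dim=20)
```

Such tuning rules rest on the diffusion-limit theory mentioned above; the point of the project is to justify and extend advice of this kind under much weaker smoothness assumptions.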

### Planned Impact

The primary impact of this project will be on applied statistics. Markov chain Monte Carlo (MCMC) is an important and pervasive technique in modern applied statistics, allowing flexible statistical analyses of data arising from complicated situations. Theory offers very useful advice on design of MCMC algorithms which helps users tune the algorithms to produce results at an optimal rate (hence the theory is known as "optimal scaling"). The theory currently largely concerns toy models with rather heavy regularity conditions, though empirical evidence is that the recommendations have wider validity. This project will extend the range of the theory accordingly, using methods from mathematical analysis (specifically, Dirichlet forms and Mosco convergence).

Results from this work will extend the range of effective theoretical advice on optimal scaling. This will be of real assistance to applied statisticians seeking to build and apply effective MCMC algorithms, and hence will facilitate effective use of these algorithms throughout science and engineering. This impact will be international in scale, helping to maintain and to extend the UK's very strong reputation for innovation in this area of science.

Further impact in mathematical science will result from establishing stronger links between mathematical analysis and applied statistics via applied probability.

The project will also deliver significant contributions in terms of training of skilled personnel, both in training the RA to be an international research expert in an important area of application of mathematical analysis and probability to applied statistics, and in associated PhD projects in probability supervised by the PI. As noted in the 2011 International Review of Mathematics, probability is influential throughout modern science and technology, and the supply of capable researchers is limited. Thus these training opportunities will result in high national impact.

### Publications

Vogrinc, J. (2019). *Counterexamples for optimal scaling of Metropolis-Hastings chains with rough target densities*. arXiv e-prints.

Vogrinc, J. (2021). *Counterexamples for optimal scaling of Metropolis-Hastings chains with rough target densities*. The Annals of Applied Probability.

Moriarty, J. (2021). *A Metropolis-class sampler for targets with non-convex support*. Statistics and Computing.

Mijatovic, A. (2019). *Asymptotic variance for random walk Metropolis chains in high dimensions: logarithmic growth via the Poisson equation*. Advances in Applied Probability.

Kendall, W. S. (2019). *Rayleigh Random Flights on the Poisson line SIRSN*. arXiv e-prints.

Kendall, W. (2020). *Rayleigh Random Flights on the Poisson line SIRSN*. Electronic Journal of Probability.

Description | My paper with Vogrinc, "Counterexamples for optimal scaling of Metropolis-Hastings chains with rough target densities", has now been accepted for publication in the Annals of Applied Probability, the top world journal for applied probability. It shows something unexpected. It was previously known that the random algorithms known as "Random Walk Metropolis" (RWM) and "Metropolis-adjusted Langevin" (MALA) behave rather well in high-dimensional situations: under some rather strict regularity conditions they exhibit predictable behaviour ("diffusion scaling") which makes it easy to tune them for optimal performance. We decided to study the behaviour of these algorithms when the regularity conditions are far from being satisfied (namely, when the target densities are quantifiably rough rather than smooth). It turned out that suitable examples still exhibit scaling (to be precise, "expected square jump scaling"), albeit with a quantifiable anomalous scaling index. Consequently optimal performance tuning is more widely available than had previously been suspected: this promises to be an important consideration when using these random algorithms in modern and demanding circumstances such as arise in data science and genetics. We are now investigating the extent to which this anomalous scaling is more broadly prevalent. |

Exploitation Route | COVID restrictions permitting, we plan to draw attention to these results to practitioners in the workshop planned for the end of the third year of the grant. |

Sectors | Pharmaceuticals and Medical Biotechnology, Other |

URL | https://arxiv.org/abs/1910.09485 |

Description | Participation in enrichment weekend for sixth-formers from a local comprehensive school |

Form Of Engagement Activity | Participation in an activity, workshop or similar |

Part Of Official Scheme? | No |

Geographic Reach | Local |

Primary Audience | Schools |

Results and Impact | I was invited to join a residential activity weekend in the Lake District for around 50 sixth-formers from Finham Park Comprehensive School, Coventry. I presented two talks introducing my research at an appropriate level, and also (at the students' request) gave an impromptu seminar on statistics for A-level mathematics. My participation was warmly appreciated by the school: see https://www.facebook.com/FinhamPark1/photos/a.1980574728830941/2247306805491064/?type=3&theater. The school is keen for me to participate in future such weekends; unfortunately a clash of commitments prevented my participation this year. |

Year(s) Of Engagement Activity | 2019 |

URL | https://www.facebook.com/FinhamPark1/photos/a.1980574728830941/2247306805491064/?type=3&theater |