Machine Learning in Automated Theorem Proving

Lead Research Organisation: University of Cambridge

Department Name: Computer Science and Technology

Abstract

Reasoning is an essential component of a general intelligence, but has proven difficult to fully automate. Interactive provers that assist a human expert are still more prevalent than a successful fully-automated proof search. However, in the last years fully-automated Machine Learning methods set state-of-the-art results in several reasoning challenges that used to require human supervision and handcrafted feature engineering, e.g. machine translation (Wu et al., 2016), visual question answering (Kahou et al., 2017), reasoning on algorithmic tasks (Zaremba et al., 2014), and excelling at abstract strategy games (Silver et al., 2016). Some of these tasks draw similarities to the main bottlenecks in the ATP. For instance, the choice of the winning strategy in a game as well as decisions required to prove a theorem can be similarly modelled as search problems. My goal will be to address these challenges by developing new approaches to fill the gaps in ATP using Machine Learning.

The challenges in ATP that traditionally require human supervision are posed, for instance, by selection of the most useful mathematical statements to prove a conjecture (premise selection), the choice of an optimal strategy for a proof search (heuristic selection), or semantically relevant translation of a mathematical content (e.g. expressed in LaTeX) to the formal language of a prover (auto-formalization). Modern Machine Learning methods have been recently applied to all of the problems mentioned, and yielded promising results, for instance (Alemi et al., 2017) and (Bridge et al., 2014). Based on the state-of-the-art research, I would like to explore the combination of employing Reinforcement Learning methods to guide a proof search, and powerful learning algorithms such as Deep Neural Networks to extract the most relevant features, and to restrict the search space for the decision process.

There is a potential to make a significant progress in the research on semantically accurate representations of the mathematical statements. I would like to contribute to the ATP landscape by developing new methods at the interface of Neural Networks and Natural Language Processing, including graph embeddings that are the most natural way of representing mathematical formulae. In particular, I would like to investigate the more expressive Higher-order logic, using an open-source dataset released for the evaluation of new Machine Learning-based theorem proving methods (Kaliszyk et al., 2017).

In conclusion, I strive to improve the automation of theorem proving by using the most promising advances in Machine Learning. Performing the proposed research thanks to the EPSRC studentship will allow me to contribute to the key aspect of artificial intelligence, as well as to the applied fields that employ ATP, notably to the safety-critical software and hardware design. Moreover, this research aims at showing steps towards a better understanding and development of generally applicable Machine Learning methods.

Student:

Agnieszka Slowik

Period of Study:

Jan 19 - Dec 22

Funder:

NERC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2119928

Research Topic:

Unclassified

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
Sean Holden (Primary Supervisor)
Agnieszka Slowik (Student)

Publications

Author Name

Title Publication Date Published

10 25 50

Danel T (2020) Neural Information Processing - 27th International Conference, ICONIP 2020, Bangkok, Thailand, November 18-22, 2020, Proceedings, Part V

Lamb A (2021) Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

Slowik A (2020) Bayesian Optimisation for Premise Selection in Automated Theorem Proving (Student Abstract) in Proceedings of the AAAI Conference on Artificial Intelligence

Slowik A Bayesian Optimisation for Heuristic Configuration in Automated Theorem Proving

Slowik A (2022) Out-of-distribution generalisation in machine learning

Key Findings


Description	In the last 2 years of my funding, I developed a new dataset, a new multi-agent game and several new variants of neural networks.
Exploitation Route	The outcomes can be used in research related to graph classification (for example, automatic prediction of molecular properties), emergence of human language, improving data efficiency of neural networks used in image classification. The most relevant field is machine learning (with two short papers at the intersection of machine learning and automated theorem proving).
Sectors	Chemicals Digital/Communication/Information Technologies (including Software) Education Financial Services and Management Consultancy Healthcare Pharmaceuticals and Medical Biotechnology Other
URL	https://scholar.google.com/citations?user=BPqK1l0AAAAJ&view_op=list_works&sortby=pubdate

Abstract

Organisations

People

ORCID iD

Publications