# MACHINE LEARNING COALGEBRAIC AUTOMATED PROOFS

Lead Research Organisation:
University of Dundee

Department Name: School of Computing

### Abstract

Some steps in formal reasoning may be statistical or inductive in nature.

Many attempts to formalise or exploit this inductive or statistical nature of formal reasoning are related to methods of Neuro-Symbolic Integration, Inductive Logic and Relational Statistical Learning.

The proposal is focused on one statistical/inductive aspect of automated theorem proving -- proof-pattern recognition.

Higher-order interactive theorem provers (e.g. HOL or Coq) have been successfully developed into

sophisticated environments for mechanised proofs.

Whether these provers are applied to big industrial tasks in software verification, or to formalisation

of mathematical theories, a programmer may have to tackle thousands of lemmas and theorems of variable sizes and complexities.

A proof in such languages is constructed by combining a finite number of tactics. Some proofs may yield the same pattern of tactics,

and can be fully automated, and others may require a user's intervention.

In this case, manually found proof for one problematic lemma may serve as a template for several other lemmas needing a manual proof.

At present this kind of proof-pattern recognition and recycling is done by hand, and the ML-CAP project will look into methods to automate this.

Another issue is that unsuccessful attempts of proofs --- in the trial-and-error phase of proof-search, are normally discarded once the proof is found.

Conveniently, analysis of both positive and negative examples is inherent in statistical machine learning. And ML-CAP is going to exploit this.

However, applying statistical machine-learning methods to analyse data coming from proof theory is a challenging task for several reasons.

Formulae written in formal language have a precise, rather than a statistical nature.

For example, list(nil) may be a well-formed term, while list(nol) - not; although they may have similar patterns

recognisable by machine learning methods.

Another problem that arises when merging formal logic and statistical machine-learning algorithms is related to their computational complexity.

Many essential logic algorithms are P-complete and inherently sequential (e.g., first-order unification), while neural networks and other similar devices

are based on linear algebra and perform parallel computations.

As a solution to the outlined problems, the coalgebraic approach to automated proofs

may provide the right technique of abstraction allowing to analyse proof-patterns using machine learning methods.

Firstly, coalgebraic computations lend themselves to concurrency,

and this may be the key to obtaining adequate representation

of the outlined problems.

Secondly, they are based on the idea of repeating patterns of potentially infinite computations, rather than outputs of finite computations.

These patterns may be detected by methods of statistical pattern recognition.

ML-CAP is based upon a novel method of using statistical machine learning in analysis of formal proofs.

In summary, it provides algorithms for extracting those features from automated proofs that allow to detect proof patterns

using statistical machine learning tools, such as neural networks.

As a result, neural networks can be trained to distinguish well-formed proofs from ill-formed; distinguish whether a proof belongs to a given family of proofs,

and even make accurate predictions concerning potential success of a proof-in-progress. All three tasks have serious applications in automated reasoning.

The project will aim to generalise this method and develop it into a sound general technique for automated proofs. It will result in new methods useful

for a range of researchers in different areas, such as AI, Formal Methods, Coalgebra and Cognitive Science.

Many attempts to formalise or exploit this inductive or statistical nature of formal reasoning are related to methods of Neuro-Symbolic Integration, Inductive Logic and Relational Statistical Learning.

The proposal is focused on one statistical/inductive aspect of automated theorem proving -- proof-pattern recognition.

Higher-order interactive theorem provers (e.g. HOL or Coq) have been successfully developed into

sophisticated environments for mechanised proofs.

Whether these provers are applied to big industrial tasks in software verification, or to formalisation

of mathematical theories, a programmer may have to tackle thousands of lemmas and theorems of variable sizes and complexities.

A proof in such languages is constructed by combining a finite number of tactics. Some proofs may yield the same pattern of tactics,

and can be fully automated, and others may require a user's intervention.

In this case, manually found proof for one problematic lemma may serve as a template for several other lemmas needing a manual proof.

At present this kind of proof-pattern recognition and recycling is done by hand, and the ML-CAP project will look into methods to automate this.

Another issue is that unsuccessful attempts of proofs --- in the trial-and-error phase of proof-search, are normally discarded once the proof is found.

Conveniently, analysis of both positive and negative examples is inherent in statistical machine learning. And ML-CAP is going to exploit this.

However, applying statistical machine-learning methods to analyse data coming from proof theory is a challenging task for several reasons.

Formulae written in formal language have a precise, rather than a statistical nature.

For example, list(nil) may be a well-formed term, while list(nol) - not; although they may have similar patterns

recognisable by machine learning methods.

Another problem that arises when merging formal logic and statistical machine-learning algorithms is related to their computational complexity.

Many essential logic algorithms are P-complete and inherently sequential (e.g., first-order unification), while neural networks and other similar devices

are based on linear algebra and perform parallel computations.

As a solution to the outlined problems, the coalgebraic approach to automated proofs

may provide the right technique of abstraction allowing to analyse proof-patterns using machine learning methods.

Firstly, coalgebraic computations lend themselves to concurrency,

and this may be the key to obtaining adequate representation

of the outlined problems.

Secondly, they are based on the idea of repeating patterns of potentially infinite computations, rather than outputs of finite computations.

These patterns may be detected by methods of statistical pattern recognition.

ML-CAP is based upon a novel method of using statistical machine learning in analysis of formal proofs.

In summary, it provides algorithms for extracting those features from automated proofs that allow to detect proof patterns

using statistical machine learning tools, such as neural networks.

As a result, neural networks can be trained to distinguish well-formed proofs from ill-formed; distinguish whether a proof belongs to a given family of proofs,

and even make accurate predictions concerning potential success of a proof-in-progress. All three tasks have serious applications in automated reasoning.

The project will aim to generalise this method and develop it into a sound general technique for automated proofs. It will result in new methods useful

for a range of researchers in different areas, such as AI, Formal Methods, Coalgebra and Cognitive Science.

### Planned Impact

Impact Summary.

1. Knowledge (science and technology).

The proposed research is a pilot attempt to merge two very different groups of methods -

statistical machine learning and formal methods - using a third group of coalgebraic methods as a method of abstraction and suitable representation.

The ultimate goal for the resulting technique is to advance real-life applications of ITPs,

in the aspects where decidable algorithms and methods are not possible.

The project is distinctly interdisciplinary and has an original research idea and methodology underlying it.

My discussions of the proposed idea and method with

various groups of researchers at AI4FM'11 (AI and Formal Methods community), STP'11 (Theorem Proving community),

and ITP'11 [f] (Inductive Logic and Machine Learning communities) have confirmed my belief that

my approach is indeed new and ground-breaking, (see also letters of support).

I see the EPSRC First Grant scheme as an excellent opportunity to invest efforts into proving the concept,

developing more evidence and accumulating data supporting the proposed ML-CAP method, with a view to a bigger project if the pilot studies are successful.

Another important consequence of the proposed work will be

building confidence among various communities of researchers in both AI and Formal Methods in the potential of

statistical machine-learning methods as useful tools in handling

undecidable aspects of automated reasoning.

2. Economy and Society.

The project contributes to a long-term goal of

achieving the Grand Challenge in Computing - Dependable systems evolution;

as explained in detail in the Background section.

The proposed method will help to automate those

steps in formal proofs that cannot be automated using the state-of-the-art technology available today.

The new technology will directly affect programmers working on software and hardware verification.

In the longer term, the new method will make the process of software verification by means of ITPs lighter, faster, and hence cheaper,

and thus will have serious economic impact.

Also, ML-CAP will work towards creation of new technologies, and therefore may trigger creation of new companies.

As Formal methods improve dependability and security of software, development of these technologies will ultimately have an effect on the Society in terms of the security and quality of life.

As A. Ireland states in the attached letter of support,

``The pervasive nature of software means that software dependability plays a crucial role within the

world economy, as well as the security - from maintaining national security through to protecting

personal data.''

3. People.

In terms of new skill development, the project will affect the newly established group ``Theory of Computing and AI'' at Dundee,

and e.g. my current Honours, MSc and PhD students, as it will create a more vibrant research environment for them,

as well as reinforce our links with researchers from the partner universities in the UK and abroad.

1. Knowledge (science and technology).

The proposed research is a pilot attempt to merge two very different groups of methods -

statistical machine learning and formal methods - using a third group of coalgebraic methods as a method of abstraction and suitable representation.

The ultimate goal for the resulting technique is to advance real-life applications of ITPs,

in the aspects where decidable algorithms and methods are not possible.

The project is distinctly interdisciplinary and has an original research idea and methodology underlying it.

My discussions of the proposed idea and method with

various groups of researchers at AI4FM'11 (AI and Formal Methods community), STP'11 (Theorem Proving community),

and ITP'11 [f] (Inductive Logic and Machine Learning communities) have confirmed my belief that

my approach is indeed new and ground-breaking, (see also letters of support).

I see the EPSRC First Grant scheme as an excellent opportunity to invest efforts into proving the concept,

developing more evidence and accumulating data supporting the proposed ML-CAP method, with a view to a bigger project if the pilot studies are successful.

Another important consequence of the proposed work will be

building confidence among various communities of researchers in both AI and Formal Methods in the potential of

statistical machine-learning methods as useful tools in handling

undecidable aspects of automated reasoning.

2. Economy and Society.

The project contributes to a long-term goal of

achieving the Grand Challenge in Computing - Dependable systems evolution;

as explained in detail in the Background section.

The proposed method will help to automate those

steps in formal proofs that cannot be automated using the state-of-the-art technology available today.

The new technology will directly affect programmers working on software and hardware verification.

In the longer term, the new method will make the process of software verification by means of ITPs lighter, faster, and hence cheaper,

and thus will have serious economic impact.

Also, ML-CAP will work towards creation of new technologies, and therefore may trigger creation of new companies.

As Formal methods improve dependability and security of software, development of these technologies will ultimately have an effect on the Society in terms of the security and quality of life.

As A. Ireland states in the attached letter of support,

``The pervasive nature of software means that software dependability plays a crucial role within the

world economy, as well as the security - from maintaining national security through to protecting

personal data.''

3. People.

In terms of new skill development, the project will affect the newly established group ``Theory of Computing and AI'' at Dundee,

and e.g. my current Honours, MSc and PhD students, as it will create a more vibrant research environment for them,

as well as reinforce our links with researchers from the partner universities in the UK and abroad.

### Organisations

- University of Dundee, United Kingdom (Lead Research Organisation, Project Partner)
- University of Edinburgh, United Kingdom (Collaboration, Project Partner)
- Bangladesh University of Engineering and Technology (Collaboration)
- Heriot-Watt University, United Kingdom (Collaboration, Project Partner)
- Osnabrueck University, Germany (Project Partner)
- Newcastle University, United Kingdom (Project Partner)
- University of Bath, United Kingdom (Project Partner)

## People |
## ORCID iD |

Ekaterina Komendantskaya (Principal Investigator) |

### Publications

Ekaterina Komendantskaya (Author)
(2012)

*Coalgebraic Logic Programming: implicit versus explicit resource handling*
Ekaterina Komendantskaya (Author)
(2014)

*Automated Reasoning Workshop 2013*
Ekaterina Komendantskaya (Author)
(2012)

*A Statistical Relational Learning Challenge - extracting proof strategies from exemplar proofs.*
Ekaterina Komendantskaya (Author)
(2013)

*Statistical Proof Pattern Recognition: Automated or Interactive?*
Heras J
(2013)

*Intelligent Computer Mathematics*
Heras J
(2014)

*Recycling Proof Patterns in Coq: Case Studies*in Mathematics in Computer Science
Heras J
(2014)

*ACL2(ml): Machine-Learning for ACL2*in Electronic Proceedings in Theoretical Computer Science
Heras J
(2013)

*Intelligent Computer Mathematics*
Heras J
(2013)

*Computing persistent homology within Coq/SSReflect*in ACM Transactions on Computational LogicDescription | The findings can be split into two big categories, that in turn split into subcategories, as follows: 1. Machine-learning methods in automated and interactive theorem provers. 1.1. We developed a new method of interfacing interactive theorem provers with state-of-the-art machine-learning engines, in real time. 1.2 We developed a new method to dynamically extract statistical features from interactive proofs 1.3 We developed a new method to run and post-process the results of machine-learning algorithms data-mining the computerised proof libraries 1.4 We developed a method to automatically generate and prove new lemmas/theorems on the basis of automatically discovered statistics of old proofs. 1.5 We developed a software tool ML4PG that data-mines Coq proofs, now it has become a standard package of Proof General software (Edinburgh University). 1.5 We developed a new tool - ACL2(ml), implementing the above techniques in ACL2 language, and engaged with SICSA and industries to disseminate it further via Industrial Proof of Concept grant. 2. Coalgebraic Logic Programming (CoALP) 2.1. We further improved the parallel implementation of CoALP, and formally described its parallel features in two journal papers; 2.2. We further developed guarded corecursion in CoALP; 2.3 We issued a new version of CoALP, implemented in parallel language GoLang. 2.4. the above work on CoALP gave rise to another EPSRC grant that implement CoALP in Type Inference in a range of Programming Languages |

Exploitation Route | Software and Hardware Verification in Coq and SSReflect (for ML4PG) Software and Hardware Verification in ACL2 (for ACL2(ml)) Compilers, Type inference in Programming languages (for CoALP) All software and documentation are made publicly available here: Project progress summary: http://www.computing.dundee.ac.uk/staff/katya/MLCAP-man/ ML4PG software and documentation: http://www.computing.dundee.ac.uk/staff/katya/ML4PG/ CoALP software and documentation: http://www.computing.dundee.ac.uk/staff/katya/CoALP/ Spin-off of ML4PG work was ACL2(ml) work, funded by SICSA Industrial grant http://staff.computing.dundee.ac.uk/jheras/acl2ml/ We already know of several academic users of the ML4PG system. |

Sectors | Digital/Communication/Information Technologies (including Software) |

URL | http://staff.computing.dundee.ac.uk/katya/MLCAP-man/ |

Description | The research results have been explored in SICSA Industrial grant Machine-Learning for Industrial Theorem Proving, in which the RA Jonathan Heras collaborated with two US companies: Centaur and Rockwell Collins on improving ACL2 automation. Results are available at http://www.macs.hw.ac.uk/~ek19/ACL2ml/index.html |

First Year Of Impact | 2014 |

Sector | Digital/Communication/Information Technologies (including Software) |

Impact Types | Economic |

Description | British Logic Colloquium Funding for Automated Reasoning Workshop, Dundee, April 2013 |

Amount | £360 (GBP) |

Funding ID | http://www.computing.dundee.ac.uk/staff/katya/arw13/ |

Organisation | British Logic Colloquium |

Sector | Learned Society |

Country | United Kingdom |

Start | 04/2013 |

End | 04/2013 |

Description | Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages |

Amount | £280,590 (GBP) |

Funding ID | EP/K031864/1 |

Organisation | Engineering and Physical Sciences Research Council (EPSRC) |

Sector | Academic/University |

Country | United Kingdom |

Start | 09/2013 |

End | 01/2017 |

Description | DTA PhD Studentship |

Amount | £70,000 (GBP) |

Organisation | Engineering and Physical Sciences Research Council (EPSRC) |

Sector | Academic/University |

Country | United Kingdom |

Start | 10/2014 |

End | 04/2018 |

Description | MACHINE LEARNING COALGEBRAIC AUTOMATED PROOFS |

Amount | £100,268 (GBP) |

Funding ID | EP/J014222/1 |

Organisation | Engineering and Physical Sciences Research Council (EPSRC) |

Sector | Academic/University |

Country | United Kingdom |

Start | 02/2012 |

End | 02/2014 |

Description | Practical Types |

Amount | £1,500 (GBP) |

Organisation | SICSA Scottish Informatics and Computer Science Alliance |

Sector | Academic/University |

Country | United Kingdom |

Start | 09/2013 |

End | 09/2014 |

Description | SICSA Industrial Proof of Concept grant Machine-Learning for Industrial Theorem Proving |

Amount | £10,000 (GBP) |

Organisation | SICSA Scottish Informatics and Computer Science Alliance |

Sector | Academic/University |

Country | United Kingdom |

Start | 09/2013 |

End | 04/2014 |

Description | SICSA support for Automated Reasoning Workshop, Dundee, April 2013. |

Amount | £992 (GBP) |

Funding ID | http://www.computing.dundee.ac.uk/staff/katya/arw13/ |

Organisation | SICSA Scottish Informatics and Computer Science Alliance |

Sector | Academic/University |

Country | United Kingdom |

Start | 04/2013 |

End | 04/2013 |

Title | ML4PG: Machine Learning for Proof General |

Description | Software Supporting Interactive proof development in Coq |

Type Of Material | Improvements to research infrastructure |

Year Produced | 2017 |

Provided To Others? | Yes |

Impact | Impact on researchers in Interactive Theorem Proving Community |

URL | https://github.com/ML4PG/ML4PG |

Description | Machine Learning for theorem proving |

Organisation | Heriot-Watt University |

Department | School of Engineering & Physical Sciences |

Country | United Kingdom |

Sector | Academic/University |

PI Contribution | Research collaboration, paper writing |

Collaborator Contribution | Research collaboration, paper writing |

Impact | Research paper in workshop UITP'13: Ekaterina Komendantskaya, Jónathan Heras, Gudmund Grov: Machine Learning in Proof General: Interfacing Interfaces. UITP 2013: 15-41 |

Start Year | 2012 |

Description | Machine-Learning for theorem Proving in Haskell |

Organisation | Bangladesh University of Engineering and Technology |

Department | Department of Computer Science and Engineering |

Country | India |

Sector | Academic/University |

PI Contribution | Research discussions, paper and grant writing with Moa Johansson |

Collaborator Contribution | Research discussions, paper and grant writing |

Impact | Jónathan Heras, Ekaterina Komendantskaya, Moa Johansson, Ewen Maclean: Proof-Pattern Recognition and Lemma Discovery in ACL2. LPAR 2013: 389-406 |

Start Year | 2013 |

Description | Machine-learning in Proof Planning |

Organisation | University of Edinburgh |

Department | School of Informatics Edinburgh |

Country | United Kingdom |

Sector | Academic/University |

PI Contribution | Research discussions and paper writing with A.Bundy and E.Maclean |

Collaborator Contribution | Research discussions and paper writing |

Impact | Jónathan Heras, Ekaterina Komendantskaya, Moa Johansson, Ewen Maclean: Proof-Pattern Recognition and Lemma Discovery in ACL2. LPAR 2013: 389-406 |

Start Year | 2013 |

Title | ACL2(ml) |

Description | ACL2(ml) is an extension for Emacs that works with the automated theorem prover ACL2; and finds families of related ACL2 definitions and proofs on the basis of shape of terms. |

Type Of Technology | Software |

Year Produced | 2013 |

Open Source License? | Yes |

Impact | The paper has led to collaboration with Universities of Edinburgh and Chalmers (Sweden); as well as industries, via SICSA Industrial Proof of Concept grant. |

URL | http://staff.computing.dundee.ac.uk/katya/acl2ml/ |

Title | CoALP in Go: parallel and corecursive version |

Description | We present CoALP -- an implementation of Coalgebraic Logic programming. It is a dialect of first-order Horn-Clause logic that features lazy guarded corecursion and parallelism. CoALP arose from Coalgebraic semantics we developed in 2010-2014. |

Type Of Technology | Software |

Year Produced | 2014 |

Open Source License? | Yes |

Impact | This software was taken for further development by a new EPSRC grant Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages for parallelism and corecursion, 2013-2017. |

URL | http://staff.computing.dundee.ac.uk/katya/coalp/ |

Title | CoALP: Coalgebraic Logic programming in Haskell |

Description | We present CoALP -- an implementation of Coalgebraic Logic programming. It is a dialect of first-order Horn-Clause logic that features lazy guarded corecursion and parallelism. CoALP arose from Coalgebraic semantics we developed in 2010-2012. The implementation was developed by the research team employed by the grant between 2013 and 2016. |

Type Of Technology | Software |

Year Produced | 2015 |

Open Source License? | Yes |

Impact | This software was taken for further development by a new EPSRC grant Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages for parallelism and corecursion, 2013-2016. |

URL | http://staff.computing.dundee.ac.uk/katya/coalp/ |

Title | ML4PG -- version 2, 2014 |

Description | We present ML4PG, a machine-learning extension for Proof General. It allows to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof-tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in machine-learning environments Matlab and Weka. ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of clustering are used by ML4PG to provide proof-hints in the process of interactive proof development. |

Type Of Technology | Software |

Year Produced | 2014 |

Open Source License? | Yes |

Impact | The software was disseminated to industry via SICSA Industrial Proof of Concept grant (2013-2014). |

URL | http://staff.computing.dundee.ac.uk/katya/ML4PG/ |

Title | ML4PG: machine learning for proof general |

Description | We present ML4PG, a machine-learning extension for Proof General. It allows to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof-tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in machine-learning environmnents Matlab and Weka. ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of clustering are used by ML4PG to provide proof-hints in the process of interactive proof development. |

Type Of Technology | Software |

Year Produced | 2012 |

Open Source License? | Yes |

Impact | The software was disseminated to industry via SICSA Industrial Proof of Concept grant (2013-2014). |

URL | http://staff.computing.dundee.ac.uk/katya/ML4PG/ |

Description | Dagstuhl talk |

Form Of Engagement Activity | A talk or presentation |

Part Of Official Scheme? | No |

Geographic Reach | International |

Primary Audience | Postgraduate students |

Results and Impact | I gave an invited talk at Dagstuhl Seminar ``AI meets formal methods". There were ~50 participants, including industrial participants and research students. |

Year(s) Of Engagement Activity | 2012 |

URL | http://drops.dagstuhl.de/opus/volltexte/2012/3731/pdf/dagrep_v002_i007_p001_s12271.pdf |

Description | Invited talk at ARW'18 |

Form Of Engagement Activity | A talk or presentation |

Part Of Official Scheme? | No |

Geographic Reach | International |

Primary Audience | Postgraduate students |

Results and Impact | Invited talk "Machine learning for mining, understanding and automating computer proofs" at Automated Reasoning Workshop, Cambridge, UK, 12-13 April 2018. About 50 attendees, including PG students. |

Year(s) Of Engagement Activity | 2018 |

Description | Invited talk at Leicester |

Form Of Engagement Activity | A talk or presentation |

Part Of Official Scheme? | No |

Geographic Reach | National |

Primary Audience | Postgraduate students |

Results and Impact | Invited talk ``Statistical Machine Learning in Interactive Theorem Proving" at Leicester University. about 20 PG and UG students attended. |

Year(s) Of Engagement Activity | 2013 |

Description | Invited talk at Newton Institute |

Form Of Engagement Activity | A talk or presentation |

Part Of Official Scheme? | No |

Geographic Reach | International |

Primary Audience | Professional Practitioners |

Results and Impact | Invited talk "Machine Learning for Interactive Theorem Proving: Revisit, Reuse and Recycle your Proofs" at the Computer-Aided Mathematical Proof workshop, Isaac Newton Institute for Mathematical Sciences, Cambridge, 10-14 July 2017. The event had hundreds of participants. |

Year(s) Of Engagement Activity | 2017 |

Description | Invited talk at Oxford |

Form Of Engagement Activity | A talk or presentation |

Part Of Official Scheme? | No |

Geographic Reach | National |

Primary Audience | Postgraduate students |

Results and Impact | Invited talk "Can statistical machine learning advance mechanised proof technology?" at Oxford university. About 10 research students attended. |

Year(s) Of Engagement Activity | 2013 |

Description | Invited talk on ML4PG |

Form Of Engagement Activity | A talk or presentation |

Part Of Official Scheme? | No |

Geographic Reach | Regional |

Primary Audience | Postgraduate students |

Results and Impact | Invited talk "ML4PG: Machine Learning for Proof General: interfacing interfaces" given at Universities of Edinburgh and Strathcyde. About 10-20 research students attended the talk. |

Year(s) Of Engagement Activity | 2012 |