MACHINE LEARNING COALGEBRAIC AUTOMATED PROOFS

Lead Research Organisation: University of Dundee
Department Name: School of Computing

Abstract

Some steps in formal reasoning may be statistical or inductive in nature.
Many attempts to formalise or exploit this inductive or statistical nature of formal reasoning are related to methods of Neuro-Symbolic Integration, Inductive Logic and Relational Statistical Learning.
The proposal is focused on one statistical/inductive aspect of automated theorem proving -- proof-pattern recognition.

Higher-order interactive theorem provers (e.g. HOL or Coq) have been successfully developed into
sophisticated environments for mechanised proofs.
Whether these provers are applied to big industrial tasks in software verification, or to formalisation
of mathematical theories, a programmer may have to tackle thousands of lemmas and theorems of variable sizes and complexities.
A proof in such languages is constructed by combining a finite number of tactics. Some proofs may yield the same pattern of tactics,
and can be fully automated, and others may require a user's intervention.
In this case, manually found proof for one problematic lemma may serve as a template for several other lemmas needing a manual proof.
At present this kind of proof-pattern recognition and recycling is done by hand, and the ML-CAP project will look into methods to automate this.

Another issue is that unsuccessful attempts of proofs --- in the trial-and-error phase of proof-search, are normally discarded once the proof is found.
Conveniently, analysis of both positive and negative examples is inherent in statistical machine learning. And ML-CAP is going to exploit this.

However, applying statistical machine-learning methods to analyse data coming from proof theory is a challenging task for several reasons.
Formulae written in formal language have a precise, rather than a statistical nature.
For example, list(nil) may be a well-formed term, while list(nol) - not; although they may have similar patterns
recognisable by machine learning methods.

Another problem that arises when merging formal logic and statistical machine-learning algorithms is related to their computational complexity.
Many essential logic algorithms are P-complete and inherently sequential (e.g., first-order unification), while neural networks and other similar devices
are based on linear algebra and perform parallel computations.

As a solution to the outlined problems, the coalgebraic approach to automated proofs
may provide the right technique of abstraction allowing to analyse proof-patterns using machine learning methods.
Firstly, coalgebraic computations lend themselves to concurrency,
and this may be the key to obtaining adequate representation
of the outlined problems.
Secondly, they are based on the idea of repeating patterns of potentially infinite computations, rather than outputs of finite computations.
These patterns may be detected by methods of statistical pattern recognition.

ML-CAP is based upon a novel method of using statistical machine learning in analysis of formal proofs.
In summary, it provides algorithms for extracting those features from automated proofs that allow to detect proof patterns
using statistical machine learning tools, such as neural networks.
As a result, neural networks can be trained to distinguish well-formed proofs from ill-formed; distinguish whether a proof belongs to a given family of proofs,
and even make accurate predictions concerning potential success of a proof-in-progress. All three tasks have serious applications in automated reasoning.
The project will aim to generalise this method and develop it into a sound general technique for automated proofs. It will result in new methods useful
for a range of researchers in different areas, such as AI, Formal Methods, Coalgebra and Cognitive Science.

Planned Impact

Impact Summary.

1. Knowledge (science and technology).
The proposed research is a pilot attempt to merge two very different groups of methods -
statistical machine learning and formal methods - using a third group of coalgebraic methods as a method of abstraction and suitable representation.
The ultimate goal for the resulting technique is to advance real-life applications of ITPs,
in the aspects where decidable algorithms and methods are not possible.
The project is distinctly interdisciplinary and has an original research idea and methodology underlying it.
My discussions of the proposed idea and method with
various groups of researchers at AI4FM'11 (AI and Formal Methods community), STP'11 (Theorem Proving community),
and ITP'11 [f] (Inductive Logic and Machine Learning communities) have confirmed my belief that
my approach is indeed new and ground-breaking, (see also letters of support).


I see the EPSRC First Grant scheme as an excellent opportunity to invest efforts into proving the concept,
developing more evidence and accumulating data supporting the proposed ML-CAP method, with a view to a bigger project if the pilot studies are successful.
Another important consequence of the proposed work will be
building confidence among various communities of researchers in both AI and Formal Methods in the potential of
statistical machine-learning methods as useful tools in handling
undecidable aspects of automated reasoning.

2. Economy and Society.
The project contributes to a long-term goal of
achieving the Grand Challenge in Computing - Dependable systems evolution;
as explained in detail in the Background section.

The proposed method will help to automate those
steps in formal proofs that cannot be automated using the state-of-the-art technology available today.
The new technology will directly affect programmers working on software and hardware verification.
In the longer term, the new method will make the process of software verification by means of ITPs lighter, faster, and hence cheaper,
and thus will have serious economic impact.
Also, ML-CAP will work towards creation of new technologies, and therefore may trigger creation of new companies.

As Formal methods improve dependability and security of software, development of these technologies will ultimately have an effect on the Society in terms of the security and quality of life.
As A. Ireland states in the attached letter of support,
``The pervasive nature of software means that software dependability plays a crucial role within the
world economy, as well as the security - from maintaining national security through to protecting
personal data.''


3. People.
In terms of new skill development, the project will affect the newly established group ``Theory of Computing and AI'' at Dundee,
and e.g. my current Honours, MSc and PhD students, as it will create a more vibrant research environment for them,
as well as reinforce our links with researchers from the partner universities in the UK and abroad.

Publications

10 25 50

publication icon
Ekaterina Komendantskaya (Author) (2013) Statistical Proof Pattern Recognition: Automated or Interactive?

publication icon
Ekaterina Komendantskaya (Author) (2014) Automated Reasoning Workshop 2013

publication icon
Helfenstein J (2022) An approach for comparing agricultural development to societal visions. in Agronomy for sustainable development

publication icon
Heras J (2014) ACL2(ml): Machine-Learning for ACL2 in Electronic Proceedings in Theoretical Computer Science

publication icon
Heras J (2014) Recycling Proof Patterns in Coq: Case Studies in Mathematics in Computer Science

publication icon
Heras J (2013) Computing persistent homology within Coq/SSReflect in ACM Transactions on Computational Logic

publication icon
Heras J. (2014) Proof-Pattern Search in Coq/SSreflect in Proceedings of Coq'14 workshop at FLoC'14

 
Description The findings can be split into two big categories, that in turn split into subcategories, as follows:



1. Machine-learning methods in automated and interactive theorem provers.

1.1. We developed a new method of interfacing interactive theorem provers with state-of-the-art machine-learning engines, in real time.

1.2 We developed a new method to dynamically extract statistical features from interactive proofs

1.3 We developed a new method to run and post-process the results of machine-learning algorithms data-mining the computerised proof libraries

1.4 We developed a method to automatically generate and prove new lemmas/theorems on the basis of automatically discovered statistics of old proofs.

1.5 We developed a software tool ML4PG that data-mines Coq proofs, now it has become a standard package of Proof General software (Edinburgh University).

1.5 We developed a new tool - ACL2(ml), implementing the above techniques in ACL2 language, and engaged with SICSA and industries to disseminate it further via Industrial Proof of Concept grant.



2. Coalgebraic Logic Programming (CoALP)

2.1. We further improved the parallel implementation of CoALP, and formally described its parallel features in two journal papers;

2.2. We further developed guarded corecursion in CoALP;

2.3 We issued a new version of CoALP, implemented in parallel language GoLang.

2.4. the above work on CoALP gave rise to another EPSRC grant that implement CoALP in Type Inference in a range of Programming Languages
Exploitation Route Software and Hardware Verification in Coq and SSReflect (for ML4PG)

Software and Hardware Verification in ACL2 (for ACL2(ml))

Compilers, Type inference in Programming languages (for CoALP)
All software and documentation are made publicly available here:

Project progress summary:

http://www.computing.dundee.ac.uk/staff/katya/MLCAP-man/

ML4PG software and documentation:

http://www.computing.dundee.ac.uk/staff/katya/ML4PG/

CoALP software and documentation:

http://www.computing.dundee.ac.uk/staff/katya/CoALP/

Spin-off of ML4PG work was ACL2(ml) work, funded by SICSA Industrial grant

http://staff.computing.dundee.ac.uk/jheras/acl2ml/

We already know of several academic users of the ML4PG system.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://staff.computing.dundee.ac.uk/katya/MLCAP-man/
 
Description The research results have been explored in SICSA Industrial grant Machine-Learning for Industrial Theorem Proving, in which the RA Jonathan Heras collaborated with two US companies: Centaur and Rockwell Collins on improving ACL2 automation. Results are available at http://www.macs.hw.ac.uk/~ek19/ACL2ml/index.html
First Year Of Impact 2014
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description AISEC: AI Secure and Explainable by Construction
Amount £2,200,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2020 
End 08/2023
 
Description British Logic Colloquium Funding for Automated Reasoning Workshop, Dundee, April 2013
Amount £360 (GBP)
Funding ID http://www.computing.dundee.ac.uk/staff/katya/arw13/ 
Organisation British Logic Colloquium 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2013 
End 04/2013
 
Description Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages
Amount £280,590 (GBP)
Funding ID EP/K031864/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2013 
End 01/2017
 
Description Continuous Verification of Neural Networks
Amount £90,848 (GBP)
Organisation Defence Science & Technology Laboratory (DSTL) 
Sector Public
Country United Kingdom
Start 04/2020 
End 03/2021
 
Description DTA PhD Studentship
Amount £70,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2014 
End 04/2018
 
Description MACHINE LEARNING COALGEBRAIC AUTOMATED PROOFS
Amount £100,268 (GBP)
Funding ID EP/J014222/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2012 
End 02/2014
 
Description Practical Types
Amount £1,500 (GBP)
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start 09/2013 
End 09/2014
 
Description SICSA Industrial Proof of Concept grant Machine-Learning for Industrial Theorem Proving
Amount £10,000 (GBP)
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start 09/2013 
End 04/2014
 
Description SICSA support for Automated Reasoning Workshop, Dundee, April 2013.
Amount £992 (GBP)
Funding ID http://www.computing.dundee.ac.uk/staff/katya/arw13/ 
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start 04/2013 
End 04/2013
 
Description SecCon-NN: Neural Networks with Security Contracts - towards lightweight, modular security for neural networks
Amount £30,000 (GBP)
Organisation National Cyber Security Centre 
Sector Public
Country United Kingdom
Start 11/2019 
End 03/2020
 
Title ML4PG: Machine Learning for Proof General 
Description Software Supporting Interactive proof development in Coq 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact Impact on researchers in Interactive Theorem Proving Community 
URL https://github.com/ML4PG/ML4PG
 
Description Machine Learning for theorem proving 
Organisation Heriot-Watt University
Department School of Engineering & Physical Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution Research collaboration, paper writing
Collaborator Contribution Research collaboration, paper writing
Impact Research paper in workshop UITP'13: Ekaterina Komendantskaya, Jónathan Heras, Gudmund Grov: Machine Learning in Proof General: Interfacing Interfaces. UITP 2013: 15-41
Start Year 2012
 
Description Machine-Learning for theorem Proving in Haskell 
Organisation Bangladesh University of Engineering and Technology
Department Department of Computer Science and Engineering
Country India 
Sector Academic/University 
PI Contribution Research discussions, paper and grant writing with Moa Johansson
Collaborator Contribution Research discussions, paper and grant writing
Impact Jónathan Heras, Ekaterina Komendantskaya, Moa Johansson, Ewen Maclean: Proof-Pattern Recognition and Lemma Discovery in ACL2. LPAR 2013: 389-406
Start Year 2013
 
Description Machine-learning in Proof Planning 
Organisation University of Edinburgh
Department School of Informatics Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Research discussions and paper writing with A.Bundy and E.Maclean
Collaborator Contribution Research discussions and paper writing
Impact Jónathan Heras, Ekaterina Komendantskaya, Moa Johansson, Ewen Maclean: Proof-Pattern Recognition and Lemma Discovery in ACL2. LPAR 2013: 389-406
Start Year 2013
 
Title ACL2(ml) 
Description ACL2(ml) is an extension for Emacs that works with the automated theorem prover ACL2; and finds families of related ACL2 definitions and proofs on the basis of shape of terms. 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact The paper has led to collaboration with Universities of Edinburgh and Chalmers (Sweden); as well as industries, via SICSA Industrial Proof of Concept grant. 
URL http://staff.computing.dundee.ac.uk/katya/acl2ml/
 
Title CoALP in Go: parallel and corecursive version 
Description We present CoALP -- an implementation of Coalgebraic Logic programming. It is a dialect of first-order Horn-Clause logic that features lazy guarded corecursion and parallelism. CoALP arose from Coalgebraic semantics we developed in 2010-2014. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact This software was taken for further development by a new EPSRC grant Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages for parallelism and corecursion, 2013-2017. 
URL http://staff.computing.dundee.ac.uk/katya/coalp/
 
Title CoALP: Coalgebraic Logic programming in Haskell 
Description We present CoALP -- an implementation of Coalgebraic Logic programming. It is a dialect of first-order Horn-Clause logic that features lazy guarded corecursion and parallelism. CoALP arose from Coalgebraic semantics we developed in 2010-2012. The implementation was developed by the research team employed by the grant between 2013 and 2016. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact This software was taken for further development by a new EPSRC grant Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages for parallelism and corecursion, 2013-2016. 
URL http://staff.computing.dundee.ac.uk/katya/coalp/
 
Title ML4PG -- version 2, 2014 
Description We present ML4PG, a machine-learning extension for Proof General. It allows to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof-tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in machine-learning environments Matlab and Weka. ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of clustering are used by ML4PG to provide proof-hints in the process of interactive proof development. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The software was disseminated to industry via SICSA Industrial Proof of Concept grant (2013-2014). 
URL http://staff.computing.dundee.ac.uk/katya/ML4PG/
 
Title ML4PG: machine learning for proof general 
Description We present ML4PG, a machine-learning extension for Proof General. It allows to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof-tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in machine-learning environmnents Matlab and Weka. ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of clustering are used by ML4PG to provide proof-hints in the process of interactive proof development. 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact The software was disseminated to industry via SICSA Industrial Proof of Concept grant (2013-2014). 
URL http://staff.computing.dundee.ac.uk/katya/ML4PG/
 
Description Dagstuhl talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I gave an invited talk at Dagstuhl Seminar ``AI meets formal methods". There were ~50 participants, including industrial participants and research students.
Year(s) Of Engagement Activity 2012
URL http://drops.dagstuhl.de/opus/volltexte/2012/3731/pdf/dagrep_v002_i007_p001_s12271.pdf
 
Description Invited talk at ARW'18 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited talk "Machine learning for mining, understanding and automating computer proofs" at Automated Reasoning
Workshop, Cambridge, UK, 12-13 April 2018. About 50 attendees, including PG students.
Year(s) Of Engagement Activity 2018
 
Description Invited talk at Leicester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Invited talk ``Statistical Machine Learning in Interactive Theorem Proving" at Leicester University. about 20 PG and UG students attended.
Year(s) Of Engagement Activity 2013
 
Description Invited talk at Newton Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk "Machine Learning for Interactive Theorem Proving: Revisit, Reuse and Recycle your Proofs" at the
Computer-Aided Mathematical Proof workshop, Isaac Newton Institute for Mathematical Sciences, Cambridge,
10-14 July 2017. The event had hundreds of participants.
Year(s) Of Engagement Activity 2017
 
Description Invited talk at Oxford 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Invited talk "Can statistical machine learning advance mechanised proof technology?" at Oxford university. About 10 research students attended.
Year(s) Of Engagement Activity 2013
 
Description Invited talk on ML4PG 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Invited talk "ML4PG: Machine Learning for Proof General: interfacing interfaces" given at Universities of Edinburgh and Strathcyde. About 10-20 research students attended the talk.
Year(s) Of Engagement Activity 2012