MACHINE LEARNING COALGEBRAIC AUTOMATED PROOFS

Lead Research Organisation: University of Dundee

Department Name: School of Computing

Abstract

Some steps in formal reasoning may be statistical or inductive in nature.
Many attempts to formalise or exploit this inductive or statistical nature of formal reasoning are related to methods of Neuro-Symbolic Integration, Inductive Logic and Relational Statistical Learning.
The proposal is focused on one statistical/inductive aspect of automated theorem proving -- proof-pattern recognition.

Higher-order interactive theorem provers (e.g. HOL or Coq) have been successfully developed into
sophisticated environments for mechanised proofs.
Whether these provers are applied to big industrial tasks in software verification, or to formalisation
of mathematical theories, a programmer may have to tackle thousands of lemmas and theorems of variable sizes and complexities.
A proof in such languages is constructed by combining a finite number of tactics. Some proofs may yield the same pattern of tactics,
and can be fully automated, and others may require a user's intervention.
In this case, manually found proof for one problematic lemma may serve as a template for several other lemmas needing a manual proof.
At present this kind of proof-pattern recognition and recycling is done by hand, and the ML-CAP project will look into methods to automate this.

Another issue is that unsuccessful attempts of proofs --- in the trial-and-error phase of proof-search, are normally discarded once the proof is found.
Conveniently, analysis of both positive and negative examples is inherent in statistical machine learning. And ML-CAP is going to exploit this.

However, applying statistical machine-learning methods to analyse data coming from proof theory is a challenging task for several reasons.
Formulae written in formal language have a precise, rather than a statistical nature.
For example, list(nil) may be a well-formed term, while list(nol) - not; although they may have similar patterns
recognisable by machine learning methods.

Another problem that arises when merging formal logic and statistical machine-learning algorithms is related to their computational complexity.
Many essential logic algorithms are P-complete and inherently sequential (e.g., first-order unification), while neural networks and other similar devices
are based on linear algebra and perform parallel computations.

As a solution to the outlined problems, the coalgebraic approach to automated proofs
may provide the right technique of abstraction allowing to analyse proof-patterns using machine learning methods.
Firstly, coalgebraic computations lend themselves to concurrency,
and this may be the key to obtaining adequate representation
of the outlined problems.
Secondly, they are based on the idea of repeating patterns of potentially infinite computations, rather than outputs of finite computations.
These patterns may be detected by methods of statistical pattern recognition.

ML-CAP is based upon a novel method of using statistical machine learning in analysis of formal proofs.
In summary, it provides algorithms for extracting those features from automated proofs that allow to detect proof patterns
using statistical machine learning tools, such as neural networks.
As a result, neural networks can be trained to distinguish well-formed proofs from ill-formed; distinguish whether a proof belongs to a given family of proofs,
and even make accurate predictions concerning potential success of a proof-in-progress. All three tasks have serious applications in automated reasoning.
The project will aim to generalise this method and develop it into a sound general technique for automated proofs. It will result in new methods useful
for a range of researchers in different areas, such as AI, Formal Methods, Coalgebra and Cognitive Science.

Planned Impact

Impact Summary.

1. Knowledge (science and technology).
The proposed research is a pilot attempt to merge two very different groups of methods -
statistical machine learning and formal methods - using a third group of coalgebraic methods as a method of abstraction and suitable representation.
The ultimate goal for the resulting technique is to advance real-life applications of ITPs,
in the aspects where decidable algorithms and methods are not possible.
The project is distinctly interdisciplinary and has an original research idea and methodology underlying it.
My discussions of the proposed idea and method with
various groups of researchers at AI4FM'11 (AI and Formal Methods community), STP'11 (Theorem Proving community),
and ITP'11 [f] (Inductive Logic and Machine Learning communities) have confirmed my belief that
my approach is indeed new and ground-breaking, (see also letters of support).

I see the EPSRC First Grant scheme as an excellent opportunity to invest efforts into proving the concept,
developing more evidence and accumulating data supporting the proposed ML-CAP method, with a view to a bigger project if the pilot studies are successful.
Another important consequence of the proposed work will be
building confidence among various communities of researchers in both AI and Formal Methods in the potential of
statistical machine-learning methods as useful tools in handling
undecidable aspects of automated reasoning.

2. Economy and Society.
The project contributes to a long-term goal of
achieving the Grand Challenge in Computing - Dependable systems evolution;
as explained in detail in the Background section.

The proposed method will help to automate those
steps in formal proofs that cannot be automated using the state-of-the-art technology available today.
The new technology will directly affect programmers working on software and hardware verification.
In the longer term, the new method will make the process of software verification by means of ITPs lighter, faster, and hence cheaper,
and thus will have serious economic impact.
Also, ML-CAP will work towards creation of new technologies, and therefore may trigger creation of new companies.

As Formal methods improve dependability and security of software, development of these technologies will ultimately have an effect on the Society in terms of the security and quality of life.
As A. Ireland states in the attached letter of support,
``The pervasive nature of software means that software dependability plays a crucial role within the
world economy, as well as the security - from maintaining national security through to protecting
personal data.''

3. People.
In terms of new skill development, the project will affect the newly established group ``Theory of Computing and AI'' at Dundee,
and e.g. my current Honours, MSc and PhD students, as it will create a more vibrant research environment for them,
as well as reinforce our links with researchers from the partner universities in the UK and abroad.

Funded Value:

£100,268

Funded Period:

Mar 12 - Feb 14

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/J014222/1

Principal Investigator:

Ekaterina Komendantskaya

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (50%)

Fundamentals of Computing (50%)

Organisations

People	ORCID iD
Ekaterina Komendantskaya (Principal Investigator)

Publications

Author Name

Title Publication Date Published

|< < 1 2 3 > >|

10 25 50

Duncan K (2020) Relative Robustness of Quantized Neural Networks Against Adversarial Attacks

Ekaterina Komendantskaya (Author) (2012) A Statistical Relational Learning Challenge - extracting proof strategies from exemplar proofs.

Ekaterina Komendantskaya (Author) (2013) Statistical Proof Pattern Recognition: Automated or Interactive?

Ekaterina Komendantskaya (Author) (2012) Coalgebraic Logic Programming: implicit versus explicit resource handling

Ekaterina Komendantskaya (Author) (2014) Automated Reasoning Workshop 2013

Helfenstein J (2022) An approach for comparing agricultural development to societal visions. in Agronomy for sustainable development

Heras J (2014) ACL2(ml): Machine-Learning for ACL2 in Electronic Proceedings in Theoretical Computer Science

Heras J (2014) Recycling Proof Patterns in Coq: Case Studies in Mathematics in Computer Science

Heras J (2013) Computing persistent homology within Coq/SSReflect in ACM Transactions on Computational Logic

Heras J. (2014) Proof-Pattern Search in Coq/SSreflect in Proceedings of Coq'14 workshop at FLoC'14

Key Findings
Impact Summary
Further Funding
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Description	The findings can be split into two big categories, that in turn split into subcategories, as follows: 1. Machine-learning methods in automated and interactive theorem provers. 1.1. We developed a new method of interfacing interactive theorem provers with state-of-the-art machine-learning engines, in real time. 1.2 We developed a new method to dynamically extract statistical features from interactive proofs 1.3 We developed a new method to run and post-process the results of machine-learning algorithms data-mining the computerised proof libraries 1.4 We developed a method to automatically generate and prove new lemmas/theorems on the basis of automatically discovered statistics of old proofs. 1.5 We developed a software tool ML4PG that data-mines Coq proofs, now it has become a standard package of Proof General software (Edinburgh University). 1.5 We developed a new tool - ACL2(ml), implementing the above techniques in ACL2 language, and engaged with SICSA and industries to disseminate it further via Industrial Proof of Concept grant. 2. Coalgebraic Logic Programming (CoALP) 2.1. We further improved the parallel implementation of CoALP, and formally described its parallel features in two journal papers; 2.2. We further developed guarded corecursion in CoALP; 2.3 We issued a new version of CoALP, implemented in parallel language GoLang. 2.4. the above work on CoALP gave rise to another EPSRC grant that implement CoALP in Type Inference in a range of Programming Languages
Exploitation Route	Software and Hardware Verification in Coq and SSReflect (for ML4PG) Software and Hardware Verification in ACL2 (for ACL2(ml)) Compilers, Type inference in Programming languages (for CoALP) All software and documentation are made publicly available here: Project progress summary: http://www.computing.dundee.ac.uk/staff/katya/MLCAP-man/ ML4PG software and documentation: http://www.computing.dundee.ac.uk/staff/katya/ML4PG/ CoALP software and documentation: http://www.computing.dundee.ac.uk/staff/katya/CoALP/ Spin-off of ML4PG work was ACL2(ml) work, funded by SICSA Industrial grant http://staff.computing.dundee.ac.uk/jheras/acl2ml/ We already know of several academic users of the ML4PG system.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	http://staff.computing.dundee.ac.uk/katya/MLCAP-man/


Description	The research results have been explored in SICSA Industrial grant Machine-Learning for Industrial Theorem Proving, in which the RA Jonathan Heras collaborated with two US companies: Centaur and Rockwell Collins on improving ACL2 automation. Results are available at http://www.macs.hw.ac.uk/~ek19/ACL2ml/index.html
First Year Of Impact	2014
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	AISEC: AI Secure and Explainable by Construction
Amount	£2,200,000 (GBP)
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	09/2020
End	08/2023


Description	British Logic Colloquium Funding for Automated Reasoning Workshop, Dundee, April 2013
Amount	£360 (GBP)
Funding ID	http://www.computing.dundee.ac.uk/staff/katya/arw13/
Organisation	British Logic Colloquium
Sector	Charity/Non Profit
Country	United Kingdom
Start	04/2013
End	04/2013


Description	Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages
Amount	£280,590 (GBP)
Funding ID	EP/K031864/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	09/2013
End	01/2017


Description	Continuous Verification of Neural Networks
Amount	£90,848 (GBP)
Organisation	Defence Science & Technology Laboratory (DSTL)
Sector	Public
Country	United Kingdom
Start	04/2020
End	03/2021


Description	DTA PhD Studentship
Amount	£70,000 (GBP)
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	10/2014
End	04/2018


Description	MACHINE LEARNING COALGEBRAIC AUTOMATED PROOFS
Amount	£100,268 (GBP)
Funding ID	EP/J014222/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	02/2012
End	02/2014


Description	Practical Types
Amount	£1,500 (GBP)
Organisation	SICSA Scottish Informatics and Computer Science Alliance
Sector	Academic/University
Country	United Kingdom
Start	09/2013
End	09/2014


Description	SICSA Industrial Proof of Concept grant Machine-Learning for Industrial Theorem Proving
Amount	£10,000 (GBP)
Organisation	SICSA Scottish Informatics and Computer Science Alliance
Sector	Academic/University
Country	United Kingdom
Start	09/2013
End	04/2014


Description	SICSA support for Automated Reasoning Workshop, Dundee, April 2013.
Amount	£992 (GBP)
Funding ID	http://www.computing.dundee.ac.uk/staff/katya/arw13/
Organisation	SICSA Scottish Informatics and Computer Science Alliance
Sector	Academic/University
Country	United Kingdom
Start	04/2013
End	04/2013


Description	SecCon-NN: Neural Networks with Security Contracts - towards lightweight, modular security for neural networks
Amount	£30,000 (GBP)
Organisation	National Cyber Security Centre
Sector	Public
Country	United Kingdom
Start	11/2019
End	03/2020


Title	ML4PG: Machine Learning for Proof General
Description	Software Supporting Interactive proof development in Coq
Type Of Material	Improvements to research infrastructure
Year Produced	2017
Provided To Others?	Yes
Impact	Impact on researchers in Interactive Theorem Proving Community
URL	https://github.com/ML4PG/ML4PG


Description	Machine Learning for theorem proving
Organisation	Heriot-Watt University
Department	School of Engineering & Physical Sciences
Country	United Kingdom
Sector	Academic/University
PI Contribution	Research collaboration, paper writing
Collaborator Contribution	Research collaboration, paper writing
Impact	Research paper in workshop UITP'13: Ekaterina Komendantskaya, Jónathan Heras, Gudmund Grov: Machine Learning in Proof General: Interfacing Interfaces. UITP 2013: 15-41
Start Year	2012


Description	Machine-Learning for theorem Proving in Haskell
Organisation	Bangladesh University of Engineering and Technology
Department	Department of Computer Science and Engineering
Country	India
Sector	Academic/University
PI Contribution	Research discussions, paper and grant writing with Moa Johansson
Collaborator Contribution	Research discussions, paper and grant writing
Impact	Jónathan Heras, Ekaterina Komendantskaya, Moa Johansson, Ewen Maclean: Proof-Pattern Recognition and Lemma Discovery in ACL2. LPAR 2013: 389-406
Start Year	2013


Description	Machine-learning in Proof Planning
Organisation	University of Edinburgh
Department	School of Informatics Edinburgh
Country	United Kingdom
Sector	Academic/University
PI Contribution	Research discussions and paper writing with A.Bundy and E.Maclean
Collaborator Contribution	Research discussions and paper writing
Impact	Jónathan Heras, Ekaterina Komendantskaya, Moa Johansson, Ewen Maclean: Proof-Pattern Recognition and Lemma Discovery in ACL2. LPAR 2013: 389-406
Start Year	2013


Title	ACL2(ml)
Description	ACL2(ml) is an extension for Emacs that works with the automated theorem prover ACL2; and finds families of related ACL2 definitions and proofs on the basis of shape of terms.
Type Of Technology	Software
Year Produced	2013
Open Source License?	Yes
Impact	The paper has led to collaboration with Universities of Edinburgh and Chalmers (Sweden); as well as industries, via SICSA Industrial Proof of Concept grant.
URL	http://staff.computing.dundee.ac.uk/katya/acl2ml/


Title	CoALP in Go: parallel and corecursive version
Description	We present CoALP -- an implementation of Coalgebraic Logic programming. It is a dialect of first-order Horn-Clause logic that features lazy guarded corecursion and parallelism. CoALP arose from Coalgebraic semantics we developed in 2010-2014.
Type Of Technology	Software
Year Produced	2014
Open Source License?	Yes
Impact	This software was taken for further development by a new EPSRC grant Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages for parallelism and corecursion, 2013-2017.
URL	http://staff.computing.dundee.ac.uk/katya/coalp/


Title	CoALP: Coalgebraic Logic programming in Haskell
Description	We present CoALP -- an implementation of Coalgebraic Logic programming. It is a dialect of first-order Horn-Clause logic that features lazy guarded corecursion and parallelism. CoALP arose from Coalgebraic semantics we developed in 2010-2012. The implementation was developed by the research team employed by the grant between 2013 and 2016.
Type Of Technology	Software
Year Produced	2015
Open Source License?	Yes
Impact	This software was taken for further development by a new EPSRC grant Coalgebraic Logic Programming for Type Inference: parallelism and corecursion for a new generation of programming languages for parallelism and corecursion, 2013-2016.
URL	http://staff.computing.dundee.ac.uk/katya/coalp/


Title	ML4PG -- version 2, 2014
Description	We present ML4PG, a machine-learning extension for Proof General. It allows to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof-tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in machine-learning environments Matlab and Weka. ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of clustering are used by ML4PG to provide proof-hints in the process of interactive proof development.
Type Of Technology	Software
Year Produced	2014
Open Source License?	Yes
Impact	The software was disseminated to industry via SICSA Industrial Proof of Concept grant (2013-2014).
URL	http://staff.computing.dundee.ac.uk/katya/ML4PG/


Title	ML4PG: machine learning for proof general
Description	We present ML4PG, a machine-learning extension for Proof General. It allows to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof-tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in machine-learning environmnents Matlab and Weka. ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of clustering are used by ML4PG to provide proof-hints in the process of interactive proof development.
Type Of Technology	Software
Year Produced	2012
Open Source License?	Yes
Impact	The software was disseminated to industry via SICSA Industrial Proof of Concept grant (2013-2014).
URL	http://staff.computing.dundee.ac.uk/katya/ML4PG/


Description	Dagstuhl talk
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	I gave an invited talk at Dagstuhl Seminar ``AI meets formal methods". There were ~50 participants, including industrial participants and research students.
Year(s) Of Engagement Activity	2012
URL	http://drops.dagstuhl.de/opus/volltexte/2012/3731/pdf/dagrep_v002_i007_p001_s12271.pdf


Description	Invited talk at ARW'18
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Invited talk "Machine learning for mining, understanding and automating computer proofs" at Automated Reasoning Workshop, Cambridge, UK, 12-13 April 2018. About 50 attendees, including PG students.
Year(s) Of Engagement Activity	2018


Description	Invited talk at Leicester
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Invited talk ``Statistical Machine Learning in Interactive Theorem Proving" at Leicester University. about 20 PG and UG students attended.
Year(s) Of Engagement Activity	2013


Description	Invited talk at Newton Institute
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Invited talk "Machine Learning for Interactive Theorem Proving: Revisit, Reuse and Recycle your Proofs" at the Computer-Aided Mathematical Proof workshop, Isaac Newton Institute for Mathematical Sciences, Cambridge, 10-14 July 2017. The event had hundreds of participants.
Year(s) Of Engagement Activity	2017


Description	Invited talk at Oxford
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Invited talk "Can statistical machine learning advance mechanised proof technology?" at Oxford university. About 10 research students attended.
Year(s) Of Engagement Activity	2013


Description	Invited talk on ML4PG
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	Invited talk "ML4PG: Machine Learning for Proof General: interfacing interfaces" given at Universities of Edinburgh and Strathcyde. About 10-20 research students attended the talk.
Year(s) Of Engagement Activity	2012

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications