Embedding Machine Learning within Quantifier Elimination Procedures

Lead Research Organisation: Coventry University

Department Name: Ctr for Fluid and Complex Systems

Abstract

This project concerns computational mathematics and logic. The aim is to improve the ability of computers to perform ``Quantifier Elimination'' (QE).

We say a logical statement is ``quantified'' if it is preceded by a qualification such as "for all" or "there exists". Here is an example of a quantified statement:
"there exists x such that ax^2 + bx + c = 0 has two solutions for x".
While the statement is mathematically precise the implications are unclear - what restrictions does this statement of existence force upon us? QE corresponds to replacing a quantified statement by an unquantified one which is equivalent. In this case we may replace the statement by:
"b^2 - 4ac > 0", which is the condition for x to have two solutions.
You may have recognised this equivalence from GCSE mathematics, when studying the quadratic equation. The important point here is that the latter statement can actually be derived automatically by a computer from the former, using a QE procedure.

QE is not subject to the numerical rounding errors of most computations. Solutions are not in the form of a numerical answer but an algebraic description which offers insight into the structure of the problem at hand. In the example above, QE shows us not what the solutions to a particular quadratic equation are, but how in general the number of solutions depends on the coefficients a, b, and c.

QE has numerous applications throughout engineering and the sciences. An example from biology is the determination of medically important values of parameters in a biological network; while another from economics is identifying which hypotheses in economic theories are compatible, and for what values of the variables. In both cases, QE can theoretically help, but in practice the size of the statements means state-of-the-art procedures run out of computer time/memory.

The extensive development of QE procedures means they have many options and choices about how they are run. These decisions can greatly affect how long QE takes, rendering an intractable problem easy and vice versa. Making the right choice is a critical, but understudied problem and is the focus of this project. At the moment QE procedures make such choices either under direct supervision of a human or based on crude human-made heuristics (rules of thumb based on intuition / experience but with limited scientific basis). The purpose of this project is to replace these by machine learning techniques. Machine Learning (ML) is an overarching term for tools that allow computers to make decisions that are not explicitly programmed, usually involving the statistical analysis of large quantities of data. ML is quite at odds with the field of Symbolic Computation which studies QE, as the latter prizes exact correctness and so shuns the use of probabilistic tools making its application here very novel. We are able to combine these different worlds because the choices which we will use ML to make will all produce a correct and exact answer (but with different computational costs).

The project follows pilot studies undertaken by the PI which experimented with one ML technique and found it improved upon existing heuristics for two particular decisions in a QE algorithm. We will build on this by working with the spectrum of leading ML tools to identify the optimal techniques for application in Symbolic Computation. We will demonstrate their use for both low level algorithm decisions and choices between different theories and implementations. Although focused on QE, we will also demonstrate ML as being a new route to optimisation in Computer Algebra more broadly and work encompasses Project Partners and events to maximise this. Finally, the project will deliver an improved QE procedure that makes use of ML automatically, without user input. This will be produced in the commercial Computer Algebra software Maple in collaboration with industrial Project Partner Maplesoft.

Planned Impact

Quantifier Elimination (QE) allows for the removal of quantifiers within logical statements to produce clear unambiguous algebraic constraints. In this sense QE is the act of simplifying or solving a problem exactly, like pen and paper symbolic mathematics. The impact of improved QE procedures is felt not just in mathematics and computer science, but throughout engineering and the natural sciences - wherever there is a desire for such solutions. Some examples are given below.

Initial impact from the project will be secured via collaboration with industrial Project Partner Maplesoft, whose European HQ is in Cambridge. Maplesoft produce the Maple Computer Algebra System used widely in science, education and industry. The PI will implement work in the Maple language and work with the Maple development team on the transfer of these project outcomes. This will guarantee a wide impact of the new procedures, not just by including them in a widely used piece of software, but by having them linked up as sub-routines to the main user commands. Maplesoft also produce the modelling and simulation tool MapleSim which uses Maple as a backend, thus offering a further route to impact. All outcomes (publications and data) will be published open access to facilitate additional transfers.

More broadly, the project aims to impact the thinking around software development in Computer Algebra Systems, by proving the effectiveness of machine learning as a route to optimisation. This impact will be insured not only via engaging with Maplesoft but also the wider field. We will hold two project events to take place at existing Symbolic Computation conferences, to maximise engagement and participation. We also plan further engagement with academic Project Partners who lead development teams on Computer Algebra Software packages: Redlog at CNRS, Inria, CoCoA at U. Genoa, and SMT-RAT at U. Aachen.

There will also be some local impacts of the project. It will directly support the UK career of both an RA and the PI, the latter gaining the particularly valuable first RCUK grant and also experience of line management. Further it will indirectly support a PhD student as Coventry U. have committed to fund a studentship associated with the project. Finally, it will feed into curriculum development: providing suitable examples, guest lectures and topics for final year projects on the Computer Science BSc at Coventry U.

We finish by discussing three potential real world beneficiaries of QE the PI is involved with. In each of these improved QE procedures would allow for much greater impact.

- Biology/Epidemiology community: All sorts of biological processes may be modelled using systems of differential equations, such as epidemic spread of disease or protein reactions indicative of cancer. The identification of conditions on the parameters for stable solutions of these equations may be achieved using QE. The PI has recently collaborated with biologists on identifying bi-stability regions of the MAPK network.

- Formal Verification and Safety community: The use of computers to check a logical statement beyond all doubt (by successively combining existing results with basic axioms) is often used to ensure the correctness of safety-critical systems. For example, KeYmaera combines a theorem prover with real QE procedures for verification in the railway, automotive, and aviation industries, and even surgical robotics. The PI is a member of a separate EU project engaging with such applications.

- Validation of Economic Hypotheses: Economists make models under a variety of assumptions, many of which can be expressed in the language of QE. The compatibility of given assumptions is often not clear, but QE could validate the compatibility of assumptions, or identify regions of compatibility. The PI is in correspondence with a leading economist at U. Chicago who is the first to make use of QE in this context.

Funded Value:

£100,987

Funded Period:

Jul 18 - Dec 19

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/R019622/1

Principal Investigator:

Matthew England

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (25%)

Fundamentals of Computing (50%)

Software Engineering (25%)

Organisations

People	ORCID iD
Matthew England (Principal Investigator)

Publications

Author Name

Title Publication Date Published

10 25 50

England M (2019) Intelligent Computer Mathematics - 12th International Conference, CICM 2019, Prague, Czech Republic, July 8-12, 2019, Proceedings

England M (2018) Mathematical Software - ICMS 2018 - 6th International Conference, South Bend, IN, USA, July 24-27, 2018, Proceedings

ENGLAND, M. (2022) SC-Square: Future Progress with Machine Learning

Florescu D (2020) Mathematical Software - ICMS 2020 - 7th International Conference, Braunschweig, Germany, July 13-16, 2020, Proceedings

Florescu D. (2019) Algorithmically generating new algebraic features of polynomial systems for machine learning

Huang Z (2019) Using Machine Learning to Improve Cylindrical Algebraic Decomposition in Mathematics in Computer Science

Svenson O (2022) Perceived Corona virus exposure as a function of interpersonal distance and time of a conversation. in Discover social science and health

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Description	The grant started in Summer 2018. We had an excellent kick off session (as described elsewhere in ResearchFish) and successfully recruited a post doctoral researcher who started work in December 2018. The PostDoc has generated a data set and run several rounds of experiments resulting in four papers (and one more under preparation). The first identified ML modles that outperform the one identified in the preliminary case study; the second derived a new feature identification framework which improved the performance of all ML models; the third demonstrated how ML hyperparameter selection could be adapted to better suite the application; and the fourth describes our open source code release.
Exploitation Route	Already there are a variety of other research groups looking at applying ML to computer algebra. An overview of this can be found in the records of the two ICMS Special Sessions that the project oraginsed in (2018 and 2020). http://icms-conference.org/2018/sessions/ http://www.iaa.tu-bs.de/AppliedAlgebra/ICMS2020/ICMS2020_Sessions.html
Sectors	Digital/Communication/Information Technologies (including Software),Education,Electronics,Pharmaceuticals and Medical Biotechnology,Transport


Description	The work of this project has directly led to ML becoming part of the methodologies of computer algebra system developers. This is evidenced through their publications citing the work here. While many of these developers are academics, there are also many who are not. For example, the recent work at URL below is sponsored by DARPA and has an author long affiliated to a FinTech company. Most notably, we can confirm the success of the research in the project has directly led the international software company Maplesoft to invest in R&D to apply machine learning techniques to other symbolic computation algorithms (symbolic simplification and symbolic integration) in their flagship product, the computer algebra system Maple. We know this because they are doing it in collaboration with Coventry University. Maplesoft are sponsoring a PhD student - Rashid Barket (https://pureportal.coventry.ac.uk/en/persons/rashid-barket) - to work at Coventry University from May 2022 - Dec 2025. His project is looking at machine learning to optimise their symbolic integration and symbolic simplification commands.
First Year Of Impact	2020
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	Coventry University - Maplesoft (Industry co-funded PhD Scheme)
Amount	£76,434 (GBP)
Organisation	Maplesoft
Sector	Private
Country	Canada
Start	05/2022
End	12/2025


Description	Coventry University Full PhD Fees & Living Allowance Studentship
Amount	£80,000 (GBP)
Organisation	Coventry University
Sector	Academic/University
Country	United Kingdom
Start	05/2019
End	12/2022


Description	Pushing Back the Doubly-Exponential Wall of Cylindrical Algebraic Decomposition
Amount	£421,950 (GBP)
Funding ID	EP/T015748/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	06/2020
End	09/2024


Title	New method to choose hyperparameters of sklearn classifiers to make decisions based on corresponding software runtimes
Description	In the paper below we describe a new automated approach to choose hyperparameters of classifiers in sklearn (by adapting the existing cross validation method). Instead of making a choice based on accuracy defined as picking the optimal choice we instead choose to minimise the corresponding software runtimes. This greatly improves the quality of the ML models. D. Florescu and M. England. Improved cross-validation for classifiers that make algorithmic choices to minimise runtime without compromising output correctness In: Mathematical Aspects of Computer and Information Sciences (Proc. MACIS 2019), Slamanig, D., Tsigaridas, E. & Zafeirakopoulos, Z. (eds.). Springer International Publishing, Lecture Notes in Computer Science Vol. 11989, p. 341-356.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	The code has recently been released open source.
URL	https://doi.org/10.1007/978-3-030-43120-4_27


Title	New method to derive numeric features for machine learning from polynomial systems
Description	In the paper below we describe a new automated approach to derive features (vectors of real numbers suitable for Machine Learning) from systems of polynomial equations. This greatly improves the quality of the ML models. D. Florescu and M. England. Algorithmically generating new algebraic features of polynomial systems for machine learning. Proceedings of the 4th Workshop on Satisfiability Checking and Symbolic Computation (SC2 '19), 12 pages. CEUR Workshop Proceedings 2460, 2019. http://ceur-ws.org/Vol-2460/
Type Of Material	Improvements to research infrastructure
Year Produced	2019
Provided To Others?	No
Impact	We intend to release openly the code to do this in the near future.
URL	http://ceur-ws.org/Vol-2460/


Title	A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs
Description	This toolbox supports the results in the following publication: D. Florescu and M. England. A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs. The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifer Elimination Procedures. The main script is ML_test_rand/pipeline1.py. More details can be found as comments in the script. The sotd heuristic is implemented in the file data_gen_sotd_rand_test.mw. The data is already generated in the repository. The dataset of polynomials can be found in folders entitled poly_rand_dataset (for training) and poly_rand_dataset_test (for testing). The CAD data is generated by running generate_CAD_data.py. The data is already generated in the repository. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org. The library file is also available in this repository (RegularChains_Updated.mla) This updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7. Some data files generated by the pipeline are included in this repository for consistency and for saving time. However, they can be generated again by the user should they wish so: - the predictions with the sotd heuristic (II(d) in the supported paper) - the ML hyperparameters, resulted from 5-fold cross-validation (I(d)i in the supported paper) - the files containing CAD runtimes (in the folders comp_times_rand_dataset and comp_times_rand_dataset_test, corresponding to I(a) and II(e) in the supported paper)
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
Impact	It goes beyond the basic requirements of open data to make the experiments in the paper fully reproducible. Further, the code could be easily used to study similar problems.
URL	https://zenodo.org/record/3731703


Title	Data, Software and Instructions supporting the CICM 2019 paper
Description	This Zenodo dataset contains the test data, software and instructions on how the experiments in our 2019 CICM paper were run.
Type Of Material	Database/Collection of data
Year Produced	2019
Provided To Others?	Yes
Impact	It goes beyond the basic requirements of open data to make the experiments in the paper fully reproducible.
URL	https://doi.org/10.5281/zenodo.2658626


Description	Maplesoft - ML for Maple Routines
Organisation	Maplesoft
Country	Canada
Sector	Private
PI Contribution	Maplesoft are co-sponsoring PhD student Rashid Barkkey to study at Coventry University (May 2022 - Nov 2025) working on a reserach project inspired by this one.
Collaborator Contribution	Maplesoft are providing 50% of the fees and stipend, an external supervisor meeting weekly with the student, plus internal company data and coding expertise.
Impact	No outputs yet. Student enrolled in May 2022 - first outputs expected later in 2023.
Start Year	2021


Description	RWTH: Coverings Computation Scheme
Organisation	RWTH Aachen University
Country	Germany
Sector	Academic/University
PI Contribution	Dr England visited RWTH Aachen in December 2018 to meet with Prof Erika Abraham (Aachen), her student (Gereon Kremer) and Prof James Davenport (Bath). The four of us discussed ideas for a new CAD computation scheme, ending the visit with pseudocode for a new algorithm. The algorithm is related to the present grant: it offers even greater scope for ML optimisation of heuristic choice as it is search based. The 4 of us worked collaboratively on this task.
Collaborator Contribution	As noted above, the 4 of us worked collaboratively on this task.
Impact	The resulting journal paper was published in 2021. Dr England has a new EPSRC grant where one work package is dedicated to pushing this research idea further in collaboraton with prof Abraham.
Start Year	2018


Description	U. Cincinnati - XAI and CAD
Organisation	University of Cincinnati
Country	United States
Sector	Academic/University
PI Contribution	This collaboration seeks to use XAI methods on an analysis of Ml models for computer algebra to learn new information on the underlying algebraic algorithm. The collaboration is with PhD student Lynn Pickering at U. Cincinnati. Lynn provided the XAI expertise.
Collaborator Contribution	We provided the expertise on the underlying algebraic algorithm.
Impact	A paper is in the final stages of develoment.
Start Year	2022


Title	A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs
Description	This toolbox supports the results in the following publication: D. Florescu and M. England. A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs. The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifer Elimination Procedures. The main script is ML_test_rand/pipeline1.py. More details can be found as comments in the script. The sotd heuristic is implemented in the file data_gen_sotd_rand_test.mw. The data is already generated in the repository. The dataset of polynomials can be found in folders entitled poly_rand_dataset (for training) and poly_rand_dataset_test (for testing). The CAD data is generated by running generate_CAD_data.py. The data is already generated in the repository. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org. The library file is also available in this repository (RegularChains_Updated.mla) This updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7. Some data files generated by the pipeline are included in this repository for consistency and for saving time. However, they can be generated again by the user should they wish so: - the predictions with the sotd heuristic (II(d) in the supported paper) - the ML hyperparameters, resulted from 5-fold cross-validation (I(d)i in the supported paper) - the files containing CAD runtimes (in the folders comp_times_rand_dataset and comp_times_rand_dataset_test, corresponding to I(a) and II(e) in the supported paper)
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
Impact	It has been downloaded 21 times to date (Feb 2021) according to Zenodo.
URL	https://zenodo.org/record/3731703


Description	ML4AI Grant Closing Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	A speciaal session at the International Congress on Mathematical Software titled "Artificial Intellegence and Mathematical Software"
Year(s) Of Engagement Activity	2020
URL	http://www.iaa.tu-bs.de/AppliedAlgebra/ICMS2020/ICMS2020_Sessions.html


Description	ML4AI Grant Kick off Session
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The grant kick off session was help as a session of the 2018 International Congress on Mathematical Software. The session was organised by the PI and has 8 contributors (including the PI) and an audience of approximately 40. The aim of the session was to influence the developers of mathematical software ("professional practitioners" in the categorisation above) to consider whether, where and how best machine learning could be used to optimise their software. The PI invited talks from the project partners but this only made up half the program. The other 4 talks were offered by interested parties. A wide variety of mathematical software was represented (computer algebra, SAT/SMT, theorem provers, Math NLP). There was a lot of free-flowing debate withing and between talks. The session was one of the largest in the conference and also gained a substantial audience who were not affiliated with the speakers Since most of the talks were about potentials rather than results most talks did not come with a publication. However, the PI wrote a survey article which is what the DOI below is for.
Year(s) Of Engagement Activity	2018
URL	http://computing.coventry.ac.uk/~mengland/MLforMS-ICMS2018.html


Description	RA gives Pints of Science Talk
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Public/other audiences
Results and Impact	The Postdoctoral Researcher who is employed by the grant, Dorian Florescu, gave a talk as part of a Pints of Science event in Leamington Spa:
Year(s) Of Engagement Activity	2019
URL	https://pintofscience.co.uk/event/wwwthebigbadwolf

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications