Embedding Machine Learning within Quantifier Elimination Procedures

Lead Research Organisation: Coventry University
Department Name: Ctr for Fluid and Complex Systems

Abstract

This project concerns computational mathematics and logic. The aim is to improve the ability of computers to perform ``Quantifier Elimination'' (QE).

We say a logical statement is ``quantified'' if it is preceded by a qualification such as "for all" or "there exists". Here is an example of a quantified statement:
"there exists x such that ax^2 + bx + c = 0 has two solutions for x".
While the statement is mathematically precise the implications are unclear - what restrictions does this statement of existence force upon us? QE corresponds to replacing a quantified statement by an unquantified one which is equivalent. In this case we may replace the statement by:
"b^2 - 4ac > 0", which is the condition for x to have two solutions.
You may have recognised this equivalence from GCSE mathematics, when studying the quadratic equation. The important point here is that the latter statement can actually be derived automatically by a computer from the former, using a QE procedure.

QE is not subject to the numerical rounding errors of most computations. Solutions are not in the form of a numerical answer but an algebraic description which offers insight into the structure of the problem at hand. In the example above, QE shows us not what the solutions to a particular quadratic equation are, but how in general the number of solutions depends on the coefficients a, b, and c.

QE has numerous applications throughout engineering and the sciences. An example from biology is the determination of medically important values of parameters in a biological network; while another from economics is identifying which hypotheses in economic theories are compatible, and for what values of the variables. In both cases, QE can theoretically help, but in practice the size of the statements means state-of-the-art procedures run out of computer time/memory.

The extensive development of QE procedures means they have many options and choices about how they are run. These decisions can greatly affect how long QE takes, rendering an intractable problem easy and vice versa. Making the right choice is a critical, but understudied problem and is the focus of this project. At the moment QE procedures make such choices either under direct supervision of a human or based on crude human-made heuristics (rules of thumb based on intuition / experience but with limited scientific basis). The purpose of this project is to replace these by machine learning techniques. Machine Learning (ML) is an overarching term for tools that allow computers to make decisions that are not explicitly programmed, usually involving the statistical analysis of large quantities of data. ML is quite at odds with the field of Symbolic Computation which studies QE, as the latter prizes exact correctness and so shuns the use of probabilistic tools making its application here very novel. We are able to combine these different worlds because the choices which we will use ML to make will all produce a correct and exact answer (but with different computational costs).

The project follows pilot studies undertaken by the PI which experimented with one ML technique and found it improved upon existing heuristics for two particular decisions in a QE algorithm. We will build on this by working with the spectrum of leading ML tools to identify the optimal techniques for application in Symbolic Computation. We will demonstrate their use for both low level algorithm decisions and choices between different theories and implementations. Although focused on QE, we will also demonstrate ML as being a new route to optimisation in Computer Algebra more broadly and work encompasses Project Partners and events to maximise this. Finally, the project will deliver an improved QE procedure that makes use of ML automatically, without user input. This will be produced in the commercial Computer Algebra software Maple in collaboration with industrial Project Partner Maplesoft.

Planned Impact

Quantifier Elimination (QE) allows for the removal of quantifiers within logical statements to produce clear unambiguous algebraic constraints. In this sense QE is the act of simplifying or solving a problem exactly, like pen and paper symbolic mathematics. The impact of improved QE procedures is felt not just in mathematics and computer science, but throughout engineering and the natural sciences - wherever there is a desire for such solutions. Some examples are given below.

Initial impact from the project will be secured via collaboration with industrial Project Partner Maplesoft, whose European HQ is in Cambridge. Maplesoft produce the Maple Computer Algebra System used widely in science, education and industry. The PI will implement work in the Maple language and work with the Maple development team on the transfer of these project outcomes. This will guarantee a wide impact of the new procedures, not just by including them in a widely used piece of software, but by having them linked up as sub-routines to the main user commands. Maplesoft also produce the modelling and simulation tool MapleSim which uses Maple as a backend, thus offering a further route to impact. All outcomes (publications and data) will be published open access to facilitate additional transfers.

More broadly, the project aims to impact the thinking around software development in Computer Algebra Systems, by proving the effectiveness of machine learning as a route to optimisation. This impact will be insured not only via engaging with Maplesoft but also the wider field. We will hold two project events to take place at existing Symbolic Computation conferences, to maximise engagement and participation. We also plan further engagement with academic Project Partners who lead development teams on Computer Algebra Software packages: Redlog at CNRS, Inria, CoCoA at U. Genoa, and SMT-RAT at U. Aachen.

There will also be some local impacts of the project. It will directly support the UK career of both an RA and the PI, the latter gaining the particularly valuable first RCUK grant and also experience of line management. Further it will indirectly support a PhD student as Coventry U. have committed to fund a studentship associated with the project. Finally, it will feed into curriculum development: providing suitable examples, guest lectures and topics for final year projects on the Computer Science BSc at Coventry U.

We finish by discussing three potential real world beneficiaries of QE the PI is involved with. In each of these improved QE procedures would allow for much greater impact.

- Biology/Epidemiology community: All sorts of biological processes may be modelled using systems of differential equations, such as epidemic spread of disease or protein reactions indicative of cancer. The identification of conditions on the parameters for stable solutions of these equations may be achieved using QE. The PI has recently collaborated with biologists on identifying bi-stability regions of the MAPK network.

- Formal Verification and Safety community: The use of computers to check a logical statement beyond all doubt (by successively combining existing results with basic axioms) is often used to ensure the correctness of safety-critical systems. For example, KeYmaera combines a theorem prover with real QE procedures for verification in the railway, automotive, and aviation industries, and even surgical robotics. The PI is a member of a separate EU project engaging with such applications.

- Validation of Economic Hypotheses: Economists make models under a variety of assumptions, many of which can be expressed in the language of QE. The compatibility of given assumptions is often not clear, but QE could validate the compatibility of assumptions, or identify regions of compatibility. The PI is in correspondence with a leading economist at U. Chicago who is the first to make use of QE in this context.
 
Description The grant started in Summer 2018. We had an excellent kick off session (as described elsewhere in ResearchFish) and successfully recruited a post doctoral researcher who started work in December 2018.

The PostDoc has generated a data set and run several rounds of experiments resulting in four papers (and one more under preparation). The first identified ML modles that outperform the one identified in the preliminary case study; the second derived a new feature identification framework which improved the performance of all ML models; the third demonstrated how ML hyperparameter selection could be adapted to better suite the application; and the fourth describes our open source code release.
Exploitation Route Already there are a variety of other research groups looking at applying ML to computer algebra. An overview of this can be found in the records of the two ICMS Special Sessions that the project oraginsed in (2018 and 2020).
http://icms-conference.org/2018/sessions/
http://www.iaa.tu-bs.de/AppliedAlgebra/ICMS2020/ICMS2020_Sessions.html
Sectors Digital/Communication/Information Technologies (including Software),Education,Electronics,Pharmaceuticals and Medical Biotechnology,Transport

 
Description The work of this project has directly led to ML becoming part of the methodologies of computer algebra system developers. This is evidenced through their publications citing the work here. While many of these developers are academics, there are also many who are not. For example, the recent work at URL below is sponsored by DARPA and has an author long affiliated to a FinTech company. Most notably, we can confirm the success of the research in the project has directly led the international software company Maplesoft to invest in R&D to apply machine learning techniques to other symbolic computation algorithms (symbolic simplification and symbolic integration) in their flagship product, the computer algebra system Maple. We know this because they are doing it in collaboration with Coventry University. Maplesoft are sponsoring a PhD student - Rashid Barket (https://pureportal.coventry.ac.uk/en/persons/rashid-barket) - to work at Coventry University from May 2022 - Dec 2025. His project is looking at machine learning to optimise their symbolic integration and symbolic simplification commands.
First Year Of Impact 2020
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Coventry University - Maplesoft (Industry co-funded PhD Scheme)
Amount £76,434 (GBP)
Organisation Maplesoft 
Sector Private
Country Canada
Start 05/2022 
End 12/2025
 
Description Coventry University Full PhD Fees & Living Allowance Studentship
Amount £80,000 (GBP)
Organisation Coventry University 
Sector Academic/University
Country United Kingdom
Start 05/2019 
End 12/2022
 
Description Pushing Back the Doubly-Exponential Wall of Cylindrical Algebraic Decomposition
Amount £421,950 (GBP)
Funding ID EP/T015748/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2020 
End 09/2024
 
Title New method to choose hyperparameters of sklearn classifiers to make decisions based on corresponding software runtimes 
Description In the paper below we describe a new automated approach to choose hyperparameters of classifiers in sklearn (by adapting the existing cross validation method). Instead of making a choice based on accuracy defined as picking the optimal choice we instead choose to minimise the corresponding software runtimes. This greatly improves the quality of the ML models. D. Florescu and M. England. Improved cross-validation for classifiers that make algorithmic choices to minimise runtime without compromising output correctness In: Mathematical Aspects of Computer and Information Sciences (Proc. MACIS 2019), Slamanig, D., Tsigaridas, E. & Zafeirakopoulos, Z. (eds.). Springer International Publishing, Lecture Notes in Computer Science Vol. 11989, p. 341-356. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact The code has recently been released open source. 
URL https://doi.org/10.1007/978-3-030-43120-4_27
 
Title New method to derive numeric features for machine learning from polynomial systems 
Description In the paper below we describe a new automated approach to derive features (vectors of real numbers suitable for Machine Learning) from systems of polynomial equations. This greatly improves the quality of the ML models. D. Florescu and M. England. Algorithmically generating new algebraic features of polynomial systems for machine learning. Proceedings of the 4th Workshop on Satisfiability Checking and Symbolic Computation (SC2 '19), 12 pages. CEUR Workshop Proceedings 2460, 2019. http://ceur-ws.org/Vol-2460/ 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact We intend to release openly the code to do this in the near future. 
URL http://ceur-ws.org/Vol-2460/
 
Title A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs 
Description This toolbox supports the results in the following publication: D. Florescu and M. England. A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs. The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifer Elimination Procedures. The main script is ML_test_rand/pipeline1.py. More details can be found as comments in the script. The sotd heuristic is implemented in the file data_gen_sotd_rand_test.mw. The data is already generated in the repository. The dataset of polynomials can be found in folders entitled poly_rand_dataset (for training) and poly_rand_dataset_test (for testing). The CAD data is generated by running generate_CAD_data.py. The data is already generated in the repository. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org. The library file is also available in this repository (RegularChains_Updated.mla) This updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7. Some data files generated by the pipeline are included in this repository for consistency and for saving time. However, they can be generated again by the user should they wish so: - the predictions with the sotd heuristic (II(d) in the supported paper) - the ML hyperparameters, resulted from 5-fold cross-validation (I(d)i in the supported paper) - the files containing CAD runtimes (in the folders comp_times_rand_dataset and comp_times_rand_dataset_test, corresponding to I(a) and II(e) in the supported paper) 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact It goes beyond the basic requirements of open data to make the experiments in the paper fully reproducible. Further, the code could be easily used to study similar problems. 
URL https://zenodo.org/record/3731703
 
Title Data, Software and Instructions supporting the CICM 2019 paper 
Description This Zenodo dataset contains the test data, software and instructions on how the experiments in our 2019 CICM paper were run. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact It goes beyond the basic requirements of open data to make the experiments in the paper fully reproducible. 
URL https://doi.org/10.5281/zenodo.2658626
 
Description Maplesoft - ML for Maple Routines 
Organisation Maplesoft
Country Canada 
Sector Private 
PI Contribution Maplesoft are co-sponsoring PhD student Rashid Barkkey to study at Coventry University (May 2022 - Nov 2025) working on a reserach project inspired by this one.
Collaborator Contribution Maplesoft are providing 50% of the fees and stipend, an external supervisor meeting weekly with the student, plus internal company data and coding expertise.
Impact No outputs yet. Student enrolled in May 2022 - first outputs expected later in 2023.
Start Year 2021
 
Description RWTH: Coverings Computation Scheme 
Organisation RWTH Aachen University
Country Germany 
Sector Academic/University 
PI Contribution Dr England visited RWTH Aachen in December 2018 to meet with Prof Erika Abraham (Aachen), her student (Gereon Kremer) and Prof James Davenport (Bath). The four of us discussed ideas for a new CAD computation scheme, ending the visit with pseudocode for a new algorithm. The algorithm is related to the present grant: it offers even greater scope for ML optimisation of heuristic choice as it is search based. The 4 of us worked collaboratively on this task.
Collaborator Contribution As noted above, the 4 of us worked collaboratively on this task.
Impact The resulting journal paper was published in 2021. Dr England has a new EPSRC grant where one work package is dedicated to pushing this research idea further in collaboraton with prof Abraham.
Start Year 2018
 
Description U. Cincinnati - XAI and CAD 
Organisation University of Cincinnati
Country United States 
Sector Academic/University 
PI Contribution This collaboration seeks to use XAI methods on an analysis of Ml models for computer algebra to learn new information on the underlying algebraic algorithm. The collaboration is with PhD student Lynn Pickering at U. Cincinnati. Lynn provided the XAI expertise.
Collaborator Contribution We provided the expertise on the underlying algebraic algorithm.
Impact A paper is in the final stages of develoment.
Start Year 2022
 
Title A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs 
Description This toolbox supports the results in the following publication: D. Florescu and M. England. A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs. The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifer Elimination Procedures. The main script is ML_test_rand/pipeline1.py. More details can be found as comments in the script. The sotd heuristic is implemented in the file data_gen_sotd_rand_test.mw. The data is already generated in the repository. The dataset of polynomials can be found in folders entitled poly_rand_dataset (for training) and poly_rand_dataset_test (for testing). The CAD data is generated by running generate_CAD_data.py. The data is already generated in the repository. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org. The library file is also available in this repository (RegularChains_Updated.mla) This updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7. Some data files generated by the pipeline are included in this repository for consistency and for saving time. However, they can be generated again by the user should they wish so: - the predictions with the sotd heuristic (II(d) in the supported paper) - the ML hyperparameters, resulted from 5-fold cross-validation (I(d)i in the supported paper) - the files containing CAD runtimes (in the folders comp_times_rand_dataset and comp_times_rand_dataset_test, corresponding to I(a) and II(e) in the supported paper) 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact It has been downloaded 21 times to date (Feb 2021) according to Zenodo. 
URL https://zenodo.org/record/3731703
 
Description ML4AI Grant Closing Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A speciaal session at the International Congress on Mathematical Software titled "Artificial Intellegence and Mathematical Software"
Year(s) Of Engagement Activity 2020
URL http://www.iaa.tu-bs.de/AppliedAlgebra/ICMS2020/ICMS2020_Sessions.html
 
Description ML4AI Grant Kick off Session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The grant kick off session was help as a session of the 2018 International Congress on Mathematical Software. The session was organised by the PI and has 8 contributors (including the PI) and an audience of approximately 40.

The aim of the session was to influence the developers of mathematical software ("professional practitioners" in the categorisation above) to consider whether, where and how best machine learning could be used to optimise their software. The PI invited talks from the project partners but this only made up half the program. The other 4 talks were offered by interested parties. A wide variety of mathematical software was represented (computer algebra, SAT/SMT, theorem provers, Math NLP). There was a lot of free-flowing debate withing and between talks. The session was one of the largest in the conference and also gained a substantial audience who were not affiliated with the speakers

Since most of the talks were about potentials rather than results most talks did not come with a publication. However, the PI wrote a survey article which is what the DOI below is for.
Year(s) Of Engagement Activity 2018
URL http://computing.coventry.ac.uk/~mengland/MLforMS-ICMS2018.html
 
Description RA gives Pints of Science Talk 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Postdoctoral Researcher who is employed by the grant, Dorian Florescu, gave a talk as part of a Pints of Science event in Leamington Spa:
Year(s) Of Engagement Activity 2019
URL https://pintofscience.co.uk/event/wwwthebigbadwolf