# Embedding Machine Learning within Quantifier Elimination Procedures

Lead Research Organisation:
Coventry University

Department Name: Ctr for Fluid and Complex Systems

### Abstract

This project concerns computational mathematics and logic. The aim is to improve the ability of computers to perform ``Quantifier Elimination'' (QE).

We say a logical statement is ``quantified'' if it is preceded by a qualification such as "for all" or "there exists". Here is an example of a quantified statement:

"there exists x such that ax^2 + bx + c = 0 has two solutions for x".

While the statement is mathematically precise the implications are unclear - what restrictions does this statement of existence force upon us? QE corresponds to replacing a quantified statement by an unquantified one which is equivalent. In this case we may replace the statement by:

"b^2 - 4ac > 0", which is the condition for x to have two solutions.

You may have recognised this equivalence from GCSE mathematics, when studying the quadratic equation. The important point here is that the latter statement can actually be derived automatically by a computer from the former, using a QE procedure.

QE is not subject to the numerical rounding errors of most computations. Solutions are not in the form of a numerical answer but an algebraic description which offers insight into the structure of the problem at hand. In the example above, QE shows us not what the solutions to a particular quadratic equation are, but how in general the number of solutions depends on the coefficients a, b, and c.

QE has numerous applications throughout engineering and the sciences. An example from biology is the determination of medically important values of parameters in a biological network; while another from economics is identifying which hypotheses in economic theories are compatible, and for what values of the variables. In both cases, QE can theoretically help, but in practice the size of the statements means state-of-the-art procedures run out of computer time/memory.

The extensive development of QE procedures means they have many options and choices about how they are run. These decisions can greatly affect how long QE takes, rendering an intractable problem easy and vice versa. Making the right choice is a critical, but understudied problem and is the focus of this project. At the moment QE procedures make such choices either under direct supervision of a human or based on crude human-made heuristics (rules of thumb based on intuition / experience but with limited scientific basis). The purpose of this project is to replace these by machine learning techniques. Machine Learning (ML) is an overarching term for tools that allow computers to make decisions that are not explicitly programmed, usually involving the statistical analysis of large quantities of data. ML is quite at odds with the field of Symbolic Computation which studies QE, as the latter prizes exact correctness and so shuns the use of probabilistic tools making its application here very novel. We are able to combine these different worlds because the choices which we will use ML to make will all produce a correct and exact answer (but with different computational costs).

The project follows pilot studies undertaken by the PI which experimented with one ML technique and found it improved upon existing heuristics for two particular decisions in a QE algorithm. We will build on this by working with the spectrum of leading ML tools to identify the optimal techniques for application in Symbolic Computation. We will demonstrate their use for both low level algorithm decisions and choices between different theories and implementations. Although focused on QE, we will also demonstrate ML as being a new route to optimisation in Computer Algebra more broadly and work encompasses Project Partners and events to maximise this. Finally, the project will deliver an improved QE procedure that makes use of ML automatically, without user input. This will be produced in the commercial Computer Algebra software Maple in collaboration with industrial Project Partner Maplesoft.

We say a logical statement is ``quantified'' if it is preceded by a qualification such as "for all" or "there exists". Here is an example of a quantified statement:

"there exists x such that ax^2 + bx + c = 0 has two solutions for x".

While the statement is mathematically precise the implications are unclear - what restrictions does this statement of existence force upon us? QE corresponds to replacing a quantified statement by an unquantified one which is equivalent. In this case we may replace the statement by:

"b^2 - 4ac > 0", which is the condition for x to have two solutions.

You may have recognised this equivalence from GCSE mathematics, when studying the quadratic equation. The important point here is that the latter statement can actually be derived automatically by a computer from the former, using a QE procedure.

QE is not subject to the numerical rounding errors of most computations. Solutions are not in the form of a numerical answer but an algebraic description which offers insight into the structure of the problem at hand. In the example above, QE shows us not what the solutions to a particular quadratic equation are, but how in general the number of solutions depends on the coefficients a, b, and c.

QE has numerous applications throughout engineering and the sciences. An example from biology is the determination of medically important values of parameters in a biological network; while another from economics is identifying which hypotheses in economic theories are compatible, and for what values of the variables. In both cases, QE can theoretically help, but in practice the size of the statements means state-of-the-art procedures run out of computer time/memory.

The extensive development of QE procedures means they have many options and choices about how they are run. These decisions can greatly affect how long QE takes, rendering an intractable problem easy and vice versa. Making the right choice is a critical, but understudied problem and is the focus of this project. At the moment QE procedures make such choices either under direct supervision of a human or based on crude human-made heuristics (rules of thumb based on intuition / experience but with limited scientific basis). The purpose of this project is to replace these by machine learning techniques. Machine Learning (ML) is an overarching term for tools that allow computers to make decisions that are not explicitly programmed, usually involving the statistical analysis of large quantities of data. ML is quite at odds with the field of Symbolic Computation which studies QE, as the latter prizes exact correctness and so shuns the use of probabilistic tools making its application here very novel. We are able to combine these different worlds because the choices which we will use ML to make will all produce a correct and exact answer (but with different computational costs).

The project follows pilot studies undertaken by the PI which experimented with one ML technique and found it improved upon existing heuristics for two particular decisions in a QE algorithm. We will build on this by working with the spectrum of leading ML tools to identify the optimal techniques for application in Symbolic Computation. We will demonstrate their use for both low level algorithm decisions and choices between different theories and implementations. Although focused on QE, we will also demonstrate ML as being a new route to optimisation in Computer Algebra more broadly and work encompasses Project Partners and events to maximise this. Finally, the project will deliver an improved QE procedure that makes use of ML automatically, without user input. This will be produced in the commercial Computer Algebra software Maple in collaboration with industrial Project Partner Maplesoft.

### Planned Impact

Quantifier Elimination (QE) allows for the removal of quantifiers within logical statements to produce clear unambiguous algebraic constraints. In this sense QE is the act of simplifying or solving a problem exactly, like pen and paper symbolic mathematics. The impact of improved QE procedures is felt not just in mathematics and computer science, but throughout engineering and the natural sciences - wherever there is a desire for such solutions. Some examples are given below.

Initial impact from the project will be secured via collaboration with industrial Project Partner Maplesoft, whose European HQ is in Cambridge. Maplesoft produce the Maple Computer Algebra System used widely in science, education and industry. The PI will implement work in the Maple language and work with the Maple development team on the transfer of these project outcomes. This will guarantee a wide impact of the new procedures, not just by including them in a widely used piece of software, but by having them linked up as sub-routines to the main user commands. Maplesoft also produce the modelling and simulation tool MapleSim which uses Maple as a backend, thus offering a further route to impact. All outcomes (publications and data) will be published open access to facilitate additional transfers.

More broadly, the project aims to impact the thinking around software development in Computer Algebra Systems, by proving the effectiveness of machine learning as a route to optimisation. This impact will be insured not only via engaging with Maplesoft but also the wider field. We will hold two project events to take place at existing Symbolic Computation conferences, to maximise engagement and participation. We also plan further engagement with academic Project Partners who lead development teams on Computer Algebra Software packages: Redlog at CNRS, Inria, CoCoA at U. Genoa, and SMT-RAT at U. Aachen.

There will also be some local impacts of the project. It will directly support the UK career of both an RA and the PI, the latter gaining the particularly valuable first RCUK grant and also experience of line management. Further it will indirectly support a PhD student as Coventry U. have committed to fund a studentship associated with the project. Finally, it will feed into curriculum development: providing suitable examples, guest lectures and topics for final year projects on the Computer Science BSc at Coventry U.

We finish by discussing three potential real world beneficiaries of QE the PI is involved with. In each of these improved QE procedures would allow for much greater impact.

- Biology/Epidemiology community: All sorts of biological processes may be modelled using systems of differential equations, such as epidemic spread of disease or protein reactions indicative of cancer. The identification of conditions on the parameters for stable solutions of these equations may be achieved using QE. The PI has recently collaborated with biologists on identifying bi-stability regions of the MAPK network.

- Formal Verification and Safety community: The use of computers to check a logical statement beyond all doubt (by successively combining existing results with basic axioms) is often used to ensure the correctness of safety-critical systems. For example, KeYmaera combines a theorem prover with real QE procedures for verification in the railway, automotive, and aviation industries, and even surgical robotics. The PI is a member of a separate EU project engaging with such applications.

- Validation of Economic Hypotheses: Economists make models under a variety of assumptions, many of which can be expressed in the language of QE. The compatibility of given assumptions is often not clear, but QE could validate the compatibility of assumptions, or identify regions of compatibility. The PI is in correspondence with a leading economist at U. Chicago who is the first to make use of QE in this context.

Initial impact from the project will be secured via collaboration with industrial Project Partner Maplesoft, whose European HQ is in Cambridge. Maplesoft produce the Maple Computer Algebra System used widely in science, education and industry. The PI will implement work in the Maple language and work with the Maple development team on the transfer of these project outcomes. This will guarantee a wide impact of the new procedures, not just by including them in a widely used piece of software, but by having them linked up as sub-routines to the main user commands. Maplesoft also produce the modelling and simulation tool MapleSim which uses Maple as a backend, thus offering a further route to impact. All outcomes (publications and data) will be published open access to facilitate additional transfers.

More broadly, the project aims to impact the thinking around software development in Computer Algebra Systems, by proving the effectiveness of machine learning as a route to optimisation. This impact will be insured not only via engaging with Maplesoft but also the wider field. We will hold two project events to take place at existing Symbolic Computation conferences, to maximise engagement and participation. We also plan further engagement with academic Project Partners who lead development teams on Computer Algebra Software packages: Redlog at CNRS, Inria, CoCoA at U. Genoa, and SMT-RAT at U. Aachen.

There will also be some local impacts of the project. It will directly support the UK career of both an RA and the PI, the latter gaining the particularly valuable first RCUK grant and also experience of line management. Further it will indirectly support a PhD student as Coventry U. have committed to fund a studentship associated with the project. Finally, it will feed into curriculum development: providing suitable examples, guest lectures and topics for final year projects on the Computer Science BSc at Coventry U.

We finish by discussing three potential real world beneficiaries of QE the PI is involved with. In each of these improved QE procedures would allow for much greater impact.

- Biology/Epidemiology community: All sorts of biological processes may be modelled using systems of differential equations, such as epidemic spread of disease or protein reactions indicative of cancer. The identification of conditions on the parameters for stable solutions of these equations may be achieved using QE. The PI has recently collaborated with biologists on identifying bi-stability regions of the MAPK network.

- Formal Verification and Safety community: The use of computers to check a logical statement beyond all doubt (by successively combining existing results with basic axioms) is often used to ensure the correctness of safety-critical systems. For example, KeYmaera combines a theorem prover with real QE procedures for verification in the railway, automotive, and aviation industries, and even surgical robotics. The PI is a member of a separate EU project engaging with such applications.

- Validation of Economic Hypotheses: Economists make models under a variety of assumptions, many of which can be expressed in the language of QE. The compatibility of given assumptions is often not clear, but QE could validate the compatibility of assumptions, or identify regions of compatibility. The PI is in correspondence with a leading economist at U. Chicago who is the first to make use of QE in this context.

### Publications

*Using Machine Learning to Improve Cylindrical Algebraic Decomposition*in Mathematics in Computer Science

Description | The grant started in Summer 2018. We had an excellent kick off session (as described elsewhere in ResearchFish) and successfully recruited a post doctoral researcher who started work in December 2018. The PostDoc has generated a data set and run several rounds of experiments resulting in three papers (two published and reported in ResearchFish, the other still InPress). The first identified ML modules that outperform the one identified in the preliminary case study. The second derived a new feature identification framework which improved the performance of all ML models. The third demonstrated how ML hyperparameter selection could be adapted to better suite the application. |

Exploitation Route | Already there are a variety of other research groups looking at applying ML to computer algebra. We will have another workshop at the 2020 ICMS and so will report more on this in the next reporting period. http://computing.coventry.ac.uk/~mengland/ICMS20-AIMS.html |

Sectors | Digital/Communication/Information Technologies (including Software),Education,Electronics,Pharmaceuticals and Medical Biotechnology,Transport |

URL | http://computing.coventry.ac.uk/~mengland/ML.html |

Description | Coventry University Full PhD Fees & Living Allowance Studentship |

Amount | £80,000 (GBP) |

Organisation | Coventry University |

Sector | Academic/University |

Country | United Kingdom |

Start | 05/2019 |

End | 12/2022 |

Title | New Method to Derive numeric features for machine learning from polynomial systems |

Description | In the paper below we describe a new automated approach to derive features (vectors of real numbers suitable for Machine Learning) from systems of polynomial equations. This greatly improves the quality of the ML models. D. Florescu and M. England. Algorithmically generating new algebraic features of polynomial systems for machine learning. Proceedings of the 4th Workshop on Satisfiability Checking and Symbolic Computation (SC2 '19), 12 pages. CEUR Workshop Proceedings 2460, 2019. |

Type Of Material | Improvements to research infrastructure |

Year Produced | 2019 |

Provided To Others? | No |

Impact | We intend to release openly the code to do this in the near future/ |

Title | Data, Software and Instructions supporting the CICM 2019 paper |

Description | This Zenodo dataset contains the test data, software and instructions on how the experiments in our 2019 CICM paper were run. |

Type Of Material | Database/Collection of data |

Year Produced | 2019 |

Provided To Others? | Yes |

Impact | It goes beyond the basic requirements of open data to make the experiments in the paper fully reproducible. |

URL | https://doi.org/10.5281/zenodo.2658626 |

Description | New CAD Computation Scheme |

Organisation | RWTH Aachen University |

Country | Germany |

Sector | Academic/University |

PI Contribution | Dr England visited RWTH Aachen in December 2018 to meet with Prof Erika Abraham (Aachen), her student (Gereon Kremer) and Prof James Davenport (Bath). The four of us discussed ideas for a new CAD computation scheme, ending the visit with pseudocode for a new algorithm. The algorithm is related to the present grant: it offers even greater scope for ML optimisation of heuristic choice as it is search based. The 4 of us worked collaboratively on this task. |

Collaborator Contribution | As noted above, the 4 of us worked collaboratively on this task. |

Impact | A first paper is under development. |

Start Year | 2018 |

Description | Grant Kick off Session |

Form Of Engagement Activity | Participation in an activity, workshop or similar |

Part Of Official Scheme? | No |

Geographic Reach | International |

Primary Audience | Professional Practitioners |

Results and Impact | The grant kick off session was help as a session of the 2018 International Congress on Mathematical Software. The session was organised by the PI and has 8 contributors (including the PI) and an audience of approximately 40. The aim of the session was to influence the developers of mathematical software ("professional practitioners" in the categorisation above) to consider whether, where and how best machine learning could be used to optimise their software. The PI invited talks from the project partners but this only made up half the program. The other 4 talks were offered by interested parties. A wide variety of mathematical software was represented (computer algebra, SAT/SMT, theorem provers, Math NLP). There was a lot of free-flowing debate withing and between talks. The session was one of the largest in the conference and also gained a substantial audience who were not affiliated with the speakers Since most of the talks were about potentials rather than results most talks did not come with a publication. However, the PI wrote a survey article which is what the DOI below is for. |

Year(s) Of Engagement Activity | 2018 |

URL | http://computing.coventry.ac.uk/~mengland/MLforMS-ICMS2018.html |

Description | RA gives Pints of Science Talk |

Form Of Engagement Activity | Participation in an activity, workshop or similar |

Part Of Official Scheme? | No |

Geographic Reach | Regional |

Primary Audience | Public/other audiences |

Results and Impact | The Postdoctoral Researcher who is employed by the grant, Dorian Florescu, gave a talk as part of a Pints of Science event in Leamington Spa: |

Year(s) Of Engagement Activity | 2019 |

URL | https://pintofscience.co.uk/event/wwwthebigbadwolf |