Least Squares: Fit for the Future

Lead Research Organisation: Science and Technology Facilities Council
Department Name: Scientific Computing Department

Abstract

This project seeks to develop new algorithms, supporting theory and software for solving least squares problems that arise in science, engineering, planning and economics. Least squares involves finding an approximate solution of overdetermined or inexactly specified systems of equations. Real-life applications abound. Weather forecasters want to produce more accurate forecasts; climatologists want a better understanding of climate change; medics want to produce more accurate images in real time; financiers want to analyse and quantify the systematic risk of an investment by fitting a capital asset pricing model to observed financial data. Finding the 'best' solution commonly involves constructing a mathematical model to describe the problem and then fitting this model to observed data. Such models are usually complicated; models with millions of variables and restrictions are not uncommon, but neither are relatively small but fiendishly difficult ones. It is therefore imperative to implement the model on a computer and to use computer algorithms for solving it. The latter task is at the core of the proposed activities.

Nearly all such large-scale problems exhibit an underlying mathematical structure such as sparsity. That is to say, the interactions between the parameters of a large system are often localised and do not involve any direct interaction between all the components. To solve the systems and models represented in this way efficiently involves developing algorithms that are able to exploit these underlying 'simpler' structures, which often reduces the scale of the problems, and thus speeds up their solution. This enterprise commonly leads not only to new software that implements existing methods, but to the creation of new theoretical and practical algorithms. At the other extreme, some problems involve interaction between all components, and while the underlying structure is less transparent, it is nonetheless present. In these cases, the computational burden may be very high and such problems may generally only be solved by sophisticated use of massively parallel computers.

The methods we will develop will aim to solve the given problem efficiently and robustly. Since computers cannot solve most mathematical problems exactly, only approximately, a priority will be to ensure the solution obtained by applying our algorithms is highly accurate, that is, close to the 'true' solution of the problem. But it is also vital that we solve problems fast without sacrificing accuracy; this is particularly true if a simulation requires us to investigate a large number of different scenarios, or if the problem we seek to solve is simply a component in an overall vastly more complicated computation, or if new data arrives in real time and we need to adapt the model accordingly. Developing algorithms that are both fast and accurate on multicore machines presents a key challenge.

We aim to improve upon existing algorithms from several different angles, exploiting new mathematical techniques from areas such as optimization and the solution of partial differential equations. Parallelism will be designed into our new algorithms, allowing modern computer hardware to be exploited. These generic improvements will be coupled with the development of new techniques to exploit special features of problems from important application areas, including X-ray microscopy, crystallography and radiative transfer modelling.

Our new software will be made available through the internationally renowned mathematical software libraries HSL, GALAHAD and SPRAL. These are extensively used by the scientific and engineering research community in the UK and abroad, as well as by some commercial companies. Since 2010, more than 50 UK university departments have used HSL for teaching or research in a wide range of disciplines that includes computational chemistry, engineering design, fluid dynamics, portfolio optimization, and circuit theory.

Planned Impact

The method of least squares is a commonly-used approach to finding an appropriate solution of overdetermined or inexactly specified systems of equations. Since its development in the eighteenth century, the solution of least squares problems has been and continues to be a fundamental method in scientific data fitting. As reported in the headline article in a recent edition of Nature (514, 29 October 2014), Marquardt's SIAM 1963 paper (SIAM. J. App. Math, 11(2)) on methods for nonlinear-least squares is one of the one hundred most-cited papers of all time, which is an indicator of the importance of the area. Least squares solvers are used across a wide spectrum of disciplines in both academia and industry, for everything from simple curve fitting, through the estimation of satellite image sensor characteristics, data assimilation for weather forecasting and for climate modelling, to powering internet mapping services, exploration seismology, NMR spectroscopy, medical imaging, aerospace systems, and neural networks. Thus the range and number of potential beneficiaries is vast, including not only the computational scientists and engineers engaged in these areas but also the people and industries that exploit them. Ultimately this includes those in government involved in making policy decisions on areas such as climate change, as well as the public through, for example, better weather forecasts and advances in medical imaging. An integral part of this project will be ensuring our work has a significant impact on a wide range of application areas.

Among this wide variety of practical applications, important guidance (in the form of test data and interaction with users) will initially come from our local collaborators (Diamond Light Source, STFC-ISIS and RAL Space) and will be in areas such as X-ray microscopy, crystallography, the analysis of neutron scattering and muon spectroscopy data, and radiative transfer modelling.

A key means of guaranteeing impact will be the development and distribution of software. Indeed, much of the planned research will lead to the design and development of high quality open source mathematical software that implements our new algorithms. This software will be included within the Group's internationally renowned and well established mathematical software libraries HSL, GALAHAD and/or SPRAL. The software will be fully supported and maintained by the Group. It will be primarily written using standard Fortran for portability and efficiency but by providing interfaces for other languages, notably MATLAB and C, we will further widen accessibility and the potential user base.

The theoretical outcomes as well as the software will be promoted through journal publications, technical reports, conference presentations, web pages, and through an international workshop that the Group will organise during the second year of the project and that will involve application users.

Publications

10 25 50
 
Description This grant has enabled us to develop new theory, algorithms and software for solving least squares problems, both nonlinear and linear problems. Such problems arise in a wide range of practical applications; they can be very challenging to solve. Thus our interest has been in developing new algorithms to solve problems either more robustly or more rapidly, as well as extending the classes of problems that can be effectively tackled.
The results of our research are available in articles that have been published in leading journals in the field of computational mathematics (in particular, numerical linear algebra and optimisation journals) and also as mathematical software. This software has been made available (see outcomes below); it is fully documented and maintained as part of our well-established and internationally renowned software libraries.
Exploitation Route The software that was developed as part of this grant has been taken forward and is being used within a number of different software projects/libraries:

- Our nonlinear least squares methods are being used to analyse data from the ISIS neutron and muon source, via the Mantid project https://www.mantidproject.org/. The
The Mantid project provides a framework that supports high-performance computing and visualisation of materials science data.

- NAG Ltd https://www.nag.com/ has released our open source software, RALFit, as the package e04gg, which they find to be significantly faster and more robust than their
previous solver, e04us.

- Our incomplete factorization preconditioner for use with iterative methods for solving large-scale linear least squares problems, HSL_MI35, is available as part of the HSL mathematical software library https://www.hsl.rl.ac.uk/catalogue/hsl_mi35.html and has already been used for a number of different applications including:
- Robot arm optimal control
- Spacecraft trajectory optimization
- Working with Ocean Wave 3d
- Geophysical Inversion

- Our general purpose linear least squares solver is about to be released within HSL. This can solve large-scale systems, both sparse systems and tough problems that are mainly sparse but contain some dense rows. As our HSL sparse solvers are well known and widely used, we expect that this new solver, HSL_MA85, will attract new and existing users of HSL.

- Our software NLS (nonlinear sparse least squares) and BLLS (bound-constrained sparse linear least-squares) have been included in the GALAHAD nonlinear optimization
software library https://www.galahad.rl.ac.uk/
Sectors Aerospace, Defence and Marine,Electronics,Environment

 
Description Our research has lead to the development of new library-quality software that has been downloaded many times. Unfortunately, we are unable to provide precise download details (for our open source software, data is not collected from those who download the software). We would expect that many of the users are academics but as our software libraries are well known and well used, some of the users will be non academics. In addition, one of our least squares software packages that was developed as part of this project, called RALFit, has been incorporated into the NAG library. NAG is a commercial (not for profit) company, with many users/customers not only from academia but also from industry and other organisations worldwide, so this will potentially have a widespread and long term non-academic impact.
First Year Of Impact 2018
Sector Aerospace, Defence and Marine,Electronics,Other
Impact Types Economic

 
Description Collaboration with NAG on nonlinear least-squares solvers 
Organisation Numerical Algorithms Group Ltd
Country United Kingdom 
Sector Private 
PI Contribution The RALFit software (https://github.com/ralna/ralfit) was developed by Tyrone Rees during this grant, and this is planned to be incorporated in the NAG library
Collaborator Contribution NAG are working alongside us to add further features to the software. In particular, they have developed logic for solving nonlinear least-squares problems with box constraints.
Impact RALFit software
Start Year 2018
 
Description Least squares for Diamond 
Organisation Diamond Light Source
Country United Kingdom 
Sector Private 
PI Contribution We have tested new least squares software that we have been developing under the grant on problem data provided by Diamond. Results have been shared with Diamond.
Collaborator Contribution Diamond has provided data and has been involved in a number of project meetings to plan possible future work.
Impact Outputs are still preliminary.
Start Year 2016
 
Description Least squares for Mantid 
Organisation Science and Technologies Facilities Council (STFC)
Department ISIS Neutron and Muon Source
Country United Kingdom 
Sector Academic/University 
PI Contribution The team has been working on improving the least squares solvers used by the Mantid package https://www.mantidproject.org/Main_Page We have developed new software that has been integrated into Mantid. We have also performed performance testing on new and old algorithms.
Collaborator Contribution The Mantid team provided us with training in using Mantid and with test problems. Furthermore, they have contributed to fortnightly project progress review meetings. In addition, they have provided direct funding to the team (so far, equivalent to approximately 1 FTE).
Impact New software for GALAHAD, a code package RALFit and new test problems.
Start Year 2015
 
Description Research collaboration on evaluation complexity issues for the solution of nonlinear least-squares problems 
Organisation University of Namur
Country Belgium 
Sector Academic/University 
PI Contribution This is a collaboration involving Professor Coralia Cartis (Oxford) and Professor Philippe Toint (Namur). The aim is to understand the worst-case function evaluation complexity of state-of-the-art methods for solving nonlinear least-squares problems. Our research has found (i) methods that improve on current best-known estimates in the general (non-zero residual) case and (ii) achieve very fast convergence when the optimal residuals are zero.
Collaborator Contribution This has been a long-term collaboration stretching back thirty years and resulting in over sixty joint publications (with the senior author, Toint) and ten years (and twenty publications with the junior one, Cartis).
Impact See publication and software lists
 
Description Research collaboration on evaluation complexity issues for the solution of nonlinear least-squares problems 
Organisation University of Oxford
Department Department of Oncology
Country United Kingdom 
Sector Academic/University 
PI Contribution This is a collaboration involving Professor Coralia Cartis (Oxford) and Professor Philippe Toint (Namur). The aim is to understand the worst-case function evaluation complexity of state-of-the-art methods for solving nonlinear least-squares problems. Our research has found (i) methods that improve on current best-known estimates in the general (non-zero residual) case and (ii) achieve very fast convergence when the optimal residuals are zero.
Collaborator Contribution This has been a long-term collaboration stretching back thirty years and resulting in over sixty joint publications (with the senior author, Toint) and ten years (and twenty publications with the junior one, Cartis).
Impact See publication and software lists
 
Title BLLS - a bound-constarined linear least-squares solver in GALAHAD 
Description BLLS is an open-source modern fortran bound-constarined linear least-squares solver that is available as part of GALAHAD 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact New methods for simple-bound projection 
 
Title FitBenchmarking 
Description FitBenchmarking is an open source tool for comparing different minimizers/fitting frameworks based on their accuracy and runtimes. We make it easy to for scientists to add problems from our large scale facilities, and also to add new nonlinear test squares solvers to test. FitBenchmarking is written in object oriented python, and is designed to be cross-platform, running on Windows, Linux and Mac OS. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact We have been able to engage with scientists working at ISIS (who use the Mantid software) and the Diamond Light Source/ISIS (who use SasView) to help us understand the nature of the nonlinear least-squares problems they face, and share with them the benefits of using modern methods (such as those developed in this grant). 
URL http://www.fitbenchmarking.com
 
Title FitBenchmarking: an open source Python package comparing data fitting software 
Description What's Changed FitBenchmarking v0.2.0 is the result of 10 months of work, and we've introduced a lot of changes to the FitBenchmarking package. This release now requires Python 3.7.1 or greater. New Features Support for different cost functions was added; see below Improved support for different jacobians; see below Better handling of conflicting options More descriptions of the methods in docs Box constraints supported Licences of the software used documented Minimizers are collected into types Second derivatives (Hessians) supported, and can be compared Added an option to select algorithms by type Results directory can be set from the command line A maximum runtime can be selected for each solver Parsers Mantid's Crystal Field objective is supported Native parser has been refactored IVP parser has been added Fitting Software iminuit>2.0 is supported levmar supported Matlab supported (basic, and curve_fitting, statistics and optimization toolboxes) horace supported (Matlab package) scipy_go supported gradient_free_optimizers supported bumps support has been updated to version 0.9.0 Cost functions NLLS and WeightedNLLSs cost functions replace the defaults HellingerNLLS cost function Poisson cost function added Multiple cost functions can be compared at once Jacobians numdifftools Jacobians have been added Analytic Jacobians added for NIST and SIF file formats Solver default Jacobians have been added Multiple Jacobians can be compared at once Tables the generation of these has been refactored Failed fits are highlighted in the tables Customizable colourmaps in the tables Header row and leading column of results table is frozen Problem descriptions displayed in fitting report Accuracy and runtime displayed in fitting report Problem summary page has been added Data sets more SASView examples added Data Assimilation (IVP) examples added CrystalField examples added Global optimization SIF file examples added Breaking Changes Support is dropped for iminuit<2.0 lm-scipy-no-jac option under scipy_ls in [MINIMIZERS] has been removed (this is now available under the solver default jacobian) results_dir has moved from [PLOTTING] into [OUTPUT] in the options file use_errors has been removed as an option, and this functionality is controlled by the choice of cost function The Jacobian method SciPyFD is no longer supported; this has been renamed to scipy colour_scale option in [PLOTTING] is no longer supported; this is replaced by the colour_map bumps minimizers name changes: lm-bumps -> scipy-leastsq and mp -> lm-bumps Full Changelog: https://github.com/fitbenchmarking/fitbenchmarking/compare/v0.1.5...v0.2.0 
Type Of Technology Software 
Year Produced 2022 
Impact Has notably decreased the time to apply innovative algorithms to data fitting applications in the ISIS Neutron and Muon Source 
URL https://zenodo.org/record/4738576
 
Title HSL_MA85: Sparse diagonally-weighted linear least squares solver 
Description HSL_MA85 uses a direct method to solve large-scale diagonally-weighted linear least squares problems. The package is written in Fortran and is part of the HSL Mathematical Software Library 
Type Of Technology Software 
Year Produced 2020 
Impact Not yet known 
 
Title Incomplete factorization preconditioner for solving large sparse linear least squares problems 
Description HSL_MI35 computes an incomplete factorization preconditioner that may be used in the solution of weighted sparse linear least squares problems. 
Type Of Technology Software 
Year Produced 2016 
Impact The software is available within the HSL mathematical software library. It is freely available to academics for teaching and research purposes. A paper has been written and published (january 2017) that demonstrates the effectiveness of the software compared with state-of-the-art alternatives. It is too early to know what the take up of the software will be (9 downloads as of February 2017). Version 1.2.0 released September 2016. 
URL http://www.hsl.rl.ac.uk/catalogue/hsl_mi35.html
 
Title Nonlinear least-squares examples with CUTEst 
Description CUTEst is a collection of artificial and real-life optimization test examples along with interfaces to many commonly-available optimization solvers. Recently, a large number of nonlinear least-squares examples from a variety of sources have been added. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact These test examples have informed improvements to both of our nonlinear least-squares software packages (RALFit and NLS), ad consequently improved the robustness of Mantid's fitting capabilities. 
URL https://ccpforge.cse.rl.ac.uk/gf/project/cutest/wiki/
 
Title Package NLS from GALAHAD 
Description NLS is a Fortran package for solving large-scale nonlinear least-squares problems. The package provides many standard and new options, and there is an interface to the CUTEst test-problem suite. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact This has acted as a prototype for RALFit, our least-squares package that has been incorporated into Mantid 
URL https://ccpforge.cse.rl.ac.uk/gf/project/galahad/scmsvn/?action=browse&path=%2Ftrunk%2Fdoc%2Fnls.pdf
 
Title RALFit: A non-linear least squares solver 
Description RALFit is a software package for solving nonlinear least-squares problems. It is written in modern Fortran, and has C and Python interfaces. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact It has been incorporated into the Mantid data analysis package 
URL https://github.com/ralna/ralfit