Structured Sparsity Methods in Machine Learning and Convex Optimisation

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

Over the past ten years, theoretical developments in machine learning (ML) have had a significant impact on statistics, applied mathematics and other fields of scientific research. In particular, fruitful interactions between ML and numerical optimisation have emerged that are expected to lead to theoretical and algorithmic breakthroughs, with the potential to render ML methodologies significantly more applicable to many problems of practical importance. The proposed project aims to make significant UK contributions at a crucial juncture in this emerging interdisciplinary field, which has so far been dominated by the US and France.

Many ML techniques can be cast as problems of minimising an objective function over a large set of parameters. Examples include support vector machines as well as more recent techniques for semi-supervised learning and multi-task learning. Often the objective function is convex. Consequently, ideas from convex optimisation are becoming increasingly important in the design, implementation and analysis of learning algorithms. Up to now, however, ML has almost exclusively resorted to off-the-shelf methods for convex optimisation, without substantially exploiting the rich theory which lies behind this field. A thesis of this proposal is that there is a need for a deeper interplay between ML and numerical optimisation. Ultimately, bridging the two communities will facilitate communication, allow the power of core optimisation to be brought to bear on ML more easily, and open new frontiers in optimisation.

An area in which the interplay between ML and optimisation has a particularly important role to play is the use of sparsity-inducing optimisation problems. A rationale that drives the use of sparsity-inducing models is the observation that when the number of model parameters is much larger than the number of observations, a sparse choice of parameters is strongly desirable for fast and accurate learning. Building on the success of sparse estimation methods, we believe that the time is now right for the development of a new line of algorithms for matrix learning problems under structured sparsity constraints. This means that many of the components of the parameter matrix, or of a decomposition thereof, are zero in locations that are related via some rule (e.g. the matrix may be constrained to have many zero rows, many zero eigenvalues, sparse eigenvectors, etc.). Perhaps the best-known examples in which structured sparsity has proven beneficial are collaborative filtering, where the objective function is chosen to favour low-rank matrices, and multi-task learning, where the objective function is chosen to favour few common relevant variables across different regression equations. These types of optimisation problems have only recently started to be addressed in ML and optimisation, and several fundamental problems remain open, most importantly the study of efficient algorithms which exploit the underlying sparsity assumptions and a statistical learning analysis of the methods.

Our proposal is multidisciplinary and involves substantial exchange of ideas between Computer Science (Machine Learning) and Mathematics (Numerical Optimisation), with three main goals. Firstly, we aim to develop novel and efficient algorithms for learning large structured matrices; fast convergence of the algorithms should be guaranteed when applied to problem data that have a sparse solution. Secondly, in cases where the assumed sparsity structure leads to NP-hard problems and the first goal is unachievable (as is often the case under low-rank assumptions), we aim to identify tractable convex relaxations and understand their impact on sparsity. Thirdly, we aim for models and algorithms that have a more natural interpretation than generic solvers (e.g., a minimax statistical justification), which should make it more likely that practitioners will embrace the new methodology.
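To make the first goal concrete, the following minimal sketch (illustrative only, not the project's software; the function names and synthetic data are our own) solves a row-sparse matrix learning problem, min_W 0.5*||XW - Y||_F^2 + lambda*||W||_{2,1}, by proximal gradient descent. The l2,1 penalty sums the Euclidean norms of the rows of W, and its proximal operator sets whole rows to exactly zero, i.e. the "many zero rows" structure described above.

import numpy as np

def prox_group_rows(W, tau):
    # Proximal operator of tau * ||W||_{2,1}: shrink each row of W towards
    # zero; rows whose norm falls below tau become exactly zero.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * W

def group_lasso_ista(X, Y, lam, n_iter=500):
    # Minimise 0.5*||XW - Y||_F^2 + lam*||W||_{2,1} by proximal gradient
    # descent (ISTA) with constant step size 1/L.
    W = np.zeros((X.shape[1], Y.shape[1]))
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)       # gradient of the smooth least-squares term
        W = prox_group_rows(W - grad / L, lam / L)
    return W

# Illustrative synthetic multi-task data: only the first 3 of 50 variables
# are relevant, and they are shared by all 4 tasks (joint sparsity).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 50))
W_true = np.zeros((50, 4))
W_true[:3] = rng.standard_normal((3, 4))
Y = X @ W_true + 0.01 * rng.standard_normal((40, 4))
W_hat = group_lasso_ista(X, Y, lam=1.0)
print("nonzero rows:", np.flatnonzero(np.linalg.norm(W_hat, axis=1) > 1e-8))

The same template accommodates other structured penalties by swapping the proximal operator; for instance, soft-thresholding the singular values rather than the rows yields the trace-norm regularisation used in collaborative filtering.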

Planned Impact

As data collection continues to grow, the design of methods for "learning from data" may have a major impact on the way our economy evolves, and may aid societal development. The methods developed in the proposed project may prove to be a valuable approach to handling large amounts of data more efficiently. We expect these methods to have a significant impact in improving current systems in several areas outside the academic community, including both the industrial and public sectors. Some of our matrix learning methods may be of significant practical value in the retail sector, where data on customers' preferences for products abound. For example, we expect that our work may be valuable to companies such as GlaxoSmithKline, Unilever, and Fortent, with whom UCL has established links. At the same time, our methodology has the potential to be used by health organisations in a variety of prediction problems.

We believe that, by appropriately engaging potential users and making them aware of the novelty of our methodology, the proposed project may have a noticeable impact outside the academic community within the next 5 years. This impact may be measured in terms of transfer of knowledge, improved technology and job creation. In the longer term (more than 10 years), the methods and principles developed in this project may influence both social organisations and companies to redevelop their data analysis software and products, which may be crucially important in improving quality of life, health care and the creation of wealth.

In order to increase the impact of the proposed project in the real world, we shall promote our research findings in different ways. These include: consultation meetings, workshops, an interactive website, and publications for the wider public. In particular, we plan a two-day workshop in Oxford (at the beginning of the second year of the project) in which people from industry will be encouraged to participate. This will ensure that potential users become aware of our methodology and will facilitate transfer of knowledge outside the academic environment. Furthermore, the Centre for Computational Statistics and Machine Learning (CSML) at UCL provides an excellent means through which to pursue further contacts with industry and promote the new methodology in the wider industrial sectors.

As a further means of dissemination by direct training, we aim to develop and offer a graduate course on sparsity-inducing optimisation and its applications via the Oxford Taught Course Centre (TCC). Lectures will be videotaped and made available to the wider public via podcasts on a special project website. The website will be designed to offer both material that is easy to understand and appeals to the mathematically untrained public, and mathematically rigorous material aimed at university students with a mathematical or technical background and at colleagues in the field. The material aimed at the general public could motivate high-school pupils and young adults to undertake studies in computer science or applied mathematics.

The principal investigators already have relevant prior expertise in achieving successful knowledge exchange. Furthermore, UCL and Oxford have professional experts in place who can provide feedback on building communication skills and writing for the wider public. Moreover, if needed, the postdocs and students will undertake skills development courses, which are freely available at both institutions.

Publications

Baldassarre L (2012) Incorporating Additional Constraints in Sparse Estimation in IFAC Proceedings Volumes

Grünewälder S (2012) Conditional mean embeddings as regressors in Proceedings of the 29th International Conference on Machine Learning, ICML 2012

Lounici K (2011) Oracle inequalities and optimal inference under group sparsity in The Annals of Statistics

Martínez-Rego D (2013) Similarity-Based Pattern Recognition

Maurer A (2013) Sparse coding for multitask and transfer learning in Proceedings of the 30th International Conference on Machine Learning, ICML 2013

Maurer A (2012) Structured sparsity and generalization in Journal of Machine Learning Research

 
Description New machine learning methods for both supervised and unsupervised learning that encourage sparse parameter solutions. The methods have been studied both theoretically (using tools from convex optimisation, statistical learning theory and empirical processes) and practically, on problems ranging from affective computing to neuroimaging and bioinformatics.

Among the results obtained, we highlight the study of sharp error bounds on the statistical performance of multitask learning methods based upon joint sparsity regularisation. Our analysis required new tools from computational learning theory and statistical estimation theory, some of which are of independent interest.
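Schematically (the precise assumptions and constants are given in the cited papers, e.g. Lounici et al., 2011), the estimator at the centre of this analysis is

    \hat{W} \in \operatorname{argmin}_{W \in \mathbb{R}^{d \times T}} \frac{1}{nT} \|Y - XW\|_F^2 + \lambda \sum_{j=1}^{d} \|w^j\|_2 ,

where w^j denotes the j-th row of the d x T matrix W of regression parameters for T tasks. Under restricted-eigenvalue-type conditions, bounds of the flavour (s/n)(1 + (log d)/T) on the prediction error hold with high probability when the true matrix has at most s nonzero rows, so the rate is governed by the joint sparsity level s rather than the ambient dimension d, with the log d term shared across the T tasks.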
Exploitation Route Several researchers are working on sparsity regularisation and sparse methods in general. The papers that came out of this project (listed in the section "Publications") are well cited within the machine learning and statistics communities. The machine learning algorithms studied in this project have broad applications in other areas of science and engineering. We therefore believe the impact of this research will continue to grow in the coming years.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare, Transport

URL http://www0.cs.ucl.ac.uk/staff/M.Pontil/pubs.html
 
Description Collaboration with Prof. Alexandre Tsybakov (Université Paris 6 Pierre et Marie Curie)
Organisation Pierre and Marie Curie University - Paris 6
Country France 
Sector Academic/University 
PI Contribution Prof. Tsybakov is a world-leading expert on mathematical statistics. He helped with the study of the statistical properties of certain multitask learning and structured sparsity methods.
Collaborator Contribution Please see above
Impact Please see the section "Publications".
Start Year 2008
 
Description Collaboration with Prof. Charles Micchelli, University at Albany SUNY 
Organisation University at Buffalo
Country United States 
Sector Academic/University 
PI Contribution Prof. Micchelli is an expert in approximation theory and convex analysis.
Collaborator Contribution Prof. Micchelli provided key insight into issues pertaining to the study of convex learning algorithms, the study of so-called representer theorems, and the analysis of certain regularisers which arise in structured sparsity learning problems and in multitask learning.
Impact This is a multidisciplinary collaboration across Computer Science and Applied Mathematics. Please see the section "Publications" for specific research outputs.
 
Description University of Genoa 
Organisation University of Genoa
Country Italy 
Sector Academic/University 
PI Contribution We have written two European project proposals (one under submission, the other unfortunately unsuccessful) which relate to machine learning in general and sparsity methods in particular.
Collaborator Contribution Prof. Piana provides expertise on the theory of inverse problems and has real data to which the methods developed in this project could be applied.
Impact Submitted an EU proposal.
Start Year 2012