Accelerated Coordinate Descent Methods for Big Data Problems
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Mathematics
Abstract
Much of modern society and economy, in the United Kingdom and elsewhere, is moving in the direction of digitization and computation. Humankind is now able to collect and store enormous quantities of digital data coming from sources such as health records (e.g., IBM ``Watson'' project, MRI/CT scans), government databases (e.g., e-Government, GORS: government operational research service), social networks (e.g., Facebook, Linked-IN, delicious), online news (e.g., New York Times article database), corporate databases (e.g., bank records, Amazon.com) and the internet. Global society is, as a consequence, facing many unprecedented challenges and opportunities. One of the biggest of these has to do with the ability (or rather, lack thereof) to distill, understand and utilize in an optimal way the information contained within these gigantic data sources. The main technology for this is to "form an optimization problem'' and then solve it using a well-chosen optimization algorithm in a suitable computing environment (e.g., a multicore workstation, GPU-enabled machine, cloud).
In this project we aim to contribute to a breakthrough in our ability to solve optimization problems arising from big data domains via developing, analyzing and implementing new accelerated parallel coordinate descent (CD) methods. Since in big data problems the data is typically highly structured, well-designed CD methods can have very low memory requirements and arithmetic cost per iteration---often much smaller than the dimension of the problem. This is in sharp contrast with standard methods whose arithmetic complexity of a single iteration depends on the dimension at least quadratically.
Our research objectives are:
1. Acceleration Theory. We will analyze the iteration complexity (i.e., give bounds on the number of iterations/steps needed to achieve a prescribed level of accuracy) of new parallel coordinate descent methods accelerated using the following 4 strategies: a) nonuniformity (of the frequency with which individual coordinates are updated), b) asynchronicity (of updates and computation), c) distribution (of data and computation to nodes of a cluster) and d) inexactness (of certain operations and computations the algorithm depends on).
2. Stochastic Gradient Descent. We will analyze theoretically and test numerically the relationship between parallel coordinate descent (CD) methods and parallel stochastic gradient descent (SGD) methods.
3. ACDC Code. We will implement the accelerated algorithms in a code which we will make publicly available.
In this project we aim to contribute to a breakthrough in our ability to solve optimization problems arising from big data domains via developing, analyzing and implementing new accelerated parallel coordinate descent (CD) methods. Since in big data problems the data is typically highly structured, well-designed CD methods can have very low memory requirements and arithmetic cost per iteration---often much smaller than the dimension of the problem. This is in sharp contrast with standard methods whose arithmetic complexity of a single iteration depends on the dimension at least quadratically.
Our research objectives are:
1. Acceleration Theory. We will analyze the iteration complexity (i.e., give bounds on the number of iterations/steps needed to achieve a prescribed level of accuracy) of new parallel coordinate descent methods accelerated using the following 4 strategies: a) nonuniformity (of the frequency with which individual coordinates are updated), b) asynchronicity (of updates and computation), c) distribution (of data and computation to nodes of a cluster) and d) inexactness (of certain operations and computations the algorithm depends on).
2. Stochastic Gradient Descent. We will analyze theoretically and test numerically the relationship between parallel coordinate descent (CD) methods and parallel stochastic gradient descent (SGD) methods.
3. ACDC Code. We will implement the accelerated algorithms in a code which we will make publicly available.
Planned Impact
The pathways to impact document details the ways in which we plan to achieve non-academic impact. The Case for Support details the academic impact.
Here we offer a brief summary:
Academic Impact:
- close research communities: optimization, operational research, numerical analysis, computer science, machine learning, compressed sensing, high performance computing, ITC
- more distant research communities: biology, astronomy, civil engineering, topology design, signal processing
- individuals: Prof Nesterov (Louvain), Prof Srebro (Chicago), Martin Takac (Edinburgh), PDRA, Prof Gondzio (Edinburgh), Prof Recht (Madison), Prof Wright (Madison), Dr Forgan (Astronomy)
Non-academic Impact:
- ACDC code (publicly available efficient code based on the algorithms developed in Module 1)
- SAS Institute (Dr Polik, Dr Chari, Dr Griffin)
- Western General Hospital (Dr Green)
- Arup (Mr Simpson)
Pathways:
- writing papers in top journals
- presenting at top conferences and workshops
- website with code at code.google.com + links to it from various places
- meetings with and feedback from SAS Institute at conferences (at least 1-2 x per year) and via teleconferencing facilities
- visit to SAS Headquarters in Cary, North Carolina
- impact workshop (to be held in May 2015)
- supervision of 2 MSc dissertations (Summer 2013 and Summer 2014)
Since structured convex optimization, randomized algorithms and data mining are important and fast growing fields, this proposal is timely for establishing and growing an internationally competitive research group in this area based in the UK.
Quote from EPSRC website: "For UK research to have maximum impact, our researchers need to work with the very best on the global stage." We strongly believe that, given the quality of the network of researchers involved and, in particular, of the visiting researchers, this proposal falls into this category.
Here we offer a brief summary:
Academic Impact:
- close research communities: optimization, operational research, numerical analysis, computer science, machine learning, compressed sensing, high performance computing, ITC
- more distant research communities: biology, astronomy, civil engineering, topology design, signal processing
- individuals: Prof Nesterov (Louvain), Prof Srebro (Chicago), Martin Takac (Edinburgh), PDRA, Prof Gondzio (Edinburgh), Prof Recht (Madison), Prof Wright (Madison), Dr Forgan (Astronomy)
Non-academic Impact:
- ACDC code (publicly available efficient code based on the algorithms developed in Module 1)
- SAS Institute (Dr Polik, Dr Chari, Dr Griffin)
- Western General Hospital (Dr Green)
- Arup (Mr Simpson)
Pathways:
- writing papers in top journals
- presenting at top conferences and workshops
- website with code at code.google.com + links to it from various places
- meetings with and feedback from SAS Institute at conferences (at least 1-2 x per year) and via teleconferencing facilities
- visit to SAS Headquarters in Cary, North Carolina
- impact workshop (to be held in May 2015)
- supervision of 2 MSc dissertations (Summer 2013 and Summer 2014)
Since structured convex optimization, randomized algorithms and data mining are important and fast growing fields, this proposal is timely for establishing and growing an internationally competitive research group in this area based in the UK.
Quote from EPSRC website: "For UK research to have maximum impact, our researchers need to work with the very best on the global stage." We strongly believe that, given the quality of the network of researchers involved and, in particular, of the visiting researchers, this proposal falls into this category.
People |
ORCID iD |
Peter Richtarik (Principal Investigator) |
Publications
Allen-Zhu Zeyuan
(2015)
Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling
in arXiv e-prints
Chambolle A
(2018)
Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications
in SIAM Journal on Optimization
Csiba D.
(2018)
Importance sampling for minibatches
in Journal of Machine Learning Research
Csiba Dominik
(2015)
Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses
in arXiv e-prints
Fercoq O
(2013)
Accelerated, Parallel and Proximal Coordinate Descent
Fercoq O
(2015)
Accelerated, Parallel, and Proximal Coordinate Descent
in SIAM Journal on Optimization
Fercoq O
(2019)
Adaptive restart of accelerated gradient methods under local quadratic growth condition
in IMA Journal of Numerical Analysis
Fercoq O
(2016)
Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent
in SIAM Review
Description | Many new algorithms for big data optimization. |
Exploitation Route | The algorithms I developed underpin modern information based society. They are of fundamental nature, and applicable in virtually all domains of data science. |
Sectors | Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Construction,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Energy,Environment,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Manufacturing, including Industrial Biotechology,Transport |
URL | http://www.maths.ed.ac.uk/~prichtar/i_papers.html |
Description | Several of my algorithms are now adopted by industry. |
First Year Of Impact | 2013 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Societal,Economic |
Description | EPSRC Early Career Fellowship |
Amount | £823,211 (GBP) |
Funding ID | EP/N005538/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2016 |
End | 12/2020 |
Title | CoCoA |
Description | A framework for communication-efficient distributed optimization for machine learning. |
Type Of Technology | Software |
Year Produced | 2015 |
Impact | The CoCoA [NIPS 2014] / CoCoA+ [ICML 2015] distributed optimization algorithm developed in a duo of papers with two co-authors from Edinburgh (Martin Takác, myself) has won the MLconf Industry Impact Student Research Award. The award goes to our coauthor Virginia Smith (UC Berkeley). Other co-authors: M. Jaggi (ETH Zurich), M.I. Jordan (Berkeley), C. Ma (Lehigh), J. Terhorst (UC Berkeley), S. Krishnan (UC Berkeley), T. Hofmann (ETH Zurich). About the award: "This year, we started a new award program called the MLconf Industry Impact Student Research Award, which is sponsored by Google. This fall, our committee of distinguished ML professionals reviewed several nominations sent in from members of the MLconf community. There were several great researchers that were nominated and the committee arrived at awarding 2 students whose work, they believe, has the potential to disrupt the industry in the future. The two winners that were announced at MLconf SF 2015 are UC Irvine Student, Furong Huang and UC Berkeley Student, Virginia Smith. Below are summaries of their research. We've invited both researchers to present their work at upcoming MLconf events." The citation: " Virginia Smith's research focuses on distributed optimization for large-scale machine learning. The main challenge in many large-scale machine learning tasks is to solve an optimization objective involving data that is distributed across multiple machines. In this setting, optimization methods that work well on single machines must be re-designed to leverage parallel computation while reducing communication costs. This requires developing new distributed optimization methods with both competitive practical performance and strong theoretical convergence guarantees. Virginia's work aims to determine policies for distributed computation that meet these requirements, in particular through the development of a novel primal-dual framework, CoCoA, which is written on Spark. The theoretical and practical development of CoCoA is an important step for future data scientists hoping to deploy efficient large-scale machine learning algorithms." |
URL | https://github.com/gingsmith/cocoa |
Description | Talks |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | talks at workshops, seminars, conferences and so on; more than 15 a year |
Year(s) Of Engagement Activity | 2015,2016,2017 |
Description | youtube talks |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Research talks posted on youtube. Sparked interest in the developed methods in various audiences. |
Year(s) Of Engagement Activity | 2013,2014,2015 |
URL | https://www.youtube.com/watch?v=0sHOfqhCZw0 |