Mathematics for Vast Digital Resources

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Mathematics

Abstract

British society, economy and government are becoming digital at staggering speeds. Individuals use the Internet to shop for books, houses and employment, search for information, share resources and group into social networks. Government wants to provide better and faster services to citizens and companies, cut costs and identify tax evaders. Much of business has moved online in the form of digital stores, service portals and targeted advertisements, breaking the physical barrier of distance between parties interested in a particular kind of interchange. Entering a future digital era provides many opportunities which, if seized properly, have a strong potential to boost economic growth and improve the quality of life. Researchers are facing the challenge of developing the tools necessary to make the best use of these opportunities. In this project we will develop new methodologies addressing the challenges and utilizing the opportunities arising from increasing vastness -in size and accessibility -of digital resources . We will analyze mathematical properties of these problems, design novel techniques to exploit their structure, implement them into efficient algorithms, and collaborate with industrial partners and digital economy hubs to ensure impact. Size: Technological breakthroughs in mankind's ability to produce and store huge amounts of data create an unprecedented challenge: a new science is needed for organizing, optimizing and interpreting data coming from new sources like the Internet, commercial databases, scientific experiments and government records. Hospitals, research labs, transportation companies, retailers and businesses produce more raw data than current technology is able to utilize effectively. Moreover, it seems that this trend will continue at an exponential rate. For the problems in this category we will develop new ground-breaking operational research techniques requiring us to reach the depths of several disciplines, merging insights from numerical optimization, machine learning and software development.Accessibility: Due to the vast accessibility of digital resources, portals connecting suppliers of a certain service with potential customers are becoming extremely popular. There are websites specializing in employment (Jobs.ac.uk), housing (Lettingweb.com), as well as contact points facilitating general exchange (Gumtree.com, Craigslist.org). The need to manage the customer portfolios of these portals for optimal user experience uncovers many fundamental mathematical challenges. Since the existing literature does not address these new problems appropriately, a careful study of these systems has the potential to improve user experience substantially. We will construct and analyze mathematical models of such systems using techniques at the interface of modern queueing theory and optimization.In summary, we will develop new operational research techniques which: (i) are capable of dealing with the unprecedented scale of modern digital resources, and (ii) will upgrade the access management to these new resources. Our goal is to gain new mathematical insights into the underlying problems in digital economy and provide the industry and the society with new tools to address these problems appropriately to meet public's expectations over the next decade.

Planned Impact

Due to obvious advantages, the Internet has become a natural new medium for doing business and communicating with customers. This new digital market environment facilitates competitiveness in an unprecedented way, making it essential for companies to be able to: (i) fully utilize its hidden resources, and (ii) increase the quality of service provided. In the first line of research, we will develop new optimization-based methodologies which address the management and analysis of vast digital resources. The Internet has already become an overwhelming source of information and the speed of accumulating new information still seems to increase. Government agencies, retailers, businesses have their huge databases of information but are not always able to make good use of them. Hence, our goal in this proposal is to develop new methods that will enable organizations to handle huge amounts of data efficiently, which means it will be possible to extract more useful information from existing data to address the needs and preferences of people, using the techniques developed in this research. In the second line of research, we will analyze different ways to manage the accessibility of systems where users impose positive externalities on each other. These systems are commonly observed in portals operating over the Internet, as the Internet serves as a common access point for customers and suppliers. The techniques developed in this research will be aimed to improve the quality of service offered by portals which is used by significant portion of the society. As stated in the attached letters of support, several major companies realize the challenge and are keen on getting involved in the research suggested in this proposal. These companies will be in regular contact with our team throughout the project, help us define relevant problems, provide us with real-world data related to their problems, and assist us in the analysis of results obtained with our new techniques. Hence, these companies will be obvious first beneficiaries of our research. In a more general setting, our research will be beneficial to organizations which need to manipulate vast amount of data or price the access to services, such as hospitals, government agencies as well as most businesses. To facilitate the dissemination of the outcomes of the project beyond our immediate industry collaborators, we plan to organize two workshops. The first workshop will take place in the second year and will address Numerical Optimization Techniques on Vast Datasets . The second workshop will address Managing the Accessibility of Digital Resources and will be held in the third year of the project. We plan to invite guests from industry and academia to attend these workshops and we hope to achieve two objectives: (i) spread the word about novel techniques to industry and public sector, (ii) help the academics (and ourselves) to learn more about real industry needs. Working in close collaboration with industry will also benefit the project team and help the team learn new professional skills. This aspect of the project will also be an added value in the career development of the two PDRAs. Through our collaborations with industrial partners and workshops organized as a part of our research, we believe that School of Mathematics at Edinburgh University will discover new ways for facilitating knowledge exchange. We are also planning to deliver lectures to secondary school students on the practical aspects of our research and encourage them to think about how mathematics can help to improve the quality of life.
 
Description Scientific Programme

§3 DEALING WITH SIZE

§3.3 Matrix-Free Interior Point Methods for Huge Scale Optimization

Papers [1-4] and [S2,S5] are relevant to $3.3.
A matrix-free interior point method has been developed and demonstrated
to solve a number of problems which are untractable by any alternative
optimization technique [2-4]. A specialized variant of the method
for compressed sensing problems [1] has been demonstrated to deliver
the reliability which cannot be matched by any state-of-the-art
first order method. A more general framework to employ the second
order information when solving large signal/image processing problems
or machine learning problems [S2,S5] has been developed.

§3.4 Accelerated Gradient Methods for Huge Scale Optimization

§3.5 Sparsity Inducing Optimization

Papers [C1,C2,6-7] and [S3,S4,S6-S14] are relevant to $3.4 + $3.5.
We have gone well beyond what we promised in 3.4 and 3.5.
Developed the theory of coordinate descent as randomized decomposition
for dealing with big data. The theory and computational results are
considered state of the art (e.g., [5] now has 100+ citations, [S8] has
50+ citations).

§4 PROGRAMME: DEALING WITH ACCESSIBILITY

We have introduced a new model called "probabilistic matching systems"
which are useful in modelling accessibility problems in web-based
portals such as employment systems, matrimonial and dating sites.
As promised, we first studied the stability of these systems and
paper and developed policies to ensure user satisfaction.
We also proposed 2 different approximating models to analyze
complex questions about these systems under heavy traffic.
We also worked on pricing of probabilistic systems to maximise
the revenue generated. We first employed optimal control methods
to derive an asymptotically optimal pricing policies.
Then, we derived pricing policies based on accepting or rejecting
the users using an exact analysis. We are currently preparing
two papers for submission on these results.

Papers Published

[1] K. Fountoulakis, J. Gondzio, P. Zhlobich.
Matrix-free interior point method for compressed sensing problems,
Mathematical Programming Computation
March 2014, Volume 6, Issue 1, pp 1-31
DOI: 10.1007/s12532-013-0063-6

[2] J. Gondzio.
Interior point methods 25 years later,
European Journal of Operational Research,
vol 218 (2012) 587-601.
DOI: 10.1016/j.ejor.2011.09.017

[3] J. Gondzio.
Convergence analysis of an inexact feasible interior point method
for Convex Quadratic Programming,
SIAM Journal on Optimization 2013, Vol. 23, No. 3, pp. 1510-1527
DOI: http://dx.doi.org/10.1137/120886017

[4] J. Gondzio, J. Gruca, J.A.J. Hall, W. Laskowski, M. Zukowski.
Solving large-scale optimization problems related to Bell's theorem,
Journal of Computational and Applied Mathematics 263C (2014), pp. 392-404
DOI: http://dx.doi.org/10.1016/j.cam.2013.12.003

[5] P. Richtarik and M. Takac.
Iteration complexity of randomized block-coordinate descent methods
for minimizing a composite function,
Mathematical Programming 144(2):1-38, 2014

[6] M. Takac, A. Bijral, P. Richtarik and N. Srebro.
Mini-batch primal and dual methods for SVMs,
Journal of Machine Learning Research W&CP 28(3):1022-1030, 2013

[7] R. Tappenden, P. Richtarik and B. Buke.
Separable approximations and decomposition methods for the augmented Lagrangian,
To Appear in: Optimization Methods and Software,
DOI:http://dx.doi.org/10.1080/10556788.2014.966824.

[8] B. Buke, J. C. Smith and S. A. Thomas.
On a random walk problem with arc failures and memory,
Networks. Volume 66, Issue 1, August 2015, Pages 67-86
DOI: 10.1002/net.21608

[9] I. Dassios, K. Fountoulakis and J. Gondzio
A Preconditioner for a Primal-Dual Newton Conjugate Gradients
Method for Compressed Sensing Problems
SIAM Journal on Scientific Computing 37 (2015) A2783--A2812.

[10] K. Fountoulakis and J. Gondzio.
A Second-Order Method for Strongly Convex L1-Regularization Problems,
L1-Regularization Problems,
Mathematical Programming 156 (2016), pp. 189--219.
DOI: 10.1007/s10107-015-0875-4

[11] R. Tappenden, P. Richtarik and J. Gondzio.
Inexact coordinate descent: complexity and preconditioning,
Accepted for publication in
Journal of Optimization and its Applications.
Published online: 29 February 2016.
DOI 10.1007/s10957-016-0867-4
http://link.springer.com/article/10.1007/s10957-016-0867-4

[12] B. Buke and H. Chen.
Stabilizing Policies for Probabilistic Matching Systems,
Queueing Systems Vol 80 (2015) 35--69.
DOI 10.1007/s11134-015-9433-2

[13] O. Fercoq and P. Richtarik.
Accelerated, parallel and proximal coordinate descent,
SIAM J. Optim., 25(4), 1997-2023. (27 pages)
DOI:10.1137/130949993

[14] P. Richtarik and M. Takac.
Parallel coordinate descent methods for big data optimization,
Mathematical Programming,
March 2016, Volume 156, Issue 1, pp 433-484
DOI 10.1007/s10107-015-0901-6


Conference Proceedings Published

[C1] P. Richtarik.
Finding sparse approximations to extreme eigenvectors:
generalized power method for sparse PCA and extensions,
Proceedings of SPARS11 (Signal Processing with Adaptive Sparse
Structured Representations), 2011

[C2] P. Richtarik and M. Takac.
Efficiency of randomized coordinate descent methods on minimization
problems with a composite objective function,
Proceedings of SPARS11 (Signal Processing with Adaptive Sparse
Structured Representations), 2011

[C3] P. Richtarik and M. Takac.
Efficient serial and parallel coordinate descent methods
for huge-scale truss topology design,
Operations Research Proceedings, pp 27-32, 2012

[C4] Olivier Fercoq, Zheng Qu, Peter Richtárik and Martin Takác.
Fast distributed coordinate descent for minimizing non-strongly convex losses
To Appear in IEEE Workshop on Machine Learning for Signal Processing, 2014

Papers Submitted

[S1] B. Buke and H. Chen.
Stabilizing Policies for Probabilistic Matching Systems,
Queueing Systems Vol 80 (2015) 35--69.
DOI 10.1007/s11134-015-9433-2

see also [9] above:
[S2] I. Dassios, K. Fountoulakis and J. Gondzio
A Preconditioner for a Primal-Dual Newton Conjugate Gradients
Method for Compressed Sensing Problems
SIAM Journal on Scientific Computing 37 (2015) A2783--A2812.

[S3] O. Fercoq and P. Richtarik.
Smooth minimization of nonsmooth functions with parallel coordinate descent methods,
Submitted.
arXiv:1309. 5885, 9/2013

see also [13] above:
[S4] O. Fercoq and P. Richtarik.
Accelerated, parallel and proximal coordinate descent,
SIAM J. Optim., 25(4), 1997-2023. (27 pages)
DOI:10.1137/130949993

see also [10] above:
[S5] K. Fountoulakis and J. Gondzio.
A Second-Order Method for Strongly Convex L1-Regularization Problems,
Mathematical Programming 156 (2016), pp. 189--219.
DOI: 10.1007/s10107-015-0875-4

[S6] W. Hulme, P. Richtarik, Lynne McGuire and A. Green.
Optimal diagnostic tests for sporadic Creutzfeldt-Jakob disease
based on SVM classi?cation of RT-QuIC data, Manuscript, arXiv:1212.2617, 12/2012

[S7] J. Konecny and P. Richtarik.
Semi-stochastic gradient descent,
Submitted.
arXiv:1312.1666, 12/2013

see also [14] above:
[S8] P. Richtarik and M. Takac.
Parallel coordinate descent methods for big data optimization,
Mathematical Programming,
March 2016, Volume 156, Issue 1, pp 433-484
DOI 10.1007/s10107-015-0901-6

[S9] Peter Richtárik, Takác, Selin Damla Ahipasaoglu.
Alternating Maximization: Unifying Framework for 8 Sparse PCA
Formulations and Efficient Parallel Codes, arXiv:1212:4137,
Submitted.

[S10] P. Richtarik and M. Takac.
Distributed coordinate descent method for learning with big data,
Submitted, arXiv:1310.2059, 10/2013

[S11] P. Richtarik and M. Takac.
On optimal probabilities in stochastic coordinate descent methods,
Submitted.
arXiv:1310.3438, 10/2013

[S12] Martin Takác, Selin Damla Ahipasaoglu, Ngai-Man Cheung, Peter Richtárik.
TOP-SPIN: TOPic discovery via Sparse Principal component Interference,
Submitted.
arXiv:1311:1406

[S13] M. Takac, J. Marecek and P. Richtarik.
Inequality constrained matrix completion, Submitted, 1/2014

[S14] R. Tappenden, P. Richtarik and J. Gondzio.
Inexact coordinate descent: complexity and preconditioning,
Accepted for publication in
Journal of Optimization and its Applications.

[S15] B.Buke and H. Chen. Fluid and diffusion limits for probabilistic matching systems,
Submitted, 10/2014

[S16] Jakub Marecek, Peter Richtárik and Martin Takác
Distributed block coordinate descent for minimizing partially separable functions,
Submitted. May 2014

Awards, Distinctions & Subsequent Grants

CDT in Data Science (2014-2021), Dr Richtarik and Prof. Gondzio
are involved in the CDT funded by a £5.5 grant from EPSRC,
partially due to their success with the current grant

Google Doctoral Fellowship, 2014-2017, for Dr Richtarik's PhD
student J. Konecny, 15 Fellowships were awarded in 2014 in Europe

Invited Visiting Assistant Professorship for Dr Richtarik at UC Berkeley,
Fall 2013, for a Semester long programme
on Theoretical Foundations of Big Data Analysis

Nomination for the 2014 Microsoft Research Faculty Fellowship,
Dr Richtarik, every university can nominate a single candidate
(not successful, no awards were given to candidates in Europe this year)

16th Leslie Fox Prize (2nd Prize), Institute for Mathematics and
its Applications, 2013, for paper [5]

EPSRC grant for Dr Richtarik (joint with J. Tanner, Oxford)
EP/J020567/1

EPSRC 1st Grant, for Dr Richtarik
EP/K02325X/1

M. Takac, Dr Richtarik's PhD student. Best Student Paper Award
(sole runner-up), INFORMS Computing Society, 2012, for paper [5]

Industrial Impact

[II1] Interest from Baidu (China's Google), ongoing collaboration
with the Head of their Big Data Dept.

[II2] Amazon.com has implemented the method and is interested
in joint grant proposals and research. M. Takac (PhD student
of Dr. Richtarik) was offered a position there.

[II3] Developed the current state of the art test for Creutzfeld-Jakob
Disease; jointly with Western General Hospital, Edinburgh

[II4] Dr Richtarik was invited to give talks at:
Google (Mountain View, California),
Microsoft Research (Bangalore, India),
SAS (Cary),
Amazon (Berlin)

Workshops Organized

08/2013 Applied Probability in Digital Economy
(Edinburgh, 2 days, 40 participants)

05/2013 Optimization and Big Data
(Edinburgh, 3 days, 60 participants), 50% co-funding from NAIS

05/2012 Advances in Large Scale Optimization
(Edinburgh, 2 days, 60 participants), 50% co-funding from NAIS

10/2013 Computational Linear Algebra and Optimization for the Digital Economy
(Edinburgh, 2 days, 40 participants)

06/2014 Convex Optimization and Beyond
(Edinburgh, 1 day, 90 participants)

Further Impact

K Fountoulakis (PhD student of Prof Gondzio) did an internship:
Noise Detection and Removal from Physiological Sensor Signals,
and worked with the MIME project (www.dotrural.ac.uk/mime)
of Dot.Rural in Aberdeen.

PhD students and postdocs

Attracted top talent as PhD students and postdocs for the project:
Postdoc I Dassios (will join Manchester University as a postdoc)
Postdoc O Fercoq (Gaspard Monge Prize for Best Thesis in France in 2012 in Optimization)
Postdoc J Marecek: first position IBM Research Dublin
Postdoc R Tappenden (will join John Hopkins University as a postdoc)
Postdoc K Woodsend (joined the School of Informatics, Edinburgh as a postdoc)
PhD Student H Chen
PhD Student K Fountoulakis
PhD student J Konecny
PhD student M Takac (9 prizes throughout his PhD; 2 major; first position tenure track in the USA)

Conferences


Conference and Workshop Presentations by B. Buke

Probabilistic Matching Systems, Applied Probability Models in Digital
Economy Workshop, Edinburgh, UK, August 2013

Stability of Probabilistic Matching Systems, INFORMS Applied Probability
Society Conference, San Jose, Costa Rica, July 2013

Stabilizing Policies for Probabilistic Matching Systems,
XXVI EURO - INFORMS 26th European Conference on Operational Research, Rome, Italy, July 2013

Mathematics for Internet Matching Portals, Uncertainty Quantification
& Optimization Workshop, Strathclyde University, Glasgow, UK, March, 2013

Decomposition in stochastic optimization by parallel coordinate descent,
INFORMS Computing Society Conference, Santa Fe, NM, January 2013

Stabilizing policies for matching portals, INFORMS Annual Meeting,
Phoenix, AZ, October 2012


Conference and Workshop Presentations by H. Chen

Diffusion Approximations for Probabilistic Matching Systems,
Informs Annual Meeting, San Francisco, US, Conference Talk (planned)

Probabilistic Matching systems: Stability and Diffusion Approximations,
Stochastic Network Meeting, Amsterdam, The Netherlands,
Poster presentation and pitch talk.

Stabilizing Policies for Probabilistic Matching Systems,
Modern Probabilistic Techniques for Design, Stability, Large Deviations,
and Performance Analysis of Communication, Social, Energy, and Other
Stochastic Systems and Networks, Cambridge, UK, Conference Talk.

Stabilizing Policies for Probabilistic Matching Systems,
Student Conference in Operational Research, Nottingham, UK, Conference Talk.

Stabilizing Policies for Probabilistic Matching Systems,
Young Researcher in Mathematics, Bristol, UK, Conference Talk.


Conference and Workshop Presentations by J. Gondzio

Numerous invited keynote and plenary talks at national
and international workshops and conferences


Conference and Workshop Presentations by P. Richtarik

Numerous invited keynote and plenary talks at national
and international workshops and conferences

Edinburgh Science Festival and Other Outreach Activities

M Takac presented at National Museum of Scotland

H Chen, the Museum Late, A night in wonderland with Lewis Carroll theme
in the Scottish National Museum


Publicity Material:

Article: J. Gondzio,
Managing the unmanageable,
International Innovation 2013.


MSc Dissertations

Promised 3, but supervised many more MSc dissertations directly
related to the project

Supervised by Dr Buke:
Chtistina Moustaka (2014), area: Multi-class matching systems for employment portals
Andreas Schwab (2012), area: Application to call centres
Yang Yang (2011), area: Application to call centres

Supervised by Prof Gondzio:
Hang Dai (2011) Interior point methods and intensity modulated radiation therapy
David Piper (2013) Scheduling data transfers in geo-distributed data centers
Yizhuo Bai (2013) Stock index forecating and face recognition using support vector machines
Ying Zhou (2013) Algorithms for finding the nearest correlation matrix

Supervised by Dr Richtarik:
Alexander Banks-Watson (2012), area: Decomposition for Vast Data Analysis
Christos Delivorias (2012), area: Big Data Optimization in Finance
William Hulme (2011), area: Applications to Health

Furthemore, we supervised 8 externally funded (EPSRC, Nuffield, College)
undergraduate research projects directly reated to the project.

Supervised by Dr Richtarik:
Debadri Mukherjee (2014, Informatics)
Mojmír Mútny (2014, Physics)
Iliana Peneva (2012, Mathematics)
Bartosz Filipecki (2011, Mathematics)
Edward Cumberlege (2011, Mathematics)
Bartosz Filipecki (2010, Mathematics)

Supervised by Dr Buke:
Xizi Li (2014, Mathematics)
Exploitation Route The key research results of the project are documented
in several papers. Some of them are accompanied with
the software which is available from the ERGO software website:
http://www.maths.ed.ac.uk/ERGO/software.html
Sectors Creative Economy,Digital/Communication/Information Technologies (including Software),Electronics

 
Description The key research results of the project are documented in several papers. Some of them are accompanied with the software which is available from the ERGO software website: http://www.maths.ed.ac.uk/ERGO/software.html Edinburgh Science Festival and Other Outreach Activities M Takac presented at National Museum of Scotland H Chen, the Museum Late, A night in wonderland with Lewis Carroll theme in the Scottish National Museum Industrial Impact [II1] Interest from Baidu (China's Google), ongoing collaboration with the Head of their Big Data Dept. [II2] Amazon.com has implemented the method and is interested in joint grant proposals and research. M. Takac (PhD student of Dr. Richtarik) was offered a position there. [II3] Developed the current state of the art test for Creutzfeld-Jakob Disease; jointly with Western General Hospital, Edinburgh [II4] Dr Richtarik was invited to give talks at: Google (Mountain View, California), Microsoft Research (Bangalore, India), SAS (Cary), Amazon (Berlin)
First Year Of Impact 2013
Sector Creative Economy,Digital/Communication/Information Technologies (including Software),Electronics
Impact Types Societal,Economic