Practical Submodular Optimisation Beyond the Standard Greedy Algorithm

Lead Research Organisation: Queen Mary University of London

Department Name: Sch of Mathematical Sciences

Abstract

Algorithms for submodular optimisation have been successfully applied to a variety of difficult problems at the heart of data science, machine learning, and operational research. Behind these successes is a rich mathematical theory that precisely specifies the conditions under which simple heuristics will always produce nearly optimal solutions to problems. This proposal will enlarge this theory to allow existing approaches to be brought to bear on new classes of problems, and, where necessary, develop new algorithms for problems that lie beyond their reach.

Intuitively, submodular optimisation concerns the problem of finding the best set of items from some large collection in the presence of "diminishing returns." For example, suppose a company wants to select where to place 100 advertisements with the aim of maximising future sales. The marginal benefit of any given ad placement in the campaign will probably be larger if only 10 ads have already been placed than if 99 have been placed! Due to this diminishing returns property, one can prove that this problem is difficult to solve optimally. However, this same property can also be used to show that a simple "greedy" algorithm is mathematically guaranteed to produce solutions that are nearly optimal.

Unfortunately, even seemingly minor variations in the formulation of an optimisation problem can have drastic affects on these guarantees. For example, our company might decide that instead of having a fixed budget for its marketing campaign, it could simply account for the cost of each advertisement explicitly and then try to maximise the total expected revenue minus the advertisement cost. This relatively small change in formulation is enough to make existing guarantees break down completely, since now the value (i.e. profit) of a set of items (i.e. advertisements) may become negative.

The goal of this project is to obtain new, practical optimisation algorithms with rigorous, mathematical quality guarantees for new classes of problems such as the example given above that cannot be handled efficiently using existing notions of submodularity. Specifically, we aim to develop techniques for dealing with some classes of problems in which our notion of value is possibly negative, decreasing, or non-submodular, and also to develop scalable algorithms for treating problems that cannot be handled with greedy approaches. For problems where existing optimisation algorithms work well in practice but do not have guarantees, we aim to identify the underlying structures and problem characteristics that explain why. For problems where this is not the case, we seek to develop new algorithmic techniques that address legitimate shortcomings of standard approaches. Our goal is thus to expand the interface between theory and practice. From a theoretical perspective, we aim to develop a more precise understanding of those properties make problems easy or hard to solve approximately. From a practical perspective, we aim to develop insight and tools that can be use to model problems, design and select algorithms, and interpret the results of these algorithms with confidence.

Planned Impact

More than ever before, our daily lives are being shaped by algorithms. These algorithms, fed by an unprecedented amount of data, promise to enable more efficient infrastructure, communications, and decision-making. To deliver on this promise, we need a way to identify which algorithmic approaches are likely to be most successful for new applications. At the same time, we need to be able to trust that the optimisation algorithms we employ are producing answers that are indeed close to the best possible.

This project aims to develop improved algorithms that are applicable to wide classes of optimisation problems, together with a mathematical theory providing strong guarantees. The guarantees we seek will ensure that, even in the worst possible cases, these algorithms produce solutions that are reasonably close to optimal. Algorithms for submodular optimisation are already being successfully used for several applications in machine learning, data science, and operational research, either alone or as a key component of other complex systems. However, for many natural variants of these optimisation problems, our mathematical theory currently lags behind practical experience. Here, we propose closing this gap, either by improving guarantees for existing algorithms, or identifying situations in which known algorithms do break down and developing techniques to cope with this.

The optimisation problems that form the basis of the current proposal are key components for several large-scale machine learning technologies, which have the potential to enable new data-driven sectors of the economy, and to develop intelligent systems for infrastructure, healthcare, and finance. By improving and enlarging the scope of these tools, the proposed research will directly impact and accelerate national development in these areas, but also have potential to enhance our understanding of existing and emerging technologies.

Funded Value:

£121,643

Funded Period:

Sep 19 - Aug 22

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/T006781/1

Principal Investigator:

Justin Ward

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (100%)

Organisations

People	ORCID iD
Justin Ward (Principal Investigator)

Publications

Author Name Title Publication Date Published

10 25 50

Huang C.-C. (2020) Improved Multi-Pass Streaming Algorithms for Submodular Maximization with Matroid Constraints

Huang Chien-Chung (2020) FPT-Algorithms for the \ell-Matchoid Problem with a Coverage Objective in arXiv e-prints

T. Thiery (2022) Two-Sided Weak Submodularity for Matroid Constrained Optimization and Regression

Thiery T (2023) Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

Key Findings
Impact Summary
Collaboration
Engagement Activities


Description	Many optimisation problems related to statistical modelling, data science, and sensor placement problems are difficult to solve due to non-linearity. Informally, the problem is that for many real-world problems, the total "value" or "cost" of a proposed solution is not simply derived from a sum of individual decisions made in isolation. Rather, there may be complex interactions between the individual decisions making up the problem. One such case is when the problem exhibits "diminishing returns." Here, making one choice (for example, deciding to place a sensor at some location) may make future choices contribute less additional value to the overall solution. In this context, many approaches are known for making near-optimal decisions, typically based on a simple, greedy strategy that says "do the right thing now, and worry about the future later." However, less is known about problems that do not exhibit a diminishing returns property. This is the case, for example, in many sensor-placement problems where placing one sensor may make data from other sensors to be more valuable in the future because it allows for their measurements to be used for a more effective triangulation. Similarly, when designing clinical trials, the outcome from one experiment may make data from other experiments more informative. As part of the ongoing work funded by this award, we have discovered useful mathematical properties that allow us to obtain efficient, near-optimal algorithms for a wide class of similar problems. Although the problems are (provably) hard to solve exactly, our algorithms produce a solution that is mathematically guaranteed to have value "close" to the optimal. Our framework has allowed us to derive new algorithms for experimental design, in which we must select the best observations or experiments to make under some constraints on what is feasible. Our algorithms work by going beyond the standard "greedy" technique described above. In many new settings, however, the amount of data being processed is so large that it cannot be stored in a single machine's memory. Here, even the standard greedy technique becomes difficult to implement efficiently. In further work related to the award, we have developed several improved near-optimal algorithms for problems with natural diminishing returns properties that work by processing the dataset sequentially over several passes. This can be much more efficient when data must be accessed, for example, on a hard disk or solid-state memory. We have also developed new theoretical understanding of what is possible in this setting if one insists on obtaining the true optimal solution. Our new results clarify which aspects of the problem are responsible for the hardness in this setting.
Exploitation Route	Some of the results of the award have applications in design of experiments, statistical analysis, as well as practical problems such as sensor placement. Due to the strong, mathematical guarantees obtained, there has been some interest in sensor placement problems related to critical sectors such as defence and security. We are now exploring these via ongoing partnerships with PA Consulting and DSTL. Further applications might include better design of clinical trials, or better analysis of data from these trials, as well as several problems from operational research. Similar techniques have been employed, for example, in the analysis of power grids and in the interpretation of complex models produced by neural networks.
Sectors	Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Energy,Manufacturing, including Industrial Biotechology,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy


Description	As discussed in more detail in Collaborations and External Partnerships and Engagement Activities, I have contributed expertise in the the general area of the findings and adjacent fields in partnerships with PA Consulting and DSTL. This work was directly motivated by problem identified by stakeholders in the security and defence sectors. The main goal of these partnerships was the development of new techniques for efficiently placing electromagnetic sensors in urban environments so that malicious signals can be quickly identified and disabled. This has resulted in a computational model for evaluating the quality of urban sensor placements, optimising the selection and placement of sensors to achieve robust detection capability, as well as first steps towards incorporating uncertainty in sensor data and the operational costs of placements. The resulting software is currently being evaluated by DSTL in the context of real-world sensor placement problems. Direct impacts related to the findings of the grant have been somewhat slow to materialise due to delays in dissemination due to the pandemic. I expect that these will begin to materialise shortly and am also exploring ways in which they may be able to further enhance the general impact work described above.
First Year Of Impact	2022
Sector	Aerospace, Defence and Marine,Security and Diplomacy
Impact Types	Societal,Policy & public services


Description	Collaboration with PA Consulting and DSTL on sensor placement
Organisation	Defence Science & Technology Laboratory (DSTL)
Country	United Kingdom
Sector	Public
PI Contribution	This collaboration was launched as a result of my attendance at the workshop "Challenges in the Electromagnetic Environment" (described further in the Engagement section). After the workshop, PA Consulting decided to fund further research into two complementary approaches for optimisation of sensor placement problems, led by myself and Dr. Will Handley (Cambridge). As part of the project, I have applied discrete optimisation techniques to determine optimal or near-optimal placements of sensors to detect malicious signals in an urban environment. The project is related to challenges from security and defence that were originally introduced by stakeholders in the public sector (primarily DSTL). The first round of this project was completed in March 2021. It was presented as a "success story" at a follow-up INI workshop related to the topic. Based on these successes, PA consulting and DSTL decided to continue with a second round to refine the model and techniques further, through March of 2022. The follow-on research resulted in a more accurate model with added capabilities. These included better development of refined objectives to explicitly quantify and optimise over spatial uncertainty in the localisations provided by sensors.
Collaborator Contribution	PA Consulting (acting as a contractor of DSTL) is the stakeholder for the main problem under consideration. They have contributed expertise and guidance on sensor technologies available, as well as what the main goals and challenges are for the problem. Practitioners from DSTL have also been involved in our consultation meetings, where they have contributed expertise on the underlying problem and desired outcomes.
Impact	Both rounds of the project produced technical reports for use by DSTL, as well as a software model for optimising of sensor placement in small cities.
Start Year	2020


Description	Collaboration with PA Consulting and DSTL on sensor placement
Organisation	PA Consulting
Country	United Kingdom
Sector	Private
PI Contribution	This collaboration was launched as a result of my attendance at the workshop "Challenges in the Electromagnetic Environment" (described further in the Engagement section). After the workshop, PA Consulting decided to fund further research into two complementary approaches for optimisation of sensor placement problems, led by myself and Dr. Will Handley (Cambridge). As part of the project, I have applied discrete optimisation techniques to determine optimal or near-optimal placements of sensors to detect malicious signals in an urban environment. The project is related to challenges from security and defence that were originally introduced by stakeholders in the public sector (primarily DSTL). The first round of this project was completed in March 2021. It was presented as a "success story" at a follow-up INI workshop related to the topic. Based on these successes, PA consulting and DSTL decided to continue with a second round to refine the model and techniques further, through March of 2022. The follow-on research resulted in a more accurate model with added capabilities. These included better development of refined objectives to explicitly quantify and optimise over spatial uncertainty in the localisations provided by sensors.
Collaborator Contribution	PA Consulting (acting as a contractor of DSTL) is the stakeholder for the main problem under consideration. They have contributed expertise and guidance on sensor technologies available, as well as what the main goals and challenges are for the problem. Practitioners from DSTL have also been involved in our consultation meetings, where they have contributed expertise on the underlying problem and desired outcomes.
Impact	Both rounds of the project produced technical reports for use by DSTL, as well as a software model for optimising of sensor placement in small cities.
Start Year	2020


Description	Challenges in the Electromagnetic Environment Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	This is an ongoing series of multi-day workshops hosted by the Isaac Newton Institute in collaboration with the Defence Science and Technology Laboratory and PA Consulting. The aim is to bring together mathematicians from multiple disciplines with stakeholders from the defence and security sectors, in order to find new approaches for modelling problems related to the general "electromagnetic environment." The challenge problems were contributed by government and industrial practitioners and included problems from signal processing, sensor placement, anomaly detection, and analysis of transmission protocols. The long-term goals were to "build closer links and collaborations between applied mathematicians and the owners of complex challenges, establishing a joined up multi-disciplinary UK community for this area." As part of the second workshop, held (virtually) in the Summer of 2020, I participated in a challenge problem related to finding optimal placements for sensors in an urban environment in order to detect a malicious transmission as rapidly and accurately as possible. A combined approach developed by Will Handley (Cambridge) and myself was selected for further funding via the industrial partner PA Consulting, which has lead to an ongoing collaboration, as described in the "Collaborations" section of this return. Our approach and findings and work was presented at a follow-up workshop in Summer of 2021 as a "success story" and PA consulting and DSTL have decided to fund follow-on work refining the project. Since then, I have attended several follow-on workshops, most recently in January 2023, under the new title "Maths of EMA."
Year(s) Of Engagement Activity	2020,2021,2022,2023
URL	https://gateway.newton.ac.uk/event/tgmw74


Description	Invited Talk at Yahoo Research, New York
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	In June of 2020, I gave an invited virtual seminar at Yahoo Research, New York. The topic of the talk was related to the foundational work for the current project, with the goal being to build further collaborations between myself and industrial researchers and practitioners. Similar problems and potential areas for further collaboration were identified for future exploration, but these plans have not yet been made concrete.
Year(s) Of Engagement Activity	2020

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications