Coarse geometry and cohomology of large data sets

Lead Research Organisation: University of Southampton
Department Name: School of Mathematics

Abstract

The digital economy is founded on data. The current trend to develop intelligent, customer-led, interactive, real-time systems requires the ability to handle and interpret vast amounts of data efficiently, quickly, and with a degree of accuracy corresponding to the requirements. This point is underlined by two important recent developments. First, the Smarter Planet initiative, supported by IBM, envisages 'instrumented, interconnected data systems' where main elements of the physical environment are equipped with sensors constantly exchanging information. Second, the new transparency drive of the UK government will make huge data sets available to the public, creating 'an opportunity to build innovative applications which will bring significant economic benefit'. The need for synthetic geometric methods in data analysis arises because of the large size and high dimensionality of the sets involved. This proposal will extend recent important theoretical results to create a set of geometric and topological tools for data analysis, placing special emphasis on flexibility, efficiency, and close alignment with potential practical applications. This is an ideal and very exciting time to launch a project of this nature, and its results are very likely to have direct and important consequences for the initiatives mentioned above and for many other possible applications. A central theme of the proposal is the study of geometric properties of large data sets at various scales, which corresponds to varying degrees of 'sharpness' with which a data set is viewed. For example, searching large numbers of digital photographs for those that contain pictures of people requires a different resolution than trying to identify a specific person. This proposal offers a very exciting opportunity for developing pure mathematical methods to the point where they can be directly applied to important, difficult and timely practical problems.
The proposed work is adventurous and interdisciplinary, and brings together pure and applied mathematicians and experts in OR, computer science, statistics, and energy systems. Potential for long-term practical applications will be tested in two specific areas of application within the context of the wider Smarter Planet initiative. A main objective of the project is to develop geometric and cohomological tools of scale-dependent coarse geometry, with special emphasis on applications to finite metric spaces and, more specifically, to data sets. We will place strong emphasis on methods that can be developed into efficient tools for data analysis, and the research will be informed by specific problems arising from applications which range from the theoretical to the more practical. We will test the theoretical ideas and results in two important cases. First, we will consider data sets arising from the UK Government's Open Data initiative. Second, within the context of smart grids, we will consider data generated by a large number of sensors monitoring various aspects of the performance of a power grid, with the objective of providing an accurate matching between supply and demand.

Planned Impact

This interdisciplinary proposal will develop new mathematical tools to study large data sets and will investigate algorithmic implementation of the most promising results, which will be tested on specific practical problems of direct economic importance. This suggests a wide spectrum of possible ways this work can have impact. The theoretical problems studied in the proposal arise from intensive recent work in pure mathematics, where important developments have opened up possibilities for developing direct applications of these new results and methods. Our main applications will be to the analysis of data sets, and by making this the main focus of our work, we are aiming at the very centre of the digital economy. No matter what kind of digital economic activity one considers, there is a database behind it. Our methods will be flexible, scalable, portable, and applicable in a great variety of economic contexts. We will test their strength on specific problems arising in real-world systems, to ensure the best possible fit between theoretical results and practical requirements. Specifically, we will directly contribute to the analysis of data emerging through the UK Government's Transparency Initiative, where on the one hand our work can create new possibilities for exploiting the data, and on the other it may have an influence on Government policy on data acquisition and processing. Through Professor Nigel Shadbolt (CI), who is a member of the Public Sector Transparency Board working with the UK Government on its Open Data policy, we will have an opportunity to influence policy in this area. Another specific project will concern citation data emerging from the use of the arXiv preprint database, which is of crucial importance to the scientific community. We will also study specific data problems arising from designing intelligent instrumentation to support the Smart Grids idea for efficient management of energy supply and demand.
All significant results achieved during this investigation will be published in appropriate peer-reviewed scientific journals and presented at conferences and seminars. We will ensure that potential beneficiaries have the opportunity to engage with this research by disseminating findings via CORMSIS, the Centre for Operational Research, Management Science, and Information Systems at the University of Southampton, and via SIMM, the Southampton Initiative in Mathematical Modelling. CORMSIS and SIMM have direct contacts with more than 90 different organisations in business, industry, and government, among them IBM, BA, bmi, Ford, Boeing, Qualcomm, AA, BT, Logical Transport, Philips, Dstl, Southern Water, Hampshire Container Terminals, American Express, Barclays, JP Morgan, Tesco, NATS, Unilever, Rolls Royce, several NHS trusts, various City Councils, HM Revenue and Customs, the Met Office, MoD, and ONS. In a similar way, we will exploit connections offered by the Durham Energy Institute. A very important pathway to impact is through the Web Science initiative at Southampton, which is led by Professor Dame Wendy Hall, Professor Sir Tim Berners-Lee and Professor Nigel Shadbolt (a CI on the project), and through the University Strategic Research group in Digital Economy. This connection will link the project with a wide variety of potential users in academia, business, and politics, and we will explore these possibilities vigorously.

 
Description One of the main results of this work has been the development of a new data analytic pipeline, which combines topological data analysis, statistical machine learning, and graph theory. We have created and demonstrated this new methodology in the study of a number of complex data sets. We have also demonstrated how to use graph-theoretic techniques to create an abstract shape describing the main interactions within the group. This was further supported by information arising from natural language processing, which provided insights into the nature of the contribution of the individual participants. In the course of this work we have benefited from a collaboration with Ayasdi, a company set up by Carlsson, Singh and coworkers. This new methodology is ready to be exploited in other contexts, which we intend to do.

In a different direction, we have developed a way of combining topological data analysis with spectral properties of Laplace operators. In a recent development we have created a new "twisted" Laplace operator that provides a significant generalisation of the synchronisation methods developed by Singer and collaborators. This very interesting new idea allows us to combine spectral properties of generalised Laplace operators with analogues of geometric constructions like parallel transport, connections, etc.

We have also undertaken a thorough study of various coarse geometric constructions, such as embeddability, amenable actions, and property A, to elucidate the relationship between the continuous and discrete approximations of spaces, and to understand the role of scale in the coarse geometric context. Here we have obtained a number of exciting new results on the boundary between geometry and analysis. In particular, we have constructed a new differential complex arising from groups acting on CAT(0)-cubical spaces, which is likely to be of importance in this area of mathematics and its direct applications.
Exploitation Route We have been very active in investigating potential commercial applications of the work. We have applied for and received support from the SetSquared consortium to conduct a six-month market research study, and we have presented our findings to a panel of entrepreneurs at the Department of Business, Innovation, and Skills.

We have been approached by a private investment company to investigate possible joint ventures, and this is currently ongoing.

We had a number of meetings with the National Grid to investigate possible ways of collaboration.

We have set up a successful collaboration with a research group at Duke University led by Mukherjee.
Sectors Chemicals, Creative Economy, Digital/Communication/Information Technologies (including Software), Education, Energy, Healthcare, Pharmaceuticals and Medical Biotechnology, Security and Diplomacy, Transport

 
Description Research results from this paper contributed to a KTP proposal submitted to Innovate UK in February 2018. The KTP submission was successful and brought an award with a total value of £162,000. The programme ran from May 2018 until January 2021. The award brought significant benefits to the partner company. Our final submission was judged as "very good".
First Year Of Impact 2021
Sector Chemicals, Digital/Communication/Information Technologies (including Software), Education, Energy, Financial Services, and Management Consultancy, Other
Impact Types Policy & public services

 
Description Standard research: Making sense from data
Amount £1,218,040 (GBP)
Funding ID EP/N014189/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 12/2015 
End 05/2019
 
Description Applied Algebraic Topology (LMS) 
Organisation Queen Mary University of London
Department School of Mathematical Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution This is a collaborative research network established to support work in applied algebraic topology.
Collaborator Contribution We have jointly organised a number of research meetings.
Impact We have jointly organised seven research meetings to date.
Start Year 2014
 
Description Cafe Scientifique talk 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact I was invited to give a popular talk based on my current research, titled "Measuring the world: From Pythagoras to Big Data".
Year(s) Of Engagement Activity 2016
URL http://www.diverse.ip3.co.uk/scicaf.htm