Joining the dots: from data to insight

Lead Research Organisation: University of Southampton
Department Name: Sch of Mathematical Sciences

Abstract

The relentless growth of the amount, variety, availability, and the rate of change of data has profoundly transformed essentially all aspects of human life. The Big Data revolution has created a paradox: While we create and collect more data than ever before, it is not always easy to unlock the information it contains. To turn the easy availability of data into a major scientific and economic advantage, it is imperative that we create analytic tools that would be equal to the challenge presented by the complexity of modern data.
In recent years, breakthroughs in topological data analysis and machine learning have paved the way for significant progress towards creating efficient and reliable tools to extract information from data.

Our proposal has been designed to address the scope of the call as follows.
To 'convert the vast amounts of data produced into understandable, actionable information' we will create a powerful fusion of machine learning, statistics, and topological data analysis. Combining statistical insight and the computational power of machine learning with the flexibility, scalability, and visualisation tools of topology will allow a significant reduction in the complexity of the data under study. The results will be output in the form best suited to the intended application or scientific problem at hand. This way, we will create a seamless pathway from data analysis to implementation, which will allow us to control every step of this process. In particular, the intended end user will be able to query the results of the analysis to extract the information relevant to them. In summary, our work will provide tools to extract information from complex data sets to support user investigations or decisions.

It is now well established that a main challenge of Big Data is how 'to efficiently and intelligently extract knowledge from heterogeneous, distributed data while retaining the context necessary for its interpretation'. This will be addressed first of all by developing techniques for dealing with heterogeneous data. A main strength of topology is its ability to identify simple components in complex systems. It can also provide guiding principles on how to combine elements to create a model of a complex system, and numerical techniques to control the overall shape of the resulting model so that it fits the original constraints. We will use the particular strengths of machine learning, statistics and topology to identify the main properties of data, which will then be combined to provide an overall analysis of the data. For example, a collection of text documents can be analysed using machine learning techniques to create a graph which captures similarities between documents in a topological way. This is an efficient way to classify a corpus of documents according to a desired set of keywords. An important part of our investigation will be to develop robust techniques of data fusion, which matter in many applications. One of our main applications will address the problem of creating a set of descriptors to diagnose and treat asthma. There are five main pathways for clinical diagnosis of asthma, each supported by data. To create a coherent picture of the disease we need to understand how to combine the information contained in these separate data sets to create the so-called 'asthma handprint', which is a major challenge in this part of medicine.
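The text-document example above can be illustrated with a minimal sketch. It is not the project's method: it assumes plain bag-of-words counts with cosine similarity as a stand-in for the machine learning step, and it uses connected components of the thresholded similarity graph as the simplest topological summary. The function names and the threshold value are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_graph(docs, threshold=0.3):
    """Return the set of edges (i, j), i < j, joining sufficiently similar documents.

    The threshold is an illustrative assumption, not a recommended value.
    """
    vectors = [Counter(tokenize(d)) for d in docs]
    edges = set()
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                edges.add((i, j))
    return edges


def connected_components(n, edges):
    """Group vertices 0..n-1 into connected components using union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "asthma patients lung inflammation",
        "lung inflammation in asthma",
        "persistent homology of point clouds",
        "homology of point clouds and barcodes",
    ]
    edges = similarity_graph(docs)
    print(connected_components(len(docs), edges))
```

In this toy corpus the two asthma documents and the two homology documents share no vocabulary, so the graph splits into two components, one per topic. A real pipeline would replace the word counts with learned document embeddings and could track how the components merge as the threshold varies.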

Every novel methodology of data analysis has to prove that its 'techniques are realistic, compatible and scalable with real-world services and hardware systems'. The best way to do that is to engage from the outset with challenging applications, and to ensure that theoretical and modelling solutions fit the intended applications well. We offer a unique synergy between theory and modelling as well as world-class facilities in medicine and chemistry which will provide a strict test for our ideas and results.

Planned Impact

It is difficult to think of an area of human activity that has not been profoundly changed by the relentless flow of data. Large, complex, heterogeneous data sets are now ubiquitous, and the lack of robust, powerful tools capable of dealing with data is now a serious obstacle to progress.

This proposal is very firmly focused on long-term impact of the proposed research. It is clear to us that the only ideas that will stand the test of time will be those that have been robustly tested on challenging problems emerging from key areas of application. Our proposal has been designed to ensure that our work creates significant impact within academia and far beyond.

It is our ambition to create a seamless pathway from theory to applications that will ensure a lasting and substantial impact of the proposed work. Within academia, we will communicate our results through research papers, conference talks, invited talks, web sites and blogs. To ensure that our methods are realistic, scalable, and useful, we will concentrate on specific real-world problems. We will use our extensive network of scientific and business connections to reach out to potential end users and to identify opportunities for implementing our findings. Our results will be of direct importance in medicine, and are likely to lead to implementations of new procedures or algorithms.

The importance of big data in everyday life in the UK and globally is well documented, and is an important reason behind this call. In creating this proposal we were guided by the long-term needs of the sciences involved, as well as those of broader society. This work will lead to significant results that will be demonstrated in important areas of application, where our contribution will have impact well beyond the academic community. This proposal is ambitious and internationally competitive and can establish UK science as a leader in this area. We bring together a number of the key disciplines in EPSRC's portfolio (mathematics, statistics and applied probability, computer science, and chemistry) and will make significant contributions to each. Furthermore, the proposal addresses societal challenges by tackling key problems in medicine and is likely to have an impact on personalised health care. This work has the potential to contribute to the UK economy through possible implementations of the best algorithmic results. Finally, this proposal fits very well with research supported by the EPSRC and other RCUK councils.

We have set aside funds within our budget to be used specifically on activities likely to strengthen the impact of this proposal. They will be used to organise concentrated workshops that will bring together top data scientists, and to arrange small-scale meetings with members of the UBIOPRED consortium and medical practitioners. We will be proactive in investigating new routes of implementation of the most promising results. In this we will be supported by the vast network of thriving collaborations with business and industry that exist at the University of Southampton.

Publications

Belchí F (2019) Optimising the Topological Information of the $A_\infty$-Persistence Groups in Discrete & Computational Geometry

Brodzki J (2019) A differential complex for CAT(0) cubical spaces in Advances in Mathematics

Brodzki J (2017) Exactness of locally compact groups in Advances in Mathematics

Brodzki J (2016) The local spectrum of the Dirac operator for the universal cover of $SL_2(\mathbb{R})$ in Journal of Functional Analysis

Brodzki Jacek (2016) Exactness of locally compact groups in arXiv e-prints

Burg D (2018) Large-Scale Label-Free Quantitative Mapping of the Sputum Proteome. in Journal of proteome research

 
Description 1. We have implemented a novel technique based on persistent homology to analyse CT scans of lungs of patients with COPD. The results are very promising as they demonstrate a correlation between topological characteristics derived from the CT scans and clinical information about the patients.

2. We have created a pipeline for computing persistence for a large class of molecules and demonstrated their relevance for detecting solubility.

3. We have completed a study of cyclic homology of crossed product finite type algebras and proved a detailed formula that expresses that homology in terms of the orbifold cohomology of the underlying orbifold.
4. We have completed our study of topological characteristics of the shape of the lung, and the results are quite striking. We have developed new radiomic features that provide better stratification of COPD than standard methods.
5. We have completed a geometric study of synchronisation problems and constructed an algorithm which allows a classification of three-dimensional objects.
6. We have completed a subtle and difficult construction of a differential complex associated with group actions on CAT(0)-cubical spaces and have started a new phase of the project, constructing an explicit proof of the Baum-Connes conjecture in this case.
7. We have established a new set of computable characteristics to detect solubility of chemical molecules.
8. We have created a new topological way to describe the structure of the space of the conformers of chemical molecules.
9. We have contributed to a wide-ranging study aimed at creating new asthma phenotypes and biomarkers.
10. We have provided a first example of a very new methodology for proving the Baum-Connes conjecture for groups acting on CAT(0) cube complexes.
11. We have created a novel approach to tensor factorisation.
12. We created a new numerical measure of instability of Mapper-type algorithms.
13. We developed a thorough topological characterisation of the conformation space of certain classes of molecules.
Exploitation Route We are developing clinical and diagnostic pathways to help clinicians working with COPD patients.
Sectors Chemicals; Financial Services, and Management Consultancy; Healthcare; Government, Democracy and Justice; Pharmaceuticals and Medical Biotechnology

 
Description We have developed new compatible topological characteristics of COPD which show great promise as a potential diagnostic tool to enable patients to monitor the status of their asthma and similar conditions and to communicate their data to their GPs. This project is in the early stages of development, and this entry will be updated in later submissions. We have established a working partnership with ai Corporation, whom we advise on data analytic methodology that is relevant to their work on fraud detection.
First Year Of Impact 2018
Sector Financial Services, and Management Consultancy; Healthcare; Pharmaceuticals and Medical Biotechnology
Impact Types Societal

 
Description Artificial and Augmented Intelligence for Automated Scientific Discovery
Amount £1,014,318 (GBP)
Funding ID EP/S000356/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2021
 
Description DiG for the Future: Taming disorder in self-assembled materials with topology
Amount £399,866 (GBP)
Funding ID RPG-2019-055 
Organisation The Leverhulme Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 06/2019 
End 07/2022
 
Description Knowledge Transfer Partnership
Amount £168,000 (GBP)
Funding ID 11190 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 02/2019 
End 02/2021
 
Title Data analytic modelling suite 
Description We have created a unified data analytic platform to access main tools in topological data analysis. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact New research and commercial partnerships, IAA award. 
 
Description Applied Algebraic Topology (LMS) 
Organisation Queen Mary University of London
Department School of Mathematical Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution This is a collaborative research network established to support work in applied algebraic topology.
Collaborator Contribution We have jointly organised a number of research meetings.
Impact We have jointly organised seven research meetings to date.
Start Year 2014
 
Description KTP proposal 
Organisation Corps AI
Sector Private 
PI Contribution We have established a working partnership with ai Corporation, which investigates financial fraud for commercial clients
Collaborator Contribution Joint KTP award application, sponsorship of a PhD student
Impact KTP applications. The collaboration is multidisciplinary, combining topology and machine learning
Start Year 2017
 
Company Name TOPMD PRECISION MEDICINE LTD 
Description A company created by a member of the JTD team to exploit topological methods in medical diagnostics. 
Year Established 2019 
Impact New diagnostic software and methodology.
 
Description Cafe Scientifique talk 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact I was invited to give a popular talk based on my current research, titled "Measuring the world: From Pythagoras to Big Data".
Year(s) Of Engagement Activity 2016
URL http://www.diverse.ip3.co.uk/scicaf.htm
 
Description Invited speaker as part of a formal working group, expert panel. Human Proteome Organisation (HUPO) annual international conference - Dublin - The Shape of Asthma 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The working group discussed the use of liquid biopsies and their use for biomarker discovery and the best practices to translate these findings to the clinic for patient benefit. It was agreed that a joint positioning paper/review would be published in collaboration with the working group members.
Year(s) Of Engagement Activity 2017
 
Description Maths in Action 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact A presentation by Jacek Brodzki on the topic "Seeing the world with mathematics" to an audience of about 800 school students and teachers.
Year(s) Of Engagement Activity 2018
URL https://thetrainingpartnership.org.uk/study-day/maths-in-action-12-12-2018/
 
Description Pint of Science 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact A popular talk by Jacek Brodzki "Is the Earth Flat?"
Year(s) Of Engagement Activity 2019
 
Description STEM for Britain 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact STEM for Britain is a national poster competition for early career scientists, mathematicians, and engineers. My team submitted a poster describing our early findings on using persistent homology to classify CT scans of lungs. We were very pleased that the poster was selected for the final, and it was presented on 13 March 2017 at Westminster. The poster was presented by Dr Francisco Belchi-Guillamon and it attracted a lot of interest from the participating audience. The final was very well attended.
Year(s) Of Engagement Activity 2017
URL http://www.setforbritain.org.uk/2017event.asp
 
Description School visit (Brighton) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact An invited talk at the Roedean School in Brighton to describe the broad outcomes of our research in popular terms.
Year(s) Of Engagement Activity 2017
 
Description School visit (CoLA London) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Brodzki was invited to give a talk and to lead a workshop that involved students from four schools in London. The event was hosted by the City of London Academy in Southwark.
Year(s) Of Engagement Activity 2017
 
Description Southampton Science and Engineering Festival 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Southampton Science and Engineering Festival (SOTSEF) is organised annually by the University of Southampton. The 2017 edition was the fifteenth time the festival took place, and it was the largest event in the history of SOTSEF. It was timed to coincide with the British Science Festival. The PI was invited to give a talk on "Measuring the World: from Pythagoras to Big Data". The open day is attended by thousands of participants from across the region, including school children, prospective and current students, parents and interested members of the general public. The lecture filled the lecture theatre to capacity (about 300).
Year(s) Of Engagement Activity 2017
URL http://www.sotsef.co.uk/science-and-engineering-day/talksshows/