Functional Object Data Analysis and its Applications

Lead Research Organisation: University of Cambridge
Department Name: Pure Maths and Mathematical Statistics

Abstract

When linguists are trying to determine how different languages are related, or neuroscientists wish to know how one part of the brain is associated with another, a fundamental question is how to analyse data that are both complex and massive. One area of Statistics, Functional Data Analysis, in which the data are described as mathematical functions rather than numbers or vectors, has recently been shown to be very powerful in these situations.

This fellowship aims to take functional data analysis and advance it so that much more complex data can be investigated. This will require establishing a careful statistical framework for the analysis of such functions, even in situations where the functions have strict relationships. By considering the underlying mathematical spaces in which the functions lie, it is possible to construct valid statistical procedures that preserve these relationships, for example that the functions must be positive definite or must be related by a graph or network.

As an example, comparison between different languages (such as how French differs quantitatively from Italian) can be carried out in the framework of functional data, but only by considering specifically how the data should be analysed to take into account their particular properties. In trying to find a path from one language to another, for instance, it would be sensible to go only via other feasible acoustic sounds. This turns out to be mathematically related to shape analysis, a simple example of which is how to describe going from London to Sydney. The shortest path is a straight line through the interior of the Earth, but this is not sensible, so you have to go round the surface of the world. Establishing links between shape analysis and functional data is a major aim of this fellowship.
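
The London-to-Sydney comparison can be made concrete with a small calculation. The sketch below, in Python, assumes a perfectly spherical Earth of radius 6371 km and approximate city coordinates; both are illustrative assumptions and are not taken from the fellowship's work, and the function names are hypothetical. It contrasts the straight-line chord through the Earth with the great-circle path constrained to the surface, which plays the role of a path that only passes through feasible sounds.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius, Earth treated as a perfect sphere


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def chord_and_geodesic_km(p, q, radius=EARTH_RADIUS_KM):
    """Return (chord, geodesic) distances in km between two (lat, lon) points.

    chord    : straight line through the sphere's interior (leaves the surface)
    geodesic : shortest path that stays on the surface (great-circle arc)
    """
    u, v = to_unit_vector(*p), to_unit_vector(*q)
    # Clamp the dot product to guard against rounding just outside [-1, 1].
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    angle = math.acos(dot)  # central angle between the two points, in radians
    chord = 2.0 * radius * math.sin(angle / 2.0)
    geodesic = radius * angle
    return chord, geodesic


# Approximate coordinates (assumed for illustration only).
LONDON = (51.5074, -0.1278)
SYDNEY = (-33.8688, 151.2093)

chord_km, geodesic_km = chord_and_geodesic_km(LONDON, SYDNEY)
print(f"chord through the Earth: {chord_km:.0f} km")
print(f"path along the surface:  {geodesic_km:.0f} km")
```

Under these assumptions the chord is always the shorter of the two, but only the surface path respects the constraint of staying on the sphere. The same distinction separates a straight-line interpolation between two functional observations from a path that remains inside the space of valid objects.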

In addition, most brain analysis currently splits the brain up into many elements known as voxels, and then analyses these voxels one by one. However, the brain is really one (complex 3-D) object, which should be analysed as a whole. This is another example of functional data, and the methods developed in this fellowship will enable the analysis of the brain as a single object. This will be done by examining the types of dependence between observations in brain imaging data, and using these to build such an object. Of particular interest will be the analysis of brain connections resulting from particular tasks, which will require a mixture of functional data analysis and graphical or network analysis. However, before this can be done and the resulting insights into the brain found, the statistical methods required need to be developed.

Planned Impact

There are three main areas of impact from this fellowship.

Firstly there are the academic disciplines associated with the projects within the fellowship. The main academic beneficiary will be statistical science, not simply those working on Functional Data Analysis, but also those in the other areas of shape analysis and network or graphical modelling, as well as applied statisticians making use of methodology in these areas. However, those working in the application areas under investigation will also greatly benefit from the methodology developed. This is evidenced by the strong support shown by the three non-statisticians who have written letters to say that they feel the research will benefit their groups and disciplines more generally.

Secondly, there are the non-academic counterparts of those mentioned above. Statistics is a discipline where statisticians working outside academia outnumber those within, and the methodology developed here will also benefit these statisticians in government and industry, particularly as software development is a critical part of the dissemination of this research. For clinical neuroscientists and speech technologists, the resulting application-driven research will also be of considerable use, and dissemination through media such as subject-specific journals will allow them to make use of the results.

Finally, there is the wider public engagement that a project of this kind will engender. The idea of using statistics to determine linguistic relations, so that ancient languages can be heard again, or to understand the function of the brain in particular settings, will naturally be of wide interest, given the public fascination with science. Webcasts and public lectures will make full use of this opportunity to engage the public in the work undertaken.

 
Description We have made it feasible to examine the variation of brain imaging signals on the cortical surface, to analyse data from the Office for National Statistics in a completely new way, and to generate synthetic sounds from ancient languages. This has all been achieved by developing statistical methods for "data which is not numbers".
Exploitation Route The work has already been taken further by neuroimaging researchers, linguists and staff at the Office for National Statistics. Additionally, new research in Forensics is being taken forward by UK Forensic Entomologists.
Sectors Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy

URL http://www.statslab.cam.ac.uk/~jada2
 
Description The major results from this project so far include the generation of synthetic sounds from ancient languages, which gives insight into the role mathematics can play in the humanities. In addition, work on linking functional data to time series has enabled a new benchmarking system to be proposed for the Office for National Statistics. This is now being further investigated by its statisticians for possible implementation on a wide variety of economic time series. It has also informed thinking around time series measures of the impacts of COVID-19. Additionally, work in this project has been used to develop techniques for Forensic Entomology, and interest has been shown by the Natural History Museum (UK Forensic Entomology Unit) and the Pittsburgh Medical Examiner (USA). The work, particularly that related to brain imaging statistics, has been recognised in the Queen's Birthday Honours 2021 citation.
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software),Education,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy
Impact Types Cultural,Societal,Economic,Policy & public services

 
Description Roundtable Discussion on the Bean Review of Economic Statistics
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
 
Description EPSRC Centres for Maths in Healthcare
Amount £1,923,014 (GBP)
Funding ID EP/N014588/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2016 
End 02/2020
 
Description Isaac Newton Programme on Statistical Scalability
Amount £180,000 (GBP)
Funding ID Statistical Scalability 
Organisation Isaac Newton Institute for Mathematical Sciences 
Sector Academic/University
Country United Kingdom
Start 01/2018 
End 06/2018
 
Description Kronecker Products for Imaging and Genetics
Amount £128,000 (GBP)
Organisation University of Cambridge 
Sector Academic/University
Country United Kingdom
Start 03/2017 
End 03/2019
 
Description Programme Grant
Amount £2,750,890 (GBP)
Funding ID EP/N031938/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2016 
End 05/2022
 
Title Research data supporting "The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages" 
Description Data set of code and language data 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
 
Description Cambridge big data 
Organisation Cambridge Carbon Capture Ltd
Country United Kingdom 
Sector Private 
PI Contribution We have assisted in putting together research bids and a conference, networked with others, and created new collaborative projects.
Collaborator Contribution Assisted in putting together research bids and a conference, networked with others, and created new collaborative projects. The Big Data consortium has provided extensive expertise and knowledge for us.
Impact A grant application to enable better access to, and analysis of, healthcare data has been submitted. A conference on big data in medicine is being planned for July 4th 2017. Academics have been put in contact with each other, enabling transfer of knowledge.
Start Year 2016
 
Title CovSep 
Description R-language toolbox to test separability in functional data 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact It has been released on CRAN 
URL https://cran.r-project.org/web/packages/covsep/index.html
 
Title R Package 'ftsspec' 
Description R Package for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length. 
Type Of Technology Webtool/Application 
Year Produced 2015 
Impact It has been released on CRAN. 
URL https://cran.r-project.org/web/packages/ftsspec/index.html
 
Description Big Data in Medicine Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Third sector organisations
Results and Impact A Big data in Medicine conference was co-hosted by the CMIH, attended by a variety of academics, industrial partners, students, and clinicians.
Year(s) Of Engagement Activity 2017
URL https://www.bigdata.cam.ac.uk/events/events-archive/2017-events/copy_of_big-data-in-medicine-confere...
 
Description Cambridge Research Magazine 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The Cambridge Research Magazine - Research Horizons - ran an article on our research. This focused on the use of statistics and linguistics to recreate ancient languages. This was picked up by international media and the Daily Mail ran a long article on the ideas (see other entry).
Year(s) Of Engagement Activity 2016
URL https://issuu.com/uni_cambridge/docs/issue_30_research_horizons/1?e=1892280/36314892
 
Description Cambridge Science Festival 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Cambridge Science Festival is a very well attended event each year. We presented a linguistic demonstration based on speaking ancient languages and showed how this related to mathematics. We also mentioned how this could be related to other mathematical problems such as those arising in Imaging.
Year(s) Of Engagement Activity 2016
 
Description Daily Mail Article 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The Daily Mail picked up the story that is also described in the Cambridge Research Magazine entry. This has led to contacts from across the world about the research.
Year(s) Of Engagement Activity 2016
URL http://www.dailymail.co.uk/sciencetech/article-3698184/Listen-mother-language-Researchers-recreate-w...
 
Description Hands-On Activity at the Maths Public Open Day (Cambridge) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The activity showcased how sounds (in particular, speech recordings) can be modelled and analysed. It was an interactive experience in which visitors were asked to record their own imitations of a set of spoken words and were then able to assess, visually and quantitatively, how well their imitation matched the sound they heard. About 60 people took part in the activity (mainly families with children), and this sparked healthy discussions about how speech can be used as a data object.
Year(s) Of Engagement Activity 2016
URL http://www.sciencefestival.cam.ac.uk/events/maths-public-open-day
 
Description Interview with Cambridge TV 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Cambridge TV News broadcast an interview with members of our research team. This focused on the use of statistics and linguistics to recreate ancient languages.
Year(s) Of Engagement Activity 2016
URL http://www.cambridge-tv.co.uk/mother-tongue/
 
Description Linguistics Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We ran a workshop on statistical methods for linguists, both academic and non-academic ones, who might be interested in applying the methods we have developed in their work, including the teaching of second languages.
Year(s) Of Engagement Activity 2016
 
Description Science Museum Exhibition 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact The research from this award was the basis for a week-long exhibit at the Science Museum in London as part of the LMS Mathematics Footfall Festival. In particular we showed how mathematics and statistics can be used to investigate lost languages.
Year(s) Of Engagement Activity 2015
URL http://www.sciencemuseum.org.uk/about-us/press/nov-2015/mathematics-festival