Functional Object Data Analysis and its Applications
Lead Research Organisation:
University of Cambridge
Department Name: Pure Maths and Mathematical Statistics
Abstract
When linguists try to determine how different languages are related, or neuroscientists wish to know how one part of the brain is associated with another, a fundamental question is how to analyse data which is both complex and massive. One area of Statistics, Functional Data Analysis, in which the data is described as mathematical functions rather than as numbers or vectors, has recently been shown to be very powerful in these situations.
This fellowship aims to take functional data analysis and advance it so that much more complex data can be investigated. This will require establishing a careful statistical framework for the analysis of such functions, even in situations where the functions have strict relationships. By considering the underlying mathematical spaces in which the functions lie, it is possible to construct valid statistical procedures that preserve these relationships, such as the functions needing to be positive definite or needing to be related by a graph or network.
As an example, comparison between different languages (for example, how French differs quantitatively from Italian) can be carried out within the framework of functional data, but only by considering specifically how the data should be analysed to take into account its particular properties. For example, in trying to find a path from one language to another, it would be sensible to pass only through other feasible acoustic sounds. This turns out to be mathematically related to shape analysis, a simple example of which is describing the journey from London to Sydney. The shortest straight-line path passes through the interior of the Earth, which is not sensible, so the path must instead go round the world along its surface. Establishing links between shape analysis and functional data is a major aim of this fellowship.
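The London-to-Sydney analogy can be made concrete with a short Python sketch. It compares the straight-line (chord) distance with the great-circle distance along the surface, and shows that the ordinary Euclidean average of the two cities is not on the Earth at all but deep underground, which is why averages and paths must respect the space the data lies in. It assumes a spherical Earth of radius 6371 km and approximate city coordinates; the function names are hypothetical, and this illustrates the geometric point only rather than any method from the fellowship.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: a spherical Earth with mean radius 6371 km

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3-D Cartesian coordinates in km."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (EARTH_RADIUS_KM * math.cos(lat) * math.cos(lon),
            EARTH_RADIUS_KM * math.cos(lat) * math.sin(lon),
            EARTH_RADIUS_KM * math.sin(lat))

def chord_km(p, q):
    """Straight-line (Euclidean) distance, which passes through the Earth."""
    return math.dist(p, q)

def great_circle_km(p, q):
    """Geodesic distance along the surface: radius times the central angle."""
    cos_angle = sum(a * b for a, b in zip(p, q)) / EARTH_RADIUS_KM ** 2
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))

LONDON = to_xyz(51.5074, -0.1278)    # approximate coordinates
SYDNEY = to_xyz(-33.8688, 151.2093)

# The Euclidean average of the two points does not lie on the sphere.
midpoint = tuple((a + b) / 2 for a, b in zip(LONDON, SYDNEY))
depth_km = EARTH_RADIUS_KM - math.hypot(*midpoint)

print(f"through the Earth: {chord_km(LONDON, SYDNEY):,.0f} km")
print(f"around the Earth:  {great_circle_km(LONDON, SYDNEY):,.0f} km")
print(f"Euclidean midpoint lies {depth_km:,.0f} km below the surface")
```

The geodesic is longer than the chord but is the only path that stays on the surface; the same tension between "shortest in the ambient space" and "valid in the data's own space" arises when interpolating between sounds.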
In addition, most brain analysis currently splits the brain into many small elements known as voxels and then analyses these voxels one by one. However, the brain is really a single, complex 3-D object which should be analysed as a whole. This is another example of functional data, and the methods developed in this fellowship will enable the brain to be analysed as a single object. This will be done by examining the types of dependence between observations in brain imaging data and using these to build such an object. Of particular interest will be the analysis of brain connections arising from particular tasks, which will require a combination of functional data analysis and graphical or network analysis. However, before this can be done and the resulting insights into the brain obtained, the necessary statistical methods need to be developed.
Planned Impact
There are three main areas of impact from this fellowship.
Firstly, there are the academic disciplines associated with the projects within the fellowship. The main academic beneficiary will be statistical science: not only those working on Functional Data Analysis, but also those in the related areas of shape analysis and network or graphical modelling, as well as applied statisticians using methodology from these areas. Those working in the application areas under investigation will also benefit greatly from the methodology developed. This is evidenced by the strong support of the three non-statisticians who have written letters stating that they expect the research to benefit their groups and their disciplines more generally.
Secondly, there are the non-academic counterparts of those mentioned above. Statistics is a discipline in which statisticians working outside academia outnumber those within it, and the methodology developed here will also benefit statisticians in government and industry, particularly as software development is a critical part of the dissemination of this research. For clinical neuroscientists and speech technologists, the resulting application-driven research will also be of considerable use, and dissemination through media such as subject-specific journals will allow them to make use of the results.
Finally, there is the wider public engagement that a project of this kind will engender. The idea of using statistics to determine how languages are related, so that ancient languages can be heard again, or to understand how the brain functions in particular settings, will naturally be of wide interest given the public's fascination with science. Making full use of this engagement opportunity through webcasts and public lectures will help to engage the public in the work undertaken.
People | John Aston (Principal Investigator / Fellow) |
Publications
Zhou Y
(2016)
Toward Automatic Model Comparison: An Adaptive Sequential Monte Carlo Approach
in Journal of Computational and Graphical Statistics
Ward R
(2019)
A data-centric bottom-up model for generation of stochastic internal load profiles based on space-use type
in Journal of Building Performance Simulation
Wang Y
(2017)
Sparse Bayesian Multitask Learning for Subspace Segmentation
Tirlea M
(2017)
A Ticklish Problem
in Significance
Tavakoli S
(2016)
Detecting and Localizing Differences in Functional Time Series Dynamics: A Case Study in Molecular Biophysics
in Journal of the American Statistical Association
Tavakoli S
(2019)
Rejoinder for "A Spatial Modeling Approach for Linguistic Object Data: Analyzing Dialect Sound Variations Across Great Britain"
in Journal of the American Statistical Association
Tavakoli S
(2019)
A Spatial Modeling Approach for Linguistic Object Data: Analyzing Dialect Sound Variations Across Great Britain
in Journal of the American Statistical Association
Stoehr C
(2021)
Detecting changes in the covariance structure of functional time series with application to fMRI data
in Econometrics and Statistics
Description | We have made it feasible to examine the variation of brain imaging signals on the cortical surface, to analyse data from the Office for National Statistics in a completely new way, and to generate synthetic sounds from ancient languages. All of this has been achieved by developing statistical methods for "data which is not numbers". |
Exploitation Route | The work has already been taken further by neuroimaging researchers, linguists and staff at the Office for National Statistics. In addition, new research in forensics is being taken forward by UK forensic entomologists. |
Sectors | Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy |
URL | http://www.statslab.cam.ac.uk/~jada2 |
Description | The major results from this project so far include the use of its methods to generate synthetic sounds from ancient languages, which gives insights into the role mathematics can play in the humanities. In addition, work linking functional data to time series has led to a new benchmarking system being proposed for the Office for National Statistics. This is now being investigated further by their statisticians for possible implementation across a wide variety of economic time series. It has also informed thinking about time series measures of the impact of COVID-19. Work in this project has also been used to develop techniques for Forensic Entomology, and interest has been shown by the Natural History Museum (UK Forensic Entomology Unit) and the Pittsburgh Medical Examiner (USA). The work, particularly that related to brain imaging statistics, was recognised in the citation for the Queen's 2021 Birthday Honours. |
First Year Of Impact | 2016 |
Sector | Digital/Communication/Information Technologies (including Software),Education,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy |
Impact Types | Cultural,Societal,Economic,Policy & public services |
Description | Roundtable Discussion on the Bean Review of Economic Statistics |
Geographic Reach | National |
Policy Influence Type | Contribution to a national consultation/review |
Description | EPSRC Centres for Maths in Healthcare |
Amount | £1,923,014 (GBP) |
Funding ID | EP/N014588/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2016 |
End | 02/2020 |
Description | Isaac Newton Programme on Statistical Scalability |
Amount | £180,000 (GBP) |
Funding ID | Statistical Scalability |
Organisation | Isaac Newton Institute for Mathematical Sciences |
Sector | Academic/University |
Country | United Kingdom |
Start | 01/2018 |
End | 06/2018 |
Description | Kronecker Products for Imaging and Genetics |
Amount | £128,000 (GBP) |
Organisation | University of Cambridge |
Sector | Academic/University |
Country | United Kingdom |
Start | 03/2017 |
End | 03/2019 |
Description | Programme Grant |
Amount | £2,750,890 (GBP) |
Funding ID | EP/N031938/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 05/2016 |
End | 05/2022 |
Title | Research data supporting "The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages" |
Description | Data set of code and language data |
Type Of Material | Database/Collection of data |
Provided To Others? | Yes |
Description | Cambridge big data |
Organisation | Cambridge Carbon Capture Ltd |
Country | United Kingdom |
Sector | Private |
PI Contribution | We have assisted in putting together research bids and a conference, as well as networking with others, and creating new collaborative projects |
Collaborator Contribution | Assisted in putting together research bids and a conference, as well as networking with others, and creating new collaborative projects. The Big data consortium has provided huge expertise and knowledge for us. |
Impact | A grant application, to enable better access and analysis of healthcare data has been submitted. A conference on big data in medicine is being planned for July 4th 2017. Academics have been put in contact with each other, enabling transfer of knowledge. |
Start Year | 2016 |
Title | CovSep |
Description | R toolbox for testing whether the covariance structure of functional data is separable |
Type Of Technology | Webtool/Application |
Year Produced | 2016 |
Impact | It has been released on CRAN |
URL | https://cran.r-project.org/web/packages/covsep/index.html |
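The notion of separability that CovSep addresses can be illustrated with a short Python sketch. For data observed on a two-dimensional grid (for example, a surface indexed by frequency and time), the covariance is separable when it factorises as a Kronecker product of a covariance in each direction. The sketch builds a Kronecker approximation from the two partial traces of the covariance and reports the relative Hilbert-Schmidt (Frobenius) distance between the covariance and that approximation. This is a hypothetical illustration, not code from CovSep: the function names and simulated data are assumptions, and a formal test would also need a reference distribution for the statistic (for example via a bootstrap), which is omitted here.

```python
import numpy as np

def separable_approximation(C, p, q):
    """Kronecker approximation kron(C1, C2) / tr(C) built from partial traces.

    C is the (p*q) x (p*q) covariance of vec(X) for p x q data X, using
    row-major vectorisation (entry (i, j) has index i*q + j).
    """
    T = C.reshape(p, q, p, q)
    C1 = np.einsum("ijkj->ik", T)  # trace out the second coordinate: p x p
    C2 = np.einsum("ijil->jl", T)  # trace out the first coordinate: q x q
    return np.kron(C1, C2) / np.trace(C)

def relative_departure(C, p, q):
    """Frobenius distance to the separable approximation, relative to ||C||."""
    return np.linalg.norm(C - separable_approximation(C, p, q)) / np.linalg.norm(C)

def empirical_covariance(X):
    """Sample covariance of vec(X_i) for an array X of shape (n, p, q)."""
    Y = X.reshape(X.shape[0], -1)
    Y = Y - Y.mean(axis=0)
    return Y.T @ Y / Y.shape[0]

rng = np.random.default_rng(0)
p, q, n = 6, 5, 2000
A, B = rng.standard_normal((p, p)), rng.standard_normal((q, q))

# Separable surfaces: X_i = A Z_i B^T gives Cov(vec X) = kron(A A^T, B B^T).
X_sep = A @ rng.standard_normal((n, p, q)) @ B.T
# Non-separable surfaces: vec(X_i) = L z_i for an unstructured matrix L.
L = rng.standard_normal((p * q, p * q))
X_ns = (rng.standard_normal((n, p * q)) @ L.T).reshape(n, p, q)

d_sep = relative_departure(empirical_covariance(X_sep), p, q)
d_ns = relative_departure(empirical_covariance(X_ns), p, q)
print(f"separable data:     {d_sep:.3f}")
print(f"non-separable data: {d_ns:.3f}")
```

For an exactly separable covariance the partial-trace construction recovers it exactly, so with separable simulated data the departure should reflect sampling error only, whereas the unstructured covariance should leave a clearly larger residual.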
Title | R Package 'ftsspec' |
Description | R package for estimating the spectral density operator of a functional time series (FTS) and for comparing the spectral density operators of two functional time series in a way that localises any differences both in frequency and along the curve. |
Type Of Technology | Webtool/Application |
Year Produced | 2015 |
Impact | It has been released on CRAN. |
URL | https://cran.r-project.org/web/packages/ftsspec/index.html |
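The kind of comparison ftsspec performs can be sketched in Python with a standard smoothed-periodogram estimator. This is an assumption-laden illustration, not the package's implementation: the flat smoothing window over 2*bandwidth+1 neighbouring Fourier frequencies, the simulated AR(1) example and all function names are hypothetical, and the formal testing procedure in ftsspec is not reproduced. The log-ratio of the two estimated marginal spectra, indexed by frequency and by position along the curve, shows where the two series differ.

```python
import numpy as np

def spectral_density(X, bandwidth):
    """Smoothed-periodogram estimate of the spectral density operator.

    X has shape (T, m): T consecutive curves, each sampled at m grid points.
    Returns shape (T, m, m); entry k estimates the operator at frequency
    2*pi*k/T by averaging periodogram matrices over neighbouring frequencies.
    """
    T, m = X.shape
    Xc = X - X.mean(axis=0)
    D = np.fft.fft(Xc, axis=0) / np.sqrt(2 * np.pi * T)  # DFT at each grid point
    I = D[:, :, None] * D[:, None, :].conj()             # rank-one periodogram matrices
    F = np.empty_like(I)
    for k in range(T):
        window = np.arange(k - bandwidth, k + bandwidth + 1) % T  # circular window
        F[k] = I[window].mean(axis=0)
    return F

def log_spectrum_ratio(F1, F2):
    """Log-ratio of the two marginal spectra, indexed by (frequency, grid point)."""
    d1 = np.diagonal(F1, axis1=1, axis2=2).real
    d2 = np.diagonal(F2, axis1=1, axis2=2).real
    return np.log(d1 / d2)

def simulate(T, m, phi, affected, rng):
    """White-noise curves, with an AR(1) filter (coefficient phi) on some grid points."""
    E = rng.standard_normal((T + 100, m))
    X = E.copy()
    for t in range(1, T + 100):
        X[t, affected] = phi * X[t - 1, affected] + E[t, affected]
    return X[100:]  # drop the burn-in period

rng = np.random.default_rng(1)
T, m = 1024, 8
X1 = simulate(T, m, phi=0.6, affected=np.arange(4), rng=rng)  # grid points 0-3 autocorrelated
X2 = simulate(T, m, phi=0.0, affected=np.arange(4), rng=rng)  # pure white noise
R = log_spectrum_ratio(spectral_density(X1, 15), spectral_density(X2, 15))
print(R[1:21].mean(axis=0).round(2))  # mean log-ratio over low frequencies, per grid point
```

In this simulation the difference is confined to low frequencies and to the first half of the curve, so the log-ratio should be clearly positive there and close to zero elsewhere along the curve.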
Description | Big Data in Medicine Conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Third sector organisations |
Results and Impact | A Big data in Medicine conference was co-hosted by the CMIH, attended by a variety of academics, industrial partners, students, and clinicians. |
Year(s) Of Engagement Activity | 2017 |
URL | https://www.bigdata.cam.ac.uk/events/events-archive/2017-events/copy_of_big-data-in-medicine-confere... |
Description | Cambridge Research Magazine |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | The Cambridge Research Magazine - Research Horizons - ran an article on our research. This focused on the use of statistics and linguistics to recreate ancient languages. This was picked up by international media and the Daily Mail ran a long article on the ideas (see other entry). |
Year(s) Of Engagement Activity | 2016 |
URL | https://issuu.com/uni_cambridge/docs/issue_30_research_horizons/1?e=1892280/36314892 |
Description | Cambridge Science Festival |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | The Cambridge Science Festival is a very well attended event each year. We presented a linguistic demonstration based on speaking ancient languages and showed how this related to mathematics. We also mentioned how this could be related to other mathematical problems, such as those arising in imaging. |
Year(s) Of Engagement Activity | 2016 |
Description | Daily Mail Article |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | The Daily Mail picked up the story also described in the Cambridge Research Magazine entry. This has led to contacts from across the world about the research. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.dailymail.co.uk/sciencetech/article-3698184/Listen-mother-language-Researchers-recreate-w... |
Description | Hands-On Activity at the Maths Public Open Day (Cambridge) |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | The activity showcased how sounds (in particular, speech recordings) can be modelled and analysed. It was an interactive experience in which visitors were asked to record their own imitations of a set of spoken words and were then able to assess, visually and quantitatively, how well their imitation matched the sound they heard. About 60 people took part in the activity (mainly families with children), and this sparked healthy discussions about how speech can be used as a data object. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.sciencefestival.cam.ac.uk/events/maths-public-open-day |
Description | Interview with Cambridge TV |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Cambridge TV News broadcast an interview with members of our research team. This focused on the use of statistics and linguistics to recreate ancient languages. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.cambridge-tv.co.uk/mother-tongue/ |
Description | Linguistics Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | We ran a workshop on statistical methods for linguists, both academic and non-academic, who might be interested in applying the methods we have developed in their work, including in the teaching of second languages. |
Year(s) Of Engagement Activity | 2016 |
Description | Science Museum Exhibition |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | The research from this award was the basis for a week-long exhibit at the Science Museum in London as part of the LMS Mathematics Footfall Festival. In particular we showed how mathematics and statistics can be used to investigate lost languages. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.sciencemuseum.org.uk/about-us/press/nov-2015/mathematics-festival |