# Functional Object Data Analysis and its Applications

University of Cambridge
Department Name: Pure Maths and Mathematical Statistics

### Abstract

When linguists are trying to determine how different languages are related or neuroscientists wish to know how one part of the brain is associated with another, how to analyse data which is both complex and massive is a fundamental question. However, an area of Statistics, namely Functional Data Analysis, where the data is described as mathematical functions rather than numbers or vectors, has recently been shown to be very powerful in these situations.

This fellowship aims to take functional data analysis and advance it so that much more complex data can be investigated. This will require establishing a careful statistical framework for the analysis of such functions even in situations where the functions have strict relationships. By considering the underlying mathematical spaces in which the functions lie, it is possible to construct valid statistical procedures that preserve these relationships, such as the functions needing to be positive definite or the functions needing to be related by a graph or network.

As an example, comparison between different languages (for example, how French is quantitatively different from Italian) can be carried out in the framework of functional data, but not without considering specifically how the data should be analysed to take into account its particular properties. For example, in trying to find a path from one language to another, it would be sensible to go only via other feasible acoustic sounds. This turns out to be mathematically related to shape analysis, a simple example of which might be how to describe going from London to Sydney. The shortest path is a straight line through the interior of the Earth, but this is not sensible, so you have to go round the world. Establishing links between shape analysis and functional data is a major aim of this fellowship.
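The London to Sydney illustration can be made concrete with a few lines of arithmetic. The sketch below is only an illustration of the general idea, not the methodology of the fellowship: it treats the Earth as a perfect sphere of radius 6371 km, uses approximate city coordinates, and all function names are hypothetical. It compares the straight-line (chord) distance with the distance along the surface, and shows that the midpoint of the straight line lies well inside the Earth, that is, outside the set of feasible locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the sphere itself is an approximation

# Approximate (latitude, longitude) in degrees; illustrative values only.
LONDON = (51.5, -0.13)
SYDNEY = (-33.87, 151.21)


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def central_angle(a, b):
    """Angle in radians between two places, measured at the Earth's centre."""
    p, q = to_unit_vector(*a), to_unit_vector(*b)
    dot = sum(x * y for x, y in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def chord_length_km(a, b):
    """Straight-line distance, which cuts through the Earth's interior."""
    return 2.0 * EARTH_RADIUS_KM * math.sin(central_angle(a, b) / 2.0)


def surface_length_km(a, b):
    """Shortest distance that stays on the surface (great-circle arc)."""
    return EARTH_RADIUS_KM * central_angle(a, b)


def straight_midpoint_radius_km(a, b):
    """Distance from the Earth's centre to the midpoint of the straight line."""
    p, q = to_unit_vector(*a), to_unit_vector(*b)
    mid = [(x + y) / 2.0 for x, y in zip(p, q)]
    return EARTH_RADIUS_KM * math.sqrt(sum(m * m for m in mid))


if __name__ == "__main__":
    print("chord (km):           ", round(chord_length_km(LONDON, SYDNEY)))
    print("surface path (km):    ", round(surface_length_km(LONDON, SYDNEY)))
    print("midpoint radius (km): ", round(straight_midpoint_radius_km(LONDON, SYDNEY)))
```

The chord is always at most as long as the surface path, but every interior point of the chord has radius smaller than that of the Earth, so it is not a valid location. The analogy in the text is that a straight-line average of two speech sounds may likewise fail to be a feasible sound.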

In addition, most brain analysis currently splits the brain up into many elements known as voxels, and then analyses these voxels one by one. However, the brain is really one (complex 3-D) object, which should be analysed as a whole. This is another example of functional data, and the methods developed in this fellowship will enable the analysis of the brain as a single object. This will be done by examining the types of dependence between observations in brain imaging data, and using these to build such an object. Of particular interest will be the analysis of brain connections resulting from particular tasks, which will require a mixture of functional data analysis and graphical or network analysis. However, before this can be done and the resulting insights into the brain found, the statistical methods required need to be developed.

### Planned Impact

There are three main areas of impact from this fellowship.

Firstly, there are the academic disciplines associated with the projects within the fellowship. The main academic beneficiary will be statistical science: not simply those working on Functional Data Analysis, but also those in the other areas of shape analysis and network or graphical modelling, as well as applied statisticians making use of methodology in these areas. However, those working in the application areas under investigation will also greatly benefit from the methodology developed. This is evidenced by the strong support shown by the three non-statisticians who have written letters to say that they feel the research will benefit their groups and disciplines more generally.

Secondly, there are the non-academic counterparts of those mentioned above. Statistics is a discipline where statisticians working outside academia outnumber those within, and the methodology developed here will be of benefit to these statisticians in government and industry as well, particularly as software development is a critical part of the dissemination of this research. For clinical neuroscientists and speech technologists, the resulting application-driven research will also be of considerable use, and dissemination through media such as subject-specific journals will allow them to make use of the results.

Finally, there is the wider public engagement that a project of this kind will engender. The idea of using statistics to determine linguistic relations so that ancient languages can be reheard, or so that the function of the brain in particular settings can be understood, is something that will naturally be of wide interest, given the public fascination with science. Webcasts and public lectures will be used to make full use of this opportunity to engage the public in the work undertaken.

Description We have made it feasible to examine the variation of brain imaging signals on the cortical surface, to analyse data from the Office for National Statistics in a completely new way, and to generate synthetic sounds from ancient languages. This has all been achieved by developing statistical methods for "data which is not numbers".
Exploitation Route The work has already been taken further by people in Neuroimaging, linguists and people working at the Office for National Statistics. Additionally, new research in Forensics is being taken forward by UK Forensic Entomologists.
Sectors Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy

Description The major results from this project so far include the generation of synthetic sounds from ancient languages, which can give insights into how mathematics can play a role in the humanities. In addition, work on linking functional data to time series has led to a new benchmarking system being proposed for the Office for National Statistics. This is now being further investigated by their statisticians for possible implementation on a wide variety of economic time series. It has also been used in the thinking around time series measures of the impacts of COVID-19. Additionally, work in this project has been used to develop techniques for Forensic Entomology, and interest has been shown by the Natural History Museum (UK Forensic Entomology Unit) and the Pittsburgh Medical Examiner (USA).
Sector Digital/Communication/Information Technologies (including Software),Education,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy
Impact Types Cultural,Societal,Economic,Policy & public services

Description Roundtable Discussion on the Bean Review of Economic Statistics
Geographic Reach National
Policy Influence Type Participation in a national consultation

Title Research data supporting "The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages"
Description Data set of code and language data
Type Of Material Database/Collection of data
Provided To Others? Yes

Description Cambridge big data
Organisation Cambridge Carbon Capture Ltd
Country United Kingdom
Sector Private
PI Contribution We have assisted in putting together research bids and a conference, as well as networking with others, and creating new collaborative projects
Collaborator Contribution Assisted in putting together research bids and a conference, as well as networking with others, and creating new collaborative projects. The Big data consortium has provided huge expertise and knowledge for us.
Impact A grant application to enable better access to, and analysis of, healthcare data has been submitted. A conference on big data in medicine is being planned for July 4th 2017. Academics have been put in contact with each other, enabling transfer of knowledge.
Start Year 2016

Title CovSep
Description R-language toolbox to test separability in functional data
Type Of Technology Webtool/Application
Year Produced 2016
Impact It has been released on CRAN
URL https://cran.r-project.org/web/packages/covsep/index.html

Title R Package 'ftsspec'
Description R Package for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.
Type Of Technology Webtool/Application
Year Produced 2015
Impact It has been released on CRAN.
URL https://cran.r-project.org/web/packages/ftsspec/index.html

Description Big Data in Medicine Conference
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Third sector organisations
Results and Impact A Big data in Medicine conference was co-hosted by the CMIH, attended by a variety of academics, industrial partners, students, and clinicians.
Year(s) Of Engagement Activity 2017
URL https://www.bigdata.cam.ac.uk/events/events-archive/2017-events/copy_of_big-data-in-medicine-confere...

Description Cambridge Research Magazine
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The Cambridge Research Magazine - Research Horizons - ran an article on our research. This focused on the use of statistics and linguistics to recreate ancient languages. This was picked up by international media and the Daily Mail ran a long article on the ideas (see other entry).
Year(s) Of Engagement Activity 2016
URL https://issuu.com/uni_cambridge/docs/issue_30_research_horizons/1?e=1892280/36314892

Description Cambridge Science Festival
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Cambridge Science Festival is a very well attended event each year. We presented a linguistic demonstration based on speaking ancient languages and showed how this related to mathematics. We also mentioned how this could be related to other mathematical problems such as those arising in Imaging.
Year(s) Of Engagement Activity 2016

Description Daily Mail Article
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The Daily Mail picked up the story that is also described in the Cambridge Research Magazine entry. This has led to contacts from across the world about the research.
Year(s) Of Engagement Activity 2016
URL http://www.dailymail.co.uk/sciencetech/article-3698184/Listen-mother-language-Researchers-recreate-w...

Description Hands-On Activity at the Maths Public Open Day (Cambridge)
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The activity showcased how sounds (in particular, speech recordings) can be modelled and analysed. It was an interactive experience in which visitors were asked to record their own imitations of a set of spoken words and were then able to assess, visually and quantitatively, how well their imitation matched the sound they heard. About 60 people took part in the activity (mainly families with children), and this sparked healthy discussions about how speech can be used as a data object.
Year(s) Of Engagement Activity 2016
URL http://www.sciencefestival.cam.ac.uk/events/maths-public-open-day

Description Interview with Cambridge TV
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Cambridge TV News broadcast an interview with members of our research team. This focused on the use of statistics and linguistics to recreate ancient languages.
Year(s) Of Engagement Activity 2016
URL http://www.cambridge-tv.co.uk/mother-tongue/

Description Linguistics Workshop
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We ran a workshop on statistical methods for linguists, both academic and non-academic ones, who might be interested in applying the methods we have developed in their work, including the teaching of second languages.
Year(s) Of Engagement Activity 2016

Description Science Museum Exhibition
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact The research from this award was the basis for a week-long exhibit at the Science Museum in London as part of the LMS Mathematics Footfall Festival. In particular we showed how mathematics and statistics can be used to investigate lost languages.
Year(s) Of Engagement Activity 2015