Nonparametric regression on implicit manifold of high dimensional point cloud

Lead Research Organisation: University of Glasgow
Department Name: School of Mathematics & Statistics

Abstract

In a variety of fields, from biology and the life sciences to environmental science, one often encounters high-dimensional data (e.g., 'point cloud data') that are perturbed by high-dimensional noise but centred around some lower-dimensional manifold. In more precise mathematical terms, manifolds are topological spaces equipped with a differential (smooth) structure, whose geometry is in general different from the usual Euclidean geometry. Naively applying traditional multivariate analysis to manifold-valued data while ignoring the geometry of the space can lead to highly misleading predictions and inferences. There is increasing interest in the problem of nonparametric regression with high-dimensional predictors. Gaussian Processes (GPs) are among the most powerful tools in statistics and machine learning for regression and optimisation with Euclidean predictors. However, if the predictors lie on a manifold, traditional smoothing or modelling methods such as GPs, which do not respect the intrinsic geometry of the space or its boundary constraints, produce poor results. One of the main challenges in developing GP models on manifolds is specifying the covariance structure, which requires constructing valid and computable covariance kernels on manifolds. In the PI's prior and recent work, the heat kernel is employed to construct the intrinsic Gaussian process (In-GP) on complex constrained domains. Although the heat kernel provides a natural and canonical choice theoretically, it is analytically intractable to evaluate directly. Alternatively, it can be estimated as the transition density of Brownian Motion (BM) on the manifold. This preliminary work has been applied to model chlorophyll concentration levels in the Aral Sea. This method is only applicable when the geometry of the manifold is known. For example, the mapping from longitude and latitude to the sphere in R^3 is known.
However, in modern big data analysis, the data in the point cloud, often high-dimensional, are not directly observed on the manifold; instead, they are observed in a potentially high-dimensional ambient space but concentrate around some unknown lower-dimensional structure. One needs to learn this geometry before utilising it for inference.

The objective of this proposal is to fill a critical gap in model structure and inference for unknown manifolds in high-dimensional point clouds by constructing intrinsic Gaussian processes. We will use a Bayesian dimension reduction method to learn the implicit manifold. Bayesian models are particularly important here because they provide uncertainty quantification. The heat kernel can then be estimated as the transition density of the BM on the learned manifold. This framework will allow us to build In-GP regression models on point clouds. The intrinsic GP on a point cloud can fully incorporate the intrinsic geometry of the learned manifold for inference while respecting the potentially complex interior structure and boundary.
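
The idea of estimating the heat kernel as a BM transition density can be illustrated on a manifold whose geometry is fully known. The sketch below is only a toy example and not the project's implementation: it assumes the unit circle as the manifold, the convention that BM has generator (1/2) times the Laplacian (so the angular variance after time t is t), and hypothetical function names and parameters (`bm_transition_density`, `wrapped_gaussian_density`, `half_width`, `n_paths`, `n_steps`) chosen for illustration. It simulates BM paths, counts the fraction that end in a small window around the target point, and compares the result with the closed-form wrapped Gaussian density that is available on the circle.

```python
import numpy as np


def bm_transition_density(theta0, theta1, t, n_paths=200_000, n_steps=50,
                          half_width=0.05, seed=0):
    """Monte Carlo estimate of the BM transition density on the unit circle.

    Assumption: BM has generator (1/2) * Laplacian, so each step of size dt
    adds N(0, dt) noise to the angle.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    theta = np.full(n_paths, theta0, dtype=float)
    for _ in range(n_steps):
        # Euler step of BM in the angular coordinate
        theta += np.sqrt(dt) * rng.standard_normal(n_paths)
    # Geodesic (shortest arc) distance from each endpoint to the target
    dist = np.abs((theta - theta1 + np.pi) % (2.0 * np.pi) - np.pi)
    # Fraction of paths landing in the window, divided by the window length
    return np.mean(dist < half_width) / (2.0 * half_width)


def wrapped_gaussian_density(theta0, theta1, t, n_wraps=20):
    """Closed-form transition density on the circle (wrapped Gaussian series)."""
    k = np.arange(-n_wraps, n_wraps + 1)
    diff = theta1 - theta0 + 2.0 * np.pi * k
    return np.sum(np.exp(-diff ** 2 / (2.0 * t))) / np.sqrt(2.0 * np.pi * t)


if __name__ == "__main__":
    estimate = bm_transition_density(0.0, 1.0, 0.5)
    exact = wrapped_gaussian_density(0.0, 1.0, 0.5)
    print(f"Monte Carlo estimate: {estimate:.4f}, closed form: {exact:.4f}")
```

On the circle the closed form makes simulation unnecessary; the comparison is there only to check the estimator. On a learned manifold no closed form is available, which is the setting in which a simulation-based estimate becomes the practical option.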

Publications

 
Description We have developed a novel approach to constructing Intrinsic Gaussian Processes for regression on unknown manifolds with a probabilistic metric in point clouds. In many real-world applications, one often encounters high-dimensional data (e.g., 'point cloud data') centred around some unknown lower-dimensional manifold. The geometry of a manifold is in general different from the usual Euclidean geometry. Naively applying traditional smoothing methods such as Euclidean Gaussian Processes (GPs) to manifold-valued data, and so ignoring the geometry of the space, can lead to highly misleading predictions and inferences. The new approach is illustrated on real high-dimensional datasets of WiFi signals and on image data examples.
Exploitation Route The novel approach we developed can be further used for optimisation on complex constrained domains and point clouds. For example, it can be used to find the hotspot of water contamination in a lake or other water body whose boundary is complicated and concave.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment

 
Description Our method has been used in indoor location estimation. Indoor wireless signals from devices such as WiFi access points can be exploited for location estimation. A series of WiFi signal strength traces was collected by a mobile device travelling through a single-floor university building. Our method has been used to estimate the location of the mobile device from the high-dimensional WiFi signal. We also have applications in diffusion tensor imaging (DTI), which is designed to measure the diffusion of water molecules in the brain. Our method is used to build predictive models of cognitive traits and neuropsychiatric disorders. Our results show a difference between the HIV-positive (HIV+) and control samples.
First Year Of Impact 2023
Sector Healthcare, Manufacturing, including Industrial Biotechnology