The use of tensors for interpreting Big Data

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

My PhD project addresses the important issue of how tensors (multidimensional data structures) can be employed to make sense of Big Data recorded across numerous domains in today's connected world. The proliferation of big, multidimensional data has created an ever-growing demand for innovative techniques that process such data in a computationally efficient and physically meaningful way. Tensors, a generalisation of vectors and matrices, allow multiway data to be organised naturally into multidimensional arrays that capture their inherent structure (examples include colour images, videos, and recordings across multiple modes, such as time, subjects, and trials in physiological experiments). By organising big data in this manner, we can use powerful techniques, namely tensor decompositions (a generalisation of matrix factorisation) and the related tensor networks, to extract latent information or to achieve super-compression, hence transforming the curse of dimensionality associated with Big Data into a blessing of dimensionality.
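As a rough illustration of the compression idea described above, the following sketch arranges synthetic data as a three-way array (time x subjects x trials) and compresses it with a truncated higher-order SVD, one simple form of Tucker decomposition. It is not the method of the project itself: the data are randomly generated, the shapes and ranks are arbitrary, and all function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) are hypothetical helpers written for this example using NumPy only.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricise a tensor: the chosen mode indexes rows, all other modes the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along a single mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return a small core tensor and one factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Expand the core back to the original tensor size."""
    tensor = core
    for mode, factor in enumerate(factors):
        tensor = mode_product(tensor, factor, mode)
    return tensor


# Synthetic data with low multilinear rank (an assumption made for illustration).
rng = np.random.default_rng(0)
shape, ranks = (30, 20, 10), (3, 3, 3)
true_core = rng.standard_normal(ranks)
true_factors = [rng.standard_normal((n, r)) for n, r in zip(shape, ranks)]
data = reconstruct(true_core, true_factors)  # time x subjects x trials

core, factors = truncated_hosvd(data, ranks)
approx = reconstruct(core, factors)

rel_error = np.linalg.norm(data - approx) / np.linalg.norm(data)
compressed_size = core.size + sum(f.size for f in factors)
print(f"stored values: {compressed_size} instead of {data.size}")
print(f"relative reconstruction error: {rel_error:.2e}")
```

Because the synthetic tensor has exactly the multilinear rank used for truncation, the reconstruction is accurate up to floating-point error; real data would generally require a trade-off between rank and approximation error.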

The objectives of the project are as follows. First, it will address how different types of tensor networks (TNs) can be leveraged for novel classes of machine learning models, which are expected, for example, to represent multiple feature interactions efficiently while scaling favourably with the number of features and the amount of training data. Second, it will investigate how prior domain knowledge for a given task can systematically guide the design of such models, in order to match the properties of the model (i.e. the TN) to the correlation structure of the given data. A third objective is to discover fundamental connections between models expressed in terms of TNs and classical deep learning (DL) architectures. This in turn can lead to insights in the field of DL, as well as to the derivation of new DL architectures by first obtaining a TN model analytically and subsequently mapping it to the corresponding DL architecture. The research is aligned with the EPSRC Information and Communication Technologies strategic theme and the areas of Artificial Intelligence Technologies and Digital Signal Processing.
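To make the scaling argument in the first objective concrete, here is a minimal sketch of one possible TN model, assuming a tensor-train (matrix product state) parameterisation of the weight tensor and a simple local feature map [1, x_i] per feature. These choices, the rank, and every name in the code (`local_feature_map`, `tt_predict`, `tt_to_full`, `dense_predict`) are assumptions made for this example and are not taken from the project. The model output is the contraction of a weight tensor with 2^N entries against the product of local features, yet it is evaluated using only the small cores.

```python
import numpy as np


def local_feature_map(x):
    """Hypothetical two-dimensional feature map: [1, x]."""
    return np.array([1.0, x])


def tt_predict(cores, x):
    """Evaluate the model by contracting the cores from left to right."""
    message = np.ones(1)
    for core, x_i in zip(cores, x):
        # Contract the physical index of the core with the local features.
        matrix = np.tensordot(local_feature_map(x_i), core, axes=(0, 1))
        message = message @ matrix
    return message.item()


def tt_to_full(cores):
    """Build the full weight tensor (only feasible for small N; used as a check)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
    return full.reshape([2] * len(cores))


def dense_predict(full, x):
    """Reference computation using all 2^N interaction weights explicitly."""
    features = np.ones(())
    for x_i in x:
        features = np.multiply.outer(features, local_feature_map(x_i))
    return float(np.sum(full * features))


rng = np.random.default_rng(0)
n_features, rank = 8, 3
bond_dims = [1] + [rank] * (n_features - 1) + [1]
cores = [
    rng.standard_normal((bond_dims[i], 2, bond_dims[i + 1]))
    for i in range(n_features)
]
x = rng.uniform(-1.0, 1.0, size=n_features)

full = tt_to_full(cores)
n_tt_params = sum(core.size for core in cores)
y_tt = tt_predict(cores, x)
y_dense = dense_predict(full, x)

print(f"tensor-train parameters: {n_tt_params}, dense parameters: {full.size}")
print(f"outputs agree: {np.isclose(y_tt, y_dense)}")
```

In this sketch the number of parameters grows linearly with the number of features for a fixed rank, whereas the explicit weight tensor grows exponentially; the dense version is included only to check that both computations give the same output.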

Crucially, the first step towards achieving these objectives will be to become intimately familiar with the tensor decomposition (TD) and TN literature. This includes attending relevant conferences, making new contacts, and discussing and scrutinising the ideas being put forward. As extensive and ongoing work on TNs is being performed by the quantum physics community (in a very different context), it will also be important to keep up to date with new results from that community, which could potentially be applied to big data analytics. Perceived gaps in understanding will be filled using research papers, textbooks, and lecture courses, for example at the mathematics department at Imperial. Furthermore, throughout the PhD, new ideas and results will first be developed analytically, drawing on intuition where possible, and will then be thoroughly tested, for example on well-known benchmarks using standard validation techniques. The models will be implemented in the Python programming language using standard libraries such as scikit-learn and Keras. Finally, the generic nature of my PhD research gives it the potential to be applied across many industries and data analytics contexts, such as financial services or e-Health, which refers to the use of electronics, communications, and the Internet of Things to revolutionise healthcare.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513052/1                                   01/10/2018  30/09/2023
2283715            Studentship   EP/R513052/1  01/10/2019  31/03/2023  Alexandros Haliassos