Parikh Matrices - Complexity and Applications

Lead Research Organisation: Loughborough University
Department Name: Computer Science

Abstract

With an increasing amount of data available in the world, the need for faster data processing and faster data transmission is paramount. This usually comes in the form of a trade-off where accuracy is lowered to the benefit of a faster manipulation of the data.
The Parikh vector associated to a sequence corresponding to any data stream is an established representation that provides such a mechanism by holding information about the number of occurrences of every symbol in the sequence. More recently, this representation was enhanced to reduce the ambiguity that the previous one was exhibiting (that is, the same Parikh vector is associated to a very large number of different sequences), leading to the concept of a so-called Parikh Matrix.
Parikh matrices contain information about the number of all lexicographically increasing sub-structures of a sequence. While such a representation is still not uniquely associated to a given sequence, it preserves much of the desirable compactness of Parikh vectors, and it drastically reduces their ambiguity. Parikh matrices are not fully understood at present, and we intend to study both their capability of data modelling in application domains such as data compression, online streaming and bioinformatics, as well as their theoretical properties. One of the main problems we are interested in investigating in this project refers to the complexity of identifying the existence of a sequence associated to a Parikh matrix. The problem is over 15 years old, and the only known algorithms to solve it take time proportional to the size of the word, instead of its representation. We plan to study this question using techniques from different areas, such as algebraic geometry, and expect that such approaches will also strengthen our understanding of other basic properties as well as applications of the concept.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/N509516/1 01/10/2016 30/09/2021
1965712 Studentship EP/N509516/1 01/10/2017 30/06/2021 Laura Hutchinson