131984
Cluster Inference of Big Multimedia Databases
Closed
Feasibility Studies
Innovate UK

Clustering data into groups so that each group is characterised by a set of unique features is of great significance in many applications. Although this is a well-studied area, a fundamental problem remains unsolved: clustering databases in which the Number of Clusters is far greater than the average Size of Clusters (the "NC >> SC" problem for short). The technical challenge arises because the intra-cluster similarity of some clusters is likely to be lower than the inter-cluster similarity between some other clusters. As a result, the clusters overlap significantly in the feature space, making the clustering unreliable. A typical example is INTERPOL's International Child Sexual Exploitation Database (ICSE DB), which contains millions of photos. Because the metadata of the photos (e.g., camera model) is often manipulated, it cannot be relied on to identify the source camera. The police expect that hundreds of thousands of cameras may be responsible for creating those photos, each accounting for only a small number of them. The objective of this project is to develop a heuristic clustering method that groups those photos so that each cluster corresponds to one camera.
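
The overlap described above can be illustrated on synthetic data. The following is a minimal sketch, not the project's method: the Gaussian feature model, the use of cosine similarity, and the names `make_dataset` and `similarity_overlap` are all assumptions made for illustration. It builds many small clusters (NC = 300, SC = 3) and measures how many same-cluster pairs are less similar than the most similar pair drawn from two different clusters.

```python
import numpy as np


def make_dataset(n_clusters, cluster_size, dim, noise, seed=0):
    """Hypothetical generator: Gaussian cluster centres plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    centres = rng.normal(size=(n_clusters, dim))
    labels = np.repeat(np.arange(n_clusters), cluster_size)
    features = centres[labels] + noise * rng.normal(size=(len(labels), dim))
    return features, labels


def similarity_overlap(features, labels):
    """Return cosine similarities of same-cluster and cross-cluster pairs."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    rows, cols = np.triu_indices(len(labels), k=1)  # each unordered pair once
    same = labels[rows] == labels[cols]
    pair_sims = sim[rows, cols]
    return pair_sims[same], pair_sims[~same]


# Assumed toy setting: far more clusters than members per cluster.
features, labels = make_dataset(n_clusters=300, cluster_size=3, dim=16, noise=1.0)
intra, inter = similarity_overlap(features, labels)

# Share of same-cluster pairs that look less alike than the most similar
# pair of items taken from two different clusters.
overlap = float(np.mean(intra < inter.max()))

print(f"mean intra-cluster similarity: {intra.mean():.3f}")
print(f"mean inter-cluster similarity: {inter.mean():.3f}")
print(f"same-cluster pairs below the best cross-cluster pair: {overlap:.1%}")
```

With so many clusters, the number of cross-cluster pairs dwarfs the number of same-cluster pairs, so even rare chance resemblances between different clusters can exceed typical within-cluster similarity. A single global similarity threshold therefore cannot separate the two kinds of pair in this toy setting.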