Development of data mining and visualisation software for scientific analysis

Lead Research Organisation: University of Manchester
Department Name: Physics and Astronomy

Abstract

Visualisation of data in particle physics currently involves event displays, histograms, line graphs and scatterplots. Since 1975 there has been an explosion of techniques for data visualisation driven by highly interactive computer systems and ideas from statistical graphics. This field has been driven by demands for data mining of large databases and genomics. Two key areas are direct manipulation of visual data and new methods for visualising high-dimensional data. The first area has seen the use of linked views, brushing and pruning. The second area has seen the introduction of methods such as parallel coordinates and the grand tour. In this project, these ideas are applied to particle physics data to evaluate their ability to reduce data analysis time and improve pattern recognition. Visualisation will also be combined with data mining techniques. The use of visualisation to aid the understanding of the effectiveness (or not) of various data mining methods will be evaluated. There are publicly available software tools that include many of the new statistical graphics techniques. However, no single tool includes all the most powerful new techniques and much work is required to integrate these ideas into data analysis tools for particle physics. Moreover, particle physics data sets are very large and cannot be handled by many of the available tools. This work will be done in collaboration with potential users in CMS and ATLAS. The improved software tools, especially their ability to handle very large datasets, will then find application to scientific areas other than particle physics.
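As an illustration of two of the techniques named above, the following minimal sketch shows how parallel coordinates and brushing might be expressed in code. It is not taken from the project's software: the function names `parallel_coordinates` and `brush`, and the sample `events` values, are hypothetical and chosen only for this example. Each record becomes a polyline with one vertex per dimension, each dimension is rescaled to the range 0 to 1 so the axes are comparable, and a brush selects the records that fall inside an interval on one axis.

```python
from typing import List, Sequence, Tuple


def parallel_coordinates(
    rows: Sequence[Sequence[float]],
) -> List[List[Tuple[int, float]]]:
    """Map each row to a polyline with one (axis_index, scaled_value) vertex per dimension."""
    n_dims = len(rows[0])
    # Per-dimension minimum and maximum, used to rescale every axis to [0, 1].
    lows = [min(r[d] for r in rows) for d in range(n_dims)]
    highs = [max(r[d] for r in rows) for d in range(n_dims)]

    polylines = []
    for r in rows:
        line = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            # A constant column has no spread, so place it at mid-axis.
            y = (r[d] - lows[d]) / span if span else 0.5
            line.append((d, y))
        polylines.append(line)
    return polylines


def brush(
    rows: Sequence[Sequence[float]], dim: int, low: float, high: float
) -> List[int]:
    """Return the indices of rows whose value on axis `dim` lies in [low, high]."""
    return [i for i, r in enumerate(rows) if low <= r[dim] <= high]


# Hypothetical four-dimensional records, invented for illustration only.
events = [
    (10.0, 0.5, 3.0, 100.0),
    (20.0, 1.5, 1.0, 300.0),
    (30.0, 1.0, 2.0, 200.0),
]

lines = parallel_coordinates(events)
selected = brush(events, dim=0, low=15.0, high=30.0)
```

Drawing each polyline in `lines` across equally spaced vertical axes gives the parallel coordinates plot. In a linked-view display, the indices in `selected` would be highlighted in every other view of the same data.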

Publications

 
Description The ability to visualize data is important for gaining an understanding of any system. Most data is multivariate, yet conventional visualizations (scatter plots, histograms, line graphs) handle only one- or two-dimensional data. Parallel coordinates provide a way to visualize data in several dimensions. The software written to implement this technique has handled up to 20 dimensions. It has been applied successfully to a specific data analysis for the CMS experiment at CERN.
Exploitation Route The software can be used by any person who wishes to make an exploratory visual analysis of any data - scientific or commercial.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Education; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Retail

URL http://benjamin.web.cern.ch/benjamin/DataViewer.html
 
Description The software developed for this PhD was used to analyse a particular physics process for the CMS experiment at the CERN Large Hadron Collider. Work is ongoing to investigate the commercial application of the software package.
First Year Of Impact 2013
Sector Education
 
Title DataViewer 
Description Developed 2009-2012 by CASE student Benjamin Radburn-Smith. Multivariate data visualization software. 
Type Of Material Data analysis technique 
Provided To Others? No  
Impact The software and its uses are being explored by the Entrepreneurs in Transit Scheme (STFC/UMIP). It is of use in any area that benefits from exploratory visual data analysis.
URL http://benjamin.web.cern.ch/benjamin/DataViewer.html
 
Title DataViewer 
Description Software package that implements the parallel coordinates algorithm for exploratory data analysis.
IP Reference  
Protection Protection not required
Year Protection Granted
Licensed No
Impact Much interest has been shown by the Entrepreneur in Transit Scheme. A follow-on funding application is under review.