The Application of Multiobjective Evolutionary Algorithms in Knowledge Discovery and Data Mining of Healthcare Data

Lead Research Organisation: University of Manchester

Department Name: Computer Science

Abstract

With the increasing amount of medical data, healthcare providers are faced with both the burden of
trying to manage this vast amount of data and the opportunity to utilise it in order to improve
the quality of healthcare delivery. Knowledge Discovery and Data Mining (KDD) can greatly assist
with activities ranging from identifying previously unknown disease risk predictors to supporting
clinical decisions by utilising a wider range of data than a clinician is able to [1]. However, with a
large amount of data, correlations become trivial, resulting in the need for identifying a subset of
correlations that can be clinically exploited to improve healthcare. Problems of clinical actionability
and interpretability of data mining results become issues for the real-world translation of these
techniques.
Multi-objective evolutionary algorithms (MOEAs) allow us to directly incorporate these issues into
the process of data mining, ensuring that our solutions are more suitable to the clinical setting.
From minimising costly false negatives to maximising interpretability, conflicting objectives can be
simultaneously optimised to provide an additional layer over already-successful data-driven models
for prevalent indications such as Type-2 Diabetes [2]. With their use in a range of mining techniques
such as association rule mining, classification, and clustering, MOEAs present a potentially fruitful
avenue of research that can be explored to create clinical decision support tools such as patient
categorisation. Real-world healthcare data will be used in this research to apply these techniques,
the results of which will guide tool implementation.
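
The idea of simultaneously optimising conflicting objectives can be illustrated with Pareto dominance, the comparison that multi-objective evolutionary algorithms generally rely on. The following is a minimal sketch, not the project's method: the two objectives (false-negative rate and model size as a stand-in for interpretability) and all candidate values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate models: (false_negative_rate, model_size).
# Both objectives are minimised; the values are made up for illustration.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [(0.10, 12), (0.15, 5), (0.12, 12), (0.30, 3), (0.15, 8)]
front = pareto_front(candidates)
for fn_rate, size in front:
    print(f"false-negative rate {fn_rate:.2f}, model size {size}")
```

Every model left on the front represents a different trade-off, so the choice between them can be left to the clinical setting rather than fixed in advance. A full MOEA would also evolve the candidate population; this sketch covers only the comparison step.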

Student:

Cameron Shand

Period of Study:

Oct 15 - Dec 19

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1704969

Research Topic:

Unclassified

Organisations

University of Manchester (Lead Research Organisation)

People	ORCID iD
John Keane (Primary Supervisor)
Richard Allmendinger (Primary Supervisor)	http://orcid.org/0000-0003-1236-3143
Jonathan Shapiro (Primary Supervisor)
Cameron Shand (Student)

Publications

Shand C (2018) Towards an adaptive encoding for evolutionary data clustering

Shand C (2019) Evolving controllably difficult datasets for clustering

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509565/1			01/10/2016	30/09/2021
1704969	Studentship	EP/N509565/1	01/10/2015	31/12/2019	Cameron Shand

Key Findings


Description	A primary outcome has been the creation of HAWKS, a synthetic data generator. Synthetic data is important for clustering, where the performance of algorithms is difficult to assess because there is typically no ground truth. Synthetic data provides this ground truth, but typically lacks the complexity of real-world data that would make the assessment meaningful. HAWKS was developed to let the user control different aspects of complexity, so that a diverse range of synthetic data can be created. This allows for a more thorough empirical investigation of clustering algorithms, revealing their strengths and weaknesses.
Exploitation Route	A synthetic data generator for clustering permits greater insight into clustering algorithms, specifically their strengths and weaknesses. This can be useful for teaching these algorithms (by visualising data on which they perform well and badly), and more broadly wherever clustering algorithms are used (which is across a range of domains, as they are typically used for knowledge discovery).
Sectors	Digital/Communication/Information Technologies (including Software),Education
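
The role of ground truth described above can be sketched in a few lines of Python. This is not HAWKS and does not use its API: the generator, the `spread` parameter used as a single difficulty control, the simple two-cluster k-means, and the Rand index used for scoring are all assumptions made for illustration.

```python
import random
from itertools import combinations


def make_clusters(centres, n_per_cluster, spread, seed):
    """Draw 2-D Gaussian clusters; returns points and their ground-truth labels."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def two_means(points, iterations=20):
    """Plain k-means with k=2, started from the first point and the point farthest from it."""
    first = points[0]
    centroids = [first, max(points, key=lambda p: sq_dist(p, first))]
    assignment = []
    for _ in range(iterations):
        assignment = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


def rand_index(truth, predicted):
    """Fraction of point pairs on which the two labellings agree (same/different cluster)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum(
        (truth[i] == truth[j]) == (predicted[i] == predicted[j]) for i, j in pairs
    )
    return agree / len(pairs)


centres = [(0.0, 0.0), (10.0, 10.0)]
easy_points, easy_truth = make_clusters(centres, 50, spread=0.5, seed=1)
hard_points, hard_truth = make_clusters(centres, 50, spread=6.0, seed=1)

easy_score = rand_index(easy_truth, two_means(easy_points))
hard_score = rand_index(hard_truth, two_means(hard_points))
print(f"well-separated clusters: {easy_score:.3f}")
print(f"overlapping clusters:    {hard_score:.3f}")
```

Because the generator records which cluster produced each point, the clustering result can be scored directly, and increasing `spread` shows how the algorithm degrades as the clusters overlap.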
