The Application of Multiobjective Evolutionary Algorithms in Knowledge Discovery and Data Mining of Healthcare Data

Lead Research Organisation: University of Manchester
Department Name: Computer Science


With the increasing amount of medical data, healthcare providers face both the burden of
managing this vast amount of data and the opportunity to use it to improve
the quality of healthcare delivery. Knowledge Discovery and Data Mining (KDD) can greatly assist
with activities ranging from identifying previously unknown disease risk predictors to supporting
clinical decisions by utilising a wider range of data than a clinician is able to [1]. However, with a
large amount of data, correlations become trivial, so it is necessary to identify the subset of
correlations that can be clinically exploited to improve healthcare. The clinical actionability
and interpretability of data mining results then become issues for their real-world translation.
Multi-objective evolutionary algorithms (MOEAs) allow us to directly incorporate these issues into
the process of data mining, ensuring that our solutions are more suitable to the clinical setting.
From minimising costly false negatives to maximising interpretability, conflicting objectives can be
simultaneously optimised to provide an additional layer over already-successful data-driven models
for prevalent indications such as Type-2 Diabetes [2]. With their use in a range of mining techniques
such as association rule mining, classification, and clustering, MOEAs present a potentially fruitful
avenue of research that can be explored to create clinical decision support tools such as patient
categorisation. Real-world healthcare data will be used in this research to apply these techniques,
the results of which will guide tool implementation.
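
The idea of optimising conflicting objectives simultaneously can be illustrated with the Pareto dominance comparison that multi-objective methods rely on. The following is a minimal sketch, not the project's method and not a full evolutionary algorithm: the candidate models, their false-negative rates, and the use of rule count as a stand-in for interpretability are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# Each candidate is (name, false_negative_rate, n_rules).
# Both objectives are minimised; n_rules is an assumed proxy for
# interpretability (fewer rules = easier to read). All values are invented.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    ("A", 0.05, 40),  # fewest false negatives, hardest to interpret
    ("B", 0.08, 12),
    ("C", 0.15, 4),   # most interpretable, most false negatives
    ("D", 0.10, 30),  # worse than B on both objectives
    ("E", 0.05, 45),  # ties A on false negatives but needs more rules
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Rather than a single "best" model, the result is a set of trade-offs (here A, B and C) from which a clinician could choose according to how much interpretability they are willing to give up for fewer false negatives.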



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509565/1       -             -             01/10/2016  30/09/2021  -
1704969            Studentship   EP/N509565/1  22/09/2015  31/12/2019  Cameron Shand
Description: A primary outcome has been the creation of HAWKS, a synthetic data generator. Synthetic data is important for clustering, where the performance of algorithms is difficult to assess because there is typically no ground truth. Synthetic data provides this ground truth, but it typically lacks the complexity of real-world data that would make results on it meaningful. HAWKS was developed to let the user control different aspects of complexity, so that a diverse range of synthetic data can be created. This allows a more thorough empirical investigation of clustering algorithms, revealing their strengths and weaknesses.
Exploitation Route: A synthetic data generator for clustering permits greater insight into clustering algorithms, specifically into their strengths and weaknesses. This can be useful for teaching these algorithms (by visualising data on which they perform well and data on which they perform badly), and more broadly wherever clustering algorithms are used, which spans a range of domains because they are typically used for knowledge discovery.
Sectors: Digital/Communication/Information Technologies (including Software), Education
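
The role of ground truth in the outcome description above can be illustrated with a small sketch. This is not HAWKS and does not reflect its interface, which is not described here; it is a stand-in under stated assumptions. The `spread` parameter is a hypothetical single "complexity" control (cluster overlap), and nearest-centre assignment stands in for a real clustering algorithm. Because the generator knows each point's true cluster, the output of the algorithm can be scored directly, here with the Rand index.

```python
import random
from itertools import combinations


def make_clusters(centres, n_per_cluster, spread, seed=0):
    """Sample 2D Gaussian blobs; return the points and their ground-truth labels."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def nearest_centre_labels(points, centres):
    """Stand-in for a clustering algorithm: assign each point to its closest centre."""
    def closest(p):
        return min(range(len(centres)),
                   key=lambda k: (p[0] - centres[k][0]) ** 2
                               + (p[1] - centres[k][1]) ** 2)
    return [closest(p) for p in points]


def rand_index(truth, predicted):
    """Fraction of point pairs on which both labelings agree (same vs different cluster)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (predicted[i] == predicted[j])
                for i, j in pairs)
    return agree / len(pairs)


centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical cluster centres
scores = {}
for spread in (0.01, 5.0):  # well-separated clusters vs heavily overlapping ones
    points, truth = make_clusters(centres, 50, spread)
    predicted = nearest_centre_labels(points, centres)
    scores[spread] = rand_index(truth, predicted)

print(scores)
```

With a very small `spread` the clusters are trivially separable and the score is perfect; increasing it makes the clusters overlap and the score falls. Varying such controls systematically is the kind of empirical investigation a synthetic generator makes possible.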