Advanced insurance claim analytics

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

The modern era produces large amounts of unstructured data (non-tabular data such as images and natural language) that is typically processed by humans as part of the delivery of goods and services. The power of machine learning (ML) lies in the ability of algorithms to model these data computationally, so that the same services can be delivered automatically. This leverages the advantages of computation over human processing: decisions are quantitative and objective (within the scope of the algorithm developer's biases), and algorithms never need breaks, can process more requests in less time, and scale easily with demand.

This research is performed in the context of potential applications of ML in the consumer car insurance industry, with a specific focus on predicting claim settlement values from the data available to our industry partners, Tesco Underwriting and Dunhumby.

The aim of the research is to use this insurance claim data, which is novel to academia, as a lens through which to investigate the performance and modification of existing methodologies and to inspire new ones. The potential impact of such research is a new dataset on which to evaluate existing methodologies, the discovery of novel methodological strategies, and potential insight into why the algorithms researched do what they do.

This project falls within the EPSRC Digital Economy research area because the nature of the data provided specifically allows novel exploration of the following themes:

How to fuse data, such as images and text (of varied quantity and quality) associated with an insurance claim, in order to improve predictions, and how the fused approach compares to existing methodologies applied to each data modality in isolation.

This is relevant to modern industrial use of data, because many data-generating processes produce varied data that could be combined (fused) to give insights greater than the sum of separate analyses of each part.
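As an illustration of the comparison described in this theme, the following is a minimal sketch of early fusion by feature concatenation, set against models fitted to each modality in isolation. It is not the project's method: the data are synthetic, the "image" and "text" features are random stand-ins for embeddings, and all names (`fit_ridge`, `views`, `scores`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns weights with a trailing bias term."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias term unpenalised
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic stand-ins for per-claim image and text features (assumption).
rng = np.random.default_rng(0)
n = 400
img = rng.normal(size=(n, 5))
txt = rng.normal(size=(n, 5))
w_img = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
w_txt = np.array([-1.0, 0.5, 2.0, 1.0, 0.0])
# The synthetic target depends on both modalities, plus noise.
y = img @ w_img + txt @ w_txt + rng.normal(scale=0.1, size=n)

train, test = slice(0, 300), slice(300, n)
views = {
    "image only": img,
    "text only": txt,
    "fused": np.hstack([img, txt]),  # early fusion: concatenate features
}

scores = {}
for name, X in views.items():
    w = fit_ridge(X[train], y[train])
    scores[name] = mse(y[test], predict(X[test], w))

for name, value in scores.items():
    print(f"{name}: held-out MSE = {value:.3f}")
```

Because the synthetic target is built from both feature sets, the fused model has lower held-out error here by construction. Whether fusion helps on real claim data, where modalities vary in quantity and quality, is the open question the theme poses.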

How to provide interpretability and uncertainty quantification in order to give end users more confidence in the reasons for, and the stability of, predictions from modern machine learning algorithms.

This is relevant for encouraging sceptical companies to adopt advanced machine learning, which helps keep the UK at the cutting edge of digital commerce. There is currently hesitance to accept modern machine learning methodologies, especially in compliance-heavy industries like insurance, because of the uninterpretable, black-box nature of the algorithms.
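One simple way to attach a stability measure to a prediction is bootstrap resampling. The sketch below is only an illustration under stated assumptions: it uses synthetic one-dimensional data and an ordinary least-squares model, and `bootstrap_predictions` is a hypothetical helper, not a method taken from the project.

```python
import numpy as np

def bootstrap_predictions(X, y, X_new, n_boot=200, seed=0):
    """Refit a least-squares model on bootstrap resamples and return the
    mean and standard deviation of its predictions at X_new."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])  # design matrix with intercept
    A_new = np.hstack([X_new, np.ones((X_new.shape[0], 1))])
    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        preds[b] = A_new @ coef
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic data (assumption): inputs observed only on the interval [0, 1].
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.2, size=100)

# One query inside the observed range and one far outside it.
X_new = np.array([[0.5], [5.0]])
mean, spread = bootstrap_predictions(X, y, X_new)

for x, m, s in zip(X_new[:, 0], mean, spread):
    print(f"x = {x}: prediction {m:.2f} +/- {s:.2f}")
```

The spread is larger for the query far from the training data, which is the kind of signal that could tell an end user when a prediction is less stable. The fitted coefficients of a linear model like this one are also directly readable, which is not the case for the black-box models the theme is concerned with.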

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023151/1      |              |              | 01/04/2019 | 30/09/2027 |
2605889           | Studentship  | EP/S023151/1 | 03/10/2020 | 30/09/2024 | Alexander Larionov