Time-dependent Robust Joint Modelling: Analysing a wealth of longitudinal outliers

Lead Research Organisation: Queen's University of Belfast
Department Name: School of Mathematics and Physics

Abstract

Joint modelling is a sophisticated technique that allows one to simultaneously analyse the evolution, over time, of repeated measurements from individuals and the impact this has on the time to a particular event of interest. It is most commonly applied in medicine, where patients are observed over time with the aim of investigating how and why their responses to treatment change and how this affects their survival. Such approaches can therefore be applied to a vast array of research questions, from cancer research to the analysis of chronic diseases such as heart disease, diabetes and stroke, to name but a few. As a result of this versatility, the volume of research publications utilising joint models has grown rapidly in the last few decades.
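As an illustration of the structure described above, one commonly used formulation is the shared random effects joint model sketched below. It is a generic textbook form, given as an assumption for orientation only; the specification used in this project may differ. Here y_ij is the j-th longitudinal measurement of individual i at time t_ij, b_i are individual-specific random effects, w_i are baseline covariates and h_i(t) is the hazard of the event of interest.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^\top\boldsymbol\beta + \mathbf{z}_{ij}^\top\mathbf{b}_i + \varepsilon_{ij},
  \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}), \quad \varepsilon_{ij} \sim N(0, \sigma^2),\\
m_i(t) &= \mathbf{x}_i(t)^\top\boldsymbol\beta + \mathbf{z}_i(t)^\top\mathbf{b}_i,\\
h_i(t) &= h_0(t)\exp\{\mathbf{w}_i^\top\boldsymbol\gamma + \alpha\, m_i(t)\}.
\end{aligned}
```

The association parameter alpha links the current value of the underlying longitudinal trajectory m_i(t) to the hazard. The normality assumptions on b_i and epsilon_ij are the ones that robust joint models relax.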

Despite this, only limited research effort has been directed at investigating one of the key assumptions of these models: that the random terms within them are normally distributed. This assumption of normality is violated when longitudinal outliers are present. Simply removing these outliers would not only reduce the sample size but, more importantly, would exclude important cases which commonly guide innovation in the biomedical sciences; it is typically the analysis of outlying cases which tells us more about disease progression. Instead, this research will advance robust joint modelling techniques which both restrict the impact of outliers, allowing more accurate and precise estimates to be obtained, and allow outliers to be identified with a high level of precision for further exploration.

However, this research area is in its infancy, and the volume of work to date on robust joint modelling is limited. This is due in part to a potentially restrictive assumption of the current methodology for these models, i.e. that the impact of outliers is constant and unchanging over time. There are no established theoretical tools for relaxing this assumption, an undesirable situation that this research will rectify. To do so, I will develop a novel methodology, the time-varying outlier impacts (TOI) approach, which will allow the degree to which outliers are down-weighted to change over time. This will allow more realistic scenarios to be modelled using such techniques: for example, modelling patients' reactions to starting a new treatment, accounting for the fact that they will take time to adjust to it. During this period, individual measurements taken from such patients may be outlying, or all of a patient's measurements may deviate from the trends of the population.
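To make the idea of a time-varying degree of down-weighting concrete, the Python sketch below computes, for a t-distributed measurement error, the usual weight (nu + 1)/(nu + r^2) given to a standardised residual r. The degrees of freedom curve `df_over_time` is a hypothetical choice (heavy tails just after a treatment starts, close to normal later) and is not the functional form of the TOI approach; all names and parameter values are illustrative assumptions.

```python
import math


def df_over_time(t, nu_start=2.0, nu_end=30.0, rate=0.5):
    """Hypothetical degrees-of-freedom curve (an assumption, not the TOI form):
    small df (heavy tails) just after treatment starts, rising towards a
    large df (close to normal) as patients stabilise."""
    return nu_end - (nu_end - nu_start) * math.exp(-rate * t)


def outlier_weight(residual, sigma, nu):
    """Conditional mean of the latent precision weight for a univariate
    t-distributed error with nu df and scale sigma:
    (nu + 1) / (nu + (residual / sigma) ** 2).
    Weights below 1 shrink the observation's influence relative to a normal
    model; small residuals can receive weights slightly above 1."""
    r = residual / sigma
    return (nu + 1.0) / (nu + r * r)


if __name__ == "__main__":
    # The same large residual (3 standard deviations) observed at different times.
    for t in [0.0, 1.0, 5.0, 20.0]:
        nu = df_over_time(t)
        w = outlier_weight(3.0, 1.0, nu)
        print(f"t={t:5.1f}  nu={nu:6.2f}  weight={w:.3f}")
```

Under these assumptions, the same three-standard-deviation residual receives a weight of about 0.27 at t = 0 but about 0.79 at t = 20, so it is down-weighted much more strongly while the patient is still adjusting.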

Another reason for the limited research utilising robust joint modelling techniques is the lack of available software to fit such models. Software to fit standard joint models has only become available in recent years, since the introduction of the JM package in R in 2008. These joint modelling software packages assume that the random terms are normally distributed and thus cannot handle data which contain longitudinal outliers, producing biased and imprecise estimates when outliers are present. This project will also address this issue by developing a software package in R for robust joint modelling that implements the newly developed TOI approach.

Planned Impact

The overall goal of this research is to make fundamental theoretical developments in robust joint modelling methodology. As such, it will have wide and far-reaching academic impact, with the potential to revolutionise this field of research by removing restrictive assumptions which have limited the utilisation of robust joint models in the current literature. This academic impact will be felt not only within statistics but also in the multitude of application areas in which such research may be utilised, for example renal research, as will be evidenced by the findings of the proposed work. Due to the ability of statistics to impact an array of applications in medicine, geostatistics, business analytics, astrostatistics, econometrics, environmental statistics and epidemiology, to name but a few, the EPSRC has deemed 'Statistics and applied probability' a growth research area within the theme of 'Mathematical sciences', where, as stated by the EPSRC, "this research area provides economic, industrial and societal impact".

Examples of such societal impacts arise from gains in understanding. For example, in renal research, patients' haemoglobin level is an emerging biomarker whose volatility in the initial stages after commencing haemodialysis has a great impact on these patients' risk of death. More generally, a significant impact of this work is the ability of robust joint models to identify individuals who are classified as outliers: patients who do not react to treatment in the same way as typical patients and are thus in need of more personalised treatment plans. Previous research suggests that such patients tend to have worse survival rates; therefore, enhancing techniques able to identify such at-risk patients is a key societal impact. In the long term, this has the potential to have an economic impact on the NHS by helping to make personalised medicine a reality.

An additional important impact of this project will be the development of the skills of the personnel involved. A concern expressed by the International Review of Mathematical Sciences (IRMS) 2010 is the fragility of UK statistics due to the shortage of researchers at various career stages. This project will help to address this issue by giving both the PI and the PDRA the opportunity to develop skills that will stand them in good stead for their future careers in the field, benefiting greatly from the expertise and guidance gained through the research visits to and from our collaborators that this project will facilitate. Added value comes from the cross-disciplinary nature of this project (statistics and medicine), which will broaden the skills of the personnel involved.
 
Description This research developed the theory of both robust mixed and joint models with time-varying degrees of freedom, the parameter which controls the extent to which longitudinal outliers are down-weighted. Allowing this parameter to evolve over time gives a better representation of the common situation where patients take time to stabilise and adjust to new treatments, enabling clinicians to better understand the processes which cause outliers. This is illustrated through several medical applications explored in this work. Through exploration of the properties of the corresponding estimators, this research has enhanced the theoretical understanding of the impact of longitudinal outliers. To accelerate the translation of such techniques into practical applications, an open-source R software package, "robjm", has been developed for both robust mixed and joint models.
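One standard way to express time-varying degrees of freedom is through the normal/gamma scale-mixture representation of the t-distribution, with the degrees of freedom indexed by the measurement time t_ij. The sketch below is a generic illustration under that assumption, shown for the random error term only; the notation (tau_ij, nu(t)) is introduced here, and the exact parametrisation used in this research and in robjm may differ.

```latex
\begin{aligned}
\varepsilon_{ij} \mid \tau_{ij} &\sim N\!\left(0, \sigma^2/\tau_{ij}\right),
  \qquad \tau_{ij} \sim \mathrm{Gamma}\!\left(\tfrac{\nu(t_{ij})}{2}, \tfrac{\nu(t_{ij})}{2}\right)
  \;\Longrightarrow\; \varepsilon_{ij} \sim t_{\nu(t_{ij})}(0, \sigma^2),\\
\mathrm{E}\left(\tau_{ij} \mid \varepsilon_{ij}\right) &= \frac{\nu(t_{ij}) + 1}{\nu(t_{ij}) + \varepsilon_{ij}^2/\sigma^2}.
\end{aligned}
```

Here the gamma distribution uses the shape/rate parametrisation. A small nu(t_ij) gives heavy tails and strongly down-weights large residuals observed at that time; as nu(t_ij) grows, the weight tends to 1 and the normal model is recovered.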
Exploitation Route Although in its infancy, this field of research has great potential. The interplay between individuals' repeated measurements and the time until an event of interest arises in a wide variety of applications, such as renal, cancer and genetic research, amongst others. Thus, the theoretical developments and software package created in this work will be of significance to an extensive audience. As illustrated in the analysis of renal patients undertaken in this project, such methods allow a deeper understanding of the individual-specific reactions that patients can have to new treatments. Hence, these models have the potential to play a key role in the promotion of precision medicine. Alongside medical researchers, this timely work will be of interest to the growing number of academics studying mixed and joint modelling both nationally and internationally, with the open-source R software, "robjm", providing a user-friendly way to put these novel techniques into practice.
Sectors Digital/Communication/Information Technologies (including Software),Education,Environment,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

 
Title Robust Mixed and Joint Modelling Methodology: Time-varying degrees of freedom 
Description This research developed the theory of both robust mixed and joint models with time-varying degrees of freedom, using Bayesian approaches to estimate the parameters. Previous robust mixed and joint models employed the restrictive assumption of time-constant degrees of freedom, which this research shows causes bias and inefficiency in parameter estimation under the realistic scenario that the impact of outliers varies over time. The open-source R software, "robjm", provides a user-friendly way to put these novel techniques into practice.
Type Of Material Data analysis technique 
Year Produced 2018 
Provided To Others? Yes  
Impact We have received requests for further information, and several researchers have informed us that they intend to use our corresponding software package in R ("robjm" package).
URL https://github.com/ozgurasarstat/robjm
 
Title robjm package in R 
Description Implementation of robust mixed and joint models with time-varying degrees of freedom, using Bayesian approaches to estimate the parameters. In addition to these new models, the package also allows estimation of mixed and joint models under a variety of assumptions: with normality assumptions for the random effects (to be used when no outliers are present), with time-constant degrees of freedom shared by the random effects and random error terms, and with time-constant degrees of freedom that differ between the two random terms. The package also fits robust mixed and joint models in which one of the two random terms (random effects or random errors) is normally distributed and the other is t-distributed. Finally, the package allows visualisation of the shape of the degrees of freedom when they vary over time, as well as simulation of data under any of the models listed above.
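The simulation capability described above can be mirrored with a small Python sketch of one of the listed model variants: a random-intercept model in which both the random intercepts and the measurement errors are t-distributed, with the error degrees of freedom varying over time. This is not the robjm interface (robjm is an R package); the function names, the degrees of freedom curve and all parameter values are hypothetical assumptions chosen for illustration.

```python
import math
import random


def df_over_time(t, nu_start=2.0, nu_end=30.0, rate=0.5):
    # Hypothetical time-varying degrees of freedom (assumption, not robjm's form).
    return nu_end - (nu_end - nu_start) * math.exp(-rate * t)


def t_scale_mixture(sd, nu, rng):
    """Draw from a t distribution with nu df and scale sd via its
    normal/gamma scale mixture: tau ~ Gamma(shape=nu/2, rate=nu/2),
    x | tau ~ N(0, sd^2 / tau)."""
    tau = rng.gammavariate(nu / 2.0, 2.0 / nu)  # gammavariate takes (shape, scale)
    return rng.gauss(0.0, sd / math.sqrt(tau))


def simulate_robust_longitudinal(n_subjects=100, times=(0, 0.5, 1, 2, 4, 8),
                                 beta0=10.0, beta1=-0.3, sd_b=1.0, sigma=0.5,
                                 nu_b=4.0, seed=1):
    """Hypothetical simulator for the random-intercept model
    y_ij = beta0 + beta1 * t_ij + b_i + e_ij,
    with t-distributed b_i (constant df nu_b) and t-distributed e_ij whose
    df depend on the measurement time. Returns (subject, time, y) tuples."""
    rng = random.Random(seed)  # seeded for reproducibility
    rows = []
    for i in range(n_subjects):
        b_i = t_scale_mixture(sd_b, nu_b, rng)
        for t in times:
            e_ij = t_scale_mixture(sigma, df_over_time(t), rng)
            rows.append((i, t, beta0 + beta1 * t + b_i + e_ij))
    return rows


if __name__ == "__main__":
    data = simulate_robust_longitudinal()
    print(len(data), data[:3])
```

Drawing t variates through the gamma scale mixture, rather than directly, mirrors the latent-weight representation commonly used in Bayesian estimation of such models and makes it straightforward to let the degrees of freedom depend on the measurement time.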
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact Several researchers have informed us they intend to use our software. 
URL https://github.com/ozgurasarstat/robjm
 
Description Jonas Wallin - Robust Joint Models with time-varying outlier impacts 
Organisation Lund University
Country Sweden 
Sector Academic/University 
PI Contribution The research completed as part of this grant has been a collaboration between the research team and this collaborator. Upon completion of a systematic review of the literature, the research team has taken the lead in the theoretical development of the novel methodology that underpins this research. Due to the complexity of the methods being developed, Bayesian estimation was employed. Working closely with the collaborator, the research team has developed a software package in R, "robjm", and is using it to conduct a simulation study to validate both the software package and the novel methodology. Due to the size of this simulation study, the High Performance Computing facilities at Queen's University Belfast are being used to undertake it. Drawing on the expertise gained from attending the Royal Society training course "Introduction to Public Engagement", the research team has disseminated the findings of this work at each stage of the project. To date, this has been achieved through invited presentations at the Royal Statistical Society seminar series in Northern Ireland, presentations at both national and international conferences, and an invited talk at a Big Data Workshop in Istanbul.
Collaborator Contribution Dr Wallin is an expert in Bayesian analysis. Due to the complexity of the methodology being developed as part of this grant, it was necessary to employ Bayesian estimation techniques. Dr Wallin provided advice and guidance on Bayesian analysis for the estimation of the robust joint model with time-varying outlier impacts introduced by this research.
Impact Robust Joint Models with Time-varying Degrees of Freedom. Robust joint models: The introduction of time-varying degrees of freedom. Robust joint modelling: A new approach to handle time-varying outlier impacts.
Start Year 2019
 
Title robjm package in R 
Description Implementation of robust mixed and joint models with time-varying degrees of freedom, using Bayesian approaches to estimate the parameters. In addition to these new models, the package also allows estimation of mixed and joint models under a variety of assumptions: with normality assumptions for the random effects (to be used when no outliers are present), with time-constant degrees of freedom shared by the random effects and random error terms, and with time-constant degrees of freedom that differ between the two random terms. The package also fits robust mixed and joint models in which one of the two random terms (random effects or random errors) is normally distributed and the other is t-distributed. Finally, the package allows visualisation of the shape of the degrees of freedom when they vary over time, as well as simulation of data under any of the models listed above.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Several researchers have informed us they intend to use our software. 
URL https://github.com/ozgurasarstat/robjm
 
Description Big Data Workshop (Istanbul, June 2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact This was an invited talk at a one-day workshop in Istanbul on statistical theory and applications, which consisted of a combination of lectures and invited talks. As the audience consisted of people from both industry and academia, this allowed wider dissemination of the newly developed statistical methodology and of the analysis of real-world data using these techniques.
Year(s) Of Engagement Activity 2018
 
Description Royal Statistical Society Northern Ireland Local Group Seminar Series (April 2018) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact This was an invited seminar as part of the Royal Statistical Society Northern Ireland Local Group Seminar Series. It was intended to disseminate the initial research findings regarding both the statistical methodology and the analysis of real-world data, inspiring people in both industry and academia to use the newly developed approaches. Being invited to present this talk provided a great opportunity to ensure that the theoretical developments and application findings uncovered so far reached a wide audience. In particular, our findings were presented both to academics who focus on statistical methodology and to those in industry for whom these methods and the medical insights gained would be of great interest. The talk was followed by a good discussion of the research findings.
Year(s) Of Engagement Activity 2018
 
Description Royal Statistical Society Northern Ireland Local Group Seminar Series (Dec 2018) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact This invited seminar, part of the Royal Statistical Society Northern Ireland Local Group Seminar Series, discussed the latest findings of the research and introduced the "robjm" software package in R that is being developed through this grant. People from both industry and academia were in attendance, giving a great opportunity to ensure that the statistical methodology and the analysis of real-world data were disseminated to a wide audience. The talk was followed by a good discussion of the research findings, including suggestions of possible avenues for further enhancement of the work.
Year(s) Of Engagement Activity 2018