Estimating features of trajectories in repeated measures data
Lead Research Organisation:
University of Bristol
Department Name: Social Medicine
Abstract
Investigations of a disease or condition commonly involve repeated measurement of important biological values over time on a group or cohort of individuals with that condition. For example, after men with prostate cancer receive treatment for their disease, they are monitored to check that their cancer has not returned. Prostate specific antigen (PSA) is repeatedly measured through blood tests every few months. PSA is produced in the prostate but, vitally, more PSA is produced by tumour cells than healthy prostate cells so that a high value of PSA is a worrying sign for cancer recurrence. The idea behind monitoring these people is that an outcome, such as having a recurrence of cancer or dying from a disease, may occur at some point in the future. Knowing the outcome, we can investigate how the biological measurements over time differ for those who have one outcome (e.g. die from prostate cancer) compared to those who experience another (e.g. survive).
In many examples, these repeated measurements can display complex changes when we look at a plot of the measurements over time. In the prostate cancer example, PSA is well known to have high day-to-day variation, so that repeated measures every few months will produce plots which show highly nonlinear changes in PSA over time. One problem is how best to summarise such repeated measurements. Some people look only at the average value of all of the measurements and ignore the fact that they were measured over time. Some believe that the most recent value is the only important one and that this should be used as a summary. Neither of these methods gives a sense of the behaviour of the measurements for an individual over time. I will be developing new methods to summarise the behaviour in terms of certain features of the trend or trajectory. For example, I will look at the speed at which the measurements are changing, and the age at which the fastest change in measurement is found. Looking at better summaries of behaviour may allow us to better predict outcomes in the individuals, such as prostate cancer recurrence in the example of repeated PSA measurement.
The research will be applied in four examples: HIV infection, prostate cancer, blood pressure changes during pregnancy, and blood pressure changes in adolescents. Data on these four examples have already been collected as part of large group or cohort studies. I will be analysing the data with statistical software and developing methods which can be tested in that software. If these features do tell us about future outcomes, it will be possible to raise an alert and intervene in cases where a dangerous feature is found during routine clinical monitoring.
Technical Summary
Aim
I aim to develop methods to obtain features of trajectories from repeated measures data and relate these features to outcomes.
Objectives
Derivative estimates will be developed for several methods for analysing trajectories of repeated, non-linear data, and used to estimate features of a trajectory. I will develop inverse prediction intervals to incorporate uncertainty around the timing of a feature. A 2-stage model for the association between a feature and outcome (i.e. estimate a feature, then in a separate model relate it to the outcome) will be implemented. Finally, a method for jointly estimating a feature and outcome will be developed.
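To make the 2-stage idea concrete, the following is a minimal sketch on simulated data, not the fellowship's implementation. It assumes the feature is a simple per-individual linear rate of change and that the outcome is continuous; every name and number in it (n_individuals, true_slopes, the effect of 3.0, and so on) is a hypothetical chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cohort (all values are illustrative assumptions).
n_individuals = 200
times = np.arange(6.0)  # six visits per individual
true_slopes = rng.normal(1.0, 0.5, n_individuals)
measurements = (
    10.0
    + true_slopes[:, None] * times[None, :]
    + rng.normal(0.0, 0.5, (n_individuals, times.size))
)
# The simulated outcome depends on each individual's true rate of change.
outcome = 2.0 + 3.0 * true_slopes + rng.normal(0.0, 1.0, n_individuals)

# Stage 1: estimate the feature (rate of change) separately per individual.
# np.polyfit accepts a 2-D y with one column per individual.
stage1_coefs = np.polyfit(times, measurements.T, 1)
estimated_slopes = stage1_coefs[0]

# Stage 2: relate the estimated feature to the outcome in a separate model.
effect, intercept = np.polyfit(estimated_slopes, outcome, 1)

print(f"estimated effect of slope on outcome: {effect:.2f} (simulated truth 3.0)")
```

Because the stage 1 estimates carry estimation error that stage 2 ignores, the stage 2 coefficient tends to be pulled towards zero (regression dilution). This is one reason a joint model of the feature and the outcome can be preferable.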
Methodology
The methods which I will consider for handling repeated measures data are P-spline and semiparametric mixed models, functional principal components analysis through conditional expectation (PACE), and Bayesian direct gradient estimation.
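As an illustration of how a derivative-based feature can be read off a smoothed trajectory, here is a hedged sketch for a single simulated individual. It swaps in a simple penalised spline (cubic truncated-power basis with a ridge penalty on the knot coefficients) rather than the P-spline mixed models, PACE or Bayesian approaches named above. The growth curve, knot count and penalty value lam are assumptions made for the example.

```python
import numpy as np


def design(t, knots, deriv=False):
    """Cubic truncated-power basis, or its first derivative with respect to t."""
    t = np.asarray(t, dtype=float)
    plus = np.clip(t[:, None] - knots[None, :], 0.0, None)
    if deriv:
        poly = np.column_stack([np.zeros_like(t), np.ones_like(t), 2 * t, 3 * t**2])
        return np.hstack([poly, 3 * plus**2])
    poly = np.column_stack([np.ones_like(t), t, t**2, t**3])
    return np.hstack([poly, plus**3])


def fit_penalised_spline(t, y, knots, lam):
    """Least squares with a ridge penalty on the knot coefficients only."""
    X = design(t, knots)
    n_knots = len(knots)
    # Augmented rows implement the penalty lam * sum(u_k^2).
    pen = np.sqrt(lam) * np.hstack([np.zeros((n_knots, 4)), np.eye(n_knots)])
    A = np.vstack([X, pen])
    b = np.concatenate([y, np.zeros(n_knots)])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef


# Simulated height trajectory (cm); the fastest growth is at age 13 by construction.
rng = np.random.default_rng(1)
age = np.linspace(8.0, 18.0, 120)
height = 140.0 + 30.0 / (1.0 + np.exp(-(age - 13.0))) + rng.normal(0.0, 0.2, age.size)

t = age - age.mean()                                  # centre time for numerical stability
knots = np.linspace(t.min(), t.max(), 12)[1:-1]       # 10 interior knots
coef = fit_penalised_spline(t, height, knots, lam=0.1)

# The derivative of the fitted curve is the derivative basis times the same coefficients.
grid = np.linspace(t.min(), t.max(), 501)
velocity = design(grid, knots, deriv=True) @ coef
peak_age = grid[np.argmax(velocity)] + age.mean()
peak_velocity = velocity.max()

print(f"estimated age at peak velocity: {peak_age:.2f}")
print(f"estimated peak velocity: {peak_velocity:.2f} cm/year")
```

The point of the sketch is that, once the trajectory is expressed in a spline basis, the derivative comes from differentiating the basis functions and reusing the fitted coefficients, so features such as the age at peak velocity follow directly from the fitted model.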
Scientific/Medical Opportunities
Derivative estimation, inverse prediction and bivariate modelling will be key developments in longitudinal data analysis. In a range of disciplines it is often the case that the rate of change of observed data is of primary interest. Predicting the value of an explanatory variable at a given value of response is a very general problem in statistics. Intervals around such an estimate will be a valuable tool in repeated measures analysis. Simultaneous estimation of a feature and outcome is an approach linking repeated measurements with an outcome variable. Correct specification of the correlation between the feature and outcome is vital.
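Inverse prediction in its simplest form can be sketched as follows. This example assumes a straight-line relationship and uses a case-resampling percentile bootstrap for the interval on the x-axis. It is a generic illustration on simulated data, not the interval method developed in the fellowship, and all names and values (target, inverse_predict, the 2,000 resamples) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: y = 2 + 0.5 x + noise, so y reaches 6.0 at x = 8.0.
n = 50
x = rng.uniform(0.0, 20.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5, n)
target = 6.0


def inverse_predict(x, y, target):
    """Fit a line y = intercept + slope * x and solve for x at the target response."""
    slope, intercept = np.polyfit(x, y, 1)
    return (target - intercept) / slope


x_hat = inverse_predict(x, y, target)

# Percentile bootstrap: resample (x, y) pairs and repeat the inverse prediction.
boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[b] = inverse_predict(x[idx], y[idx], target)
lower, upper = np.percentile(boot, [2.5, 97.5])

print(f"estimated x at y = {target}: {x_hat:.2f}")
print(f"95% bootstrap interval on the x-axis: ({lower:.2f}, {upper:.2f})")
```

This interval reflects only the uncertainty in the fitted line. An interval for the x-value of a new noisy observation, or for the timing of a feature of a nonlinear trajectory, would need to account for further sources of variation.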
Four applications will be used to demonstrate the methods and direct conclusions from these will be valuable to the relevant clinical setting. Where novel insights are gained by using features of trajectories this will directly benefit that medical area. Further, these methods can be used in any application of repeated measures and I will promote this with dissemination throughout the fellowship.
Planned Impact
The three main developments of this fellowship, namely derivative estimation for repeated measures data, inverse prediction and bivariate estimation will each prove useful in the biostatistics community. Derivative estimation is becoming a larger element of analyses as our need to understand change grows with the collection of more and better data. The methods proposed here are at the forefront of this understanding. This research will lead to methods which can be applied in new areas without the need for new data to be collected. By using features of repeated measures as risk factors, I hope to gain a novel insight into the applications described in my Case for Support. This will have a multifaceted impact.
First, in the application itself, if a feature of, say, PSA after treatment is associated with recurrence of prostate cancer, then this may be used prospectively in the monitoring of patients after surgery or radiotherapy. This could lead to earlier intervention by way of further treatment, which in turn could bring cost savings and improved quality of life. The direct beneficiaries of this example would be men with recurrent prostate cancer and their families. Further, clinicians in charge of monitoring men after treatment will have an improved knowledge base on which to recommend further action. During the course of my current post I developed an Excel system which is designed for use by urologists in monitoring men on active surveillance for localised prostate cancer. An easy-to-use online tool or software package of this kind could be implemented given successful results from this fellowship. This could potentially affect NHS policy where a feature has a strong association as a risk factor and is then built into such a system, which is tested and rolled out to GPs or consultants. These impacts would be long term and would be achieved only after sufficient testing, which is not included as part of the proposed fellowship.
Second, in the wider UK academic research community the impact of a successful application of these methods would be the increased awareness of analysing repeated measures in this way. The methods are not restricted to these four applications and through proper dissemination in journals, conferences and short courses the use of features could be employed in biomedical applications where repeated measures are regularly taken and in emerging fields such as epigenetics. Further, outside of public health, the methods could be used in finance, sports science and other environments where data is routinely and repeatedly collected on the same unit or individual. To enhance this transfer of knowledge, I will write tutorial style papers which will, along with attached software, maximise the impact of this fellowship.
Finally, through the University of Bristol Centre for Public Engagement (CPE), a novel insight into prostate cancer, for instance, can be relayed to the general public. At the moment, prostate cancer is receiving a lot of publicity and new advancements in the area are constantly being welcomed. The CPE will allow any potential applied impact of the research to be demonstrated to those who will benefit the most.
People |
ORCID iD |
Andrew Simpkin (Principal Investigator / Fellow) |
Publications
Simpkin A (2017). Modelling height in adolescence: a comparison of methods for estimating the age at peak height velocity. Annals of Human Biology.
Simpkin AJ (2016). Prostate-specific antigen patterns in US and European populations: comparison of six diverse cohorts. BJU international.
Simpkin AJ (2017). The epigenetic clock and physical development during childhood and adolescence: longitudinal analysis from a UK birth cohort. International journal of epidemiology.
Simpkin AJ (2015). Longitudinal analysis of DNA methylation associated with birth weight and gestational age. Human molecular genetics.
Simpkin AJ (2016). Prenatal and early life influences on epigenetic age in children: a study of mother-offspring pairs from two cohort studies. Human molecular genetics.
Simpkin AJ (2015). Systematic Review and Meta-analysis of Factors Determining Change to Radical Treatment in Active Surveillance for Localized Prostate Cancer. European urology.
Taylor M (2016). Exploration of a Polygenic Risk Score for Alcohol Consumption: A Longitudinal Analysis from the ALSPAC Cohort. PloS one.
Description | Developing inverse prediction for functional data analysis |
Organisation | Columbia University Medical Center |
Country | United States |
Sector | Academic/University |
PI Contribution | Together with Dr. Sara Lopez-Pintado and Prof. Ian McKeague, I have developed an approach to prediction intervals for the timing of a feature of functional data. In short, biomedical data which are repeatedly measured on the same individuals often take nonlinear trajectories rather than following simple linear behaviour. My fellowship aims to summarise these trajectories in terms of key features and the timing of these features (e.g. the timing of a peak in the biomedical data over time). We have developed a prediction interval around the timing of these features, i.e. an interval on the x-axis. |
Collaborator Contribution | Dr. Lopez-Pintado and Prof. McKeague were responsible for overseeing my development of the method. R was then employed to test the approach empirically on both simulated and real data. |
Impact | We have submitted an abstract for oral presentation at the International Biometric Conference in July 2016. We are currently drafting a manuscript for submission to the journal Biometrics, describing and testing the developed method. |
Start Year | 2015 |
Description | Conference on Applied Statistics in Ireland, Cork |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I was accepted for an oral presentation at CASI, the largest statistics conference in Ireland, which took place in Cork in May 2015. Roughly 150 people were in attendance and there was a healthy question and answer session afterwards. This talk led to collaboration with Dr. Caroline Brophy on data which were relevant to the MRC fellowship. |
Year(s) Of Engagement Activity | 2015 |
Description | IBC 2016 - Biostatistics conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Talk given at the International Biometric Conference in Victoria, Canada. Excellent discussion led to ideas for future collaboration and application. |
Year(s) Of Engagement Activity | 2016 |
Description | International Society for Clinical Biostatistics Conference, Utrecht, Netherlands |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | I was accepted for an oral presentation at the International Society for Clinical Biostatistics Conference in Utrecht (Netherlands) in August. Roughly 100 people were in attendance, mostly statisticians from international universities and industry. The talk was followed by excellent questions and discussion, which has led to new working links with Prof. Paul Eilers and Prof. Jutta Gampe. |
Year(s) Of Engagement Activity | 2015 |
Description | Invited seminar, Columbia University |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I was invited to present my fellowship work to the functional data analysis workshop at the Department of Biostatistics, Columbia University in October 2015. This was attended by 15 academics, and was a useful way to begin my 6-month collaborative visit at Columbia. This talk led to my introduction to (among others) Dr. Jeff Goldsmith, with whom I have since begun collaborating on developing methods in functional principal components analysis. |
Year(s) Of Engagement Activity | 2015 |
Description | Invited seminar, Harvard University |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I was invited by Dr. Erin Dunn to present my ongoing fellowship work to her group at Harvard Public Health. This talk was attended by 20 academics and was related to my applied rather than methodological aims. Excellent discussion has led to a new application of the methods, namely in brain imaging data collected over time. |
Year(s) Of Engagement Activity | 2015 |
Description | Invited seminar, London School of Hygiene and Tropical Medicine |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | I was invited by Dr. Ruth Keogh to give the weekly statistics seminar to the LSHTM group in February 2015. The talk was attended by roughly 40 academics and was made available on the LSHTM website. |
Year(s) Of Engagement Activity | 2015 |
Description | Invited seminar, National University of Ireland, Galway |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | I was invited to present my fellowship work to the Statistics group seminar at NUI, Galway in April 2015. There were approximately 20 academics in attendance and discussion afterwards was helpful in planning future research within the fellowship. |
Year(s) Of Engagement Activity | 2015 |