MICA: Mental Health Data Pathfinder: University of Cambridge, Cambridgeshire & Peterborough NHS Foundation Trust, and Microsoft

Lead Research Organisation: University of Cambridge



Technical Summary

With strong NHS partnerships and recent contributions to national mental health (MH) informatics, we shall add novel methods, epidemiology and phenotyping to the MH Platform. We envisage a modular pipeline that de-identifies MH data; supports flexible consent for sharing/contact; and links MH, cognitive, physical, psychosocial and biomarker data.
Project (P) 1. Our open-source tools de-identify clinical records to create CPFT's Research Database, supporting research and participation. We shall extend them to generate anonymised subsets and link data from consenting patients across MH/community services, acute care, and research organizations, including from existing deeply phenotyped longitudinal cohorts. We emphasize rigorous interface standards and NHS governance over identifiable/pseudonymised data. We shall collaborate on a national natural language processing framework, allowing NHS/research organizations to generate structured data from free text.
P2. We have created novel open-source neuropsychiatric assessment software. We shall extend it for broad and integrated NHS and research use. This will take automated cognitive testing into routine clinical practice. As a bold but tractable exemplar with research and clinical applications, we shall use it to apply electronic diagnostic algorithms and neuropsychiatric phenotyping, and link these detailed data to clinical records and biomarkers that include
P3. We shall apply P1 tools to a public health crisis: the premature death of those with serious mental illness. We shall link MH, national and acute Trust data and use machine learning to develop early predictors of mortality.
P4. We shall democratize MH research through broad consultation on generic tiered consent models for data-sharing and participation, by giving the research database direct clinical interfaces, and by enhancing data visualization to help clinicians and service users develop research and to help the NHS improve local services.
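
As an illustration of the de-identification step described under P1, the following is a minimal Python sketch, not the project's actual implementation. It assumes the patient's identifiers (names, NHS number and so on) are already known from structured fields and simply blanks them out of free text. The function name `scrub`, the placeholder string and the sample note are all hypothetical.

```python
import re
from typing import Iterable


def scrub(text: str, identifiers: Iterable[str],
          replacement: str = "[REDACTED]") -> str:
    """Replace each known identifier in free text with a placeholder.

    Matching is case-insensitive and restricted to whole words, so that
    scrubbing "Smith" does not alter an unrelated word such as "Smithson".
    """
    # Drop empty entries; try longer terms first so that a full name is
    # preferred over one of its parts when both are listed.
    terms = sorted({t for t in identifiers if t.strip()}, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(replacement, text)


# Hypothetical example note and identifiers (not real patient data).
note = "Mary Smith (NHS no. 9999999999) attended clinic; Mary reports low mood."
print(scrub(note, ["Mary", "Smith", "9999999999"]))
```

A real de-identification tool would need to go well beyond this (for example, handling misspellings, dates, addresses and identifiers of relatives); the sketch only shows the basic idea of scrubbing known identifiers from text.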


Fernandez-Egea E (2020) Birth weight, family history of diabetes and diabetes onset in schizophrenia. in BMJ Open Diabetes Research & Care

Title The CPFT Research Database 
Description This is a database for research and research recruitment created by de-identifying Cambridgeshire & Peterborough NHS FT (CPFT) clinical records. The current grant is enabling this to be developed further -- so far with enhanced natural language processing (NLP) tools, with other developments to follow. 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? No  
Impact Previous publications arising from this database. More to follow based on the extensions from this grant (but none of those yet). 
URL http://www.cpft.nhs.uk/research
Description Cambridge / King's College London collaboration for MRC Mental Health Data pathfinder awards 
Organisation King's College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We have designed an application programming interface (API) for computerized natural language processing (NLP), suitable for a national NHS NLP platform. We have refined this with KCL and are now implementing it for use on KCL (+ South London & Maudsley NHS Foundation Trust) servers in the Microsoft Azure cloud. We have written NLP tools (e.g. for finding inflammatory markers within free text) that we have made open-source and will contribute to this platform.
Collaborator Contribution KCL provide many other NLP tools relevant to psychiatry and server infrastructure.
Impact Developments to our software at https://crateanon.readthedocs.io/ .
Start Year 2018
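
To illustrate what "finding inflammatory markers within free text" can involve, here is a minimal Python sketch of a regular-expression extractor for C-reactive protein (CRP) values. It is an assumption-laden illustration only and does not reproduce the project's NLP tools or the NLP API mentioned above; the names `CrpResult`, `CRP_PATTERN` and `extract_crp`, the pattern itself and the sample sentence are all hypothetical.

```python
import re
from typing import List, NamedTuple


class CrpResult(NamedTuple):
    """One CRP mention: numeric value, units as written, and text offsets."""
    value: float
    units: str
    start: int
    end: int


# Hypothetical pattern: analyte name, optional linking word, number, optional units.
CRP_PATTERN = re.compile(
    r"""
    \b(?:CRP|C[-\s]reactive\s+protein)\b   # analyte name
    \s*(?:was|is|of|=|:)?\s*               # optional linking word or symbol
    (?P<value>\d+(?:\.\d+)?)               # numeric value
    \s*(?P<units>mg/l)?                    # optional units (case-insensitive)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_crp(text: str) -> List[CrpResult]:
    """Return structured CRP results found in a free-text note."""
    return [
        CrpResult(
            value=float(m.group("value")),
            units=m.group("units") or "",
            start=m.start(),
            end=m.end(),
        )
        for m in CRP_PATTERN.finditer(text)
    ]


# Hypothetical example text (not real clinical data).
text = "Bloods today: CRP 12.5 mg/L, previously C-reactive protein was 4."
for result in extract_crp(text):
    print(result)
```

Returning character offsets alongside each value is a design choice that lets a reviewer check every extracted result against its source text; a production tool would also need to handle negation, ranges and alternative units.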
Description Microsoft Research 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution This collaboration is being negotiated (with contracts team). We aim to bring our expertise in psychiatry and analysis of de-identified clinical data, plus machine learning expertise from the University, to a collaboration with Microsoft Research UK to improve the prediction of adverse outcomes (such as premature death) in schizophrenia.
Collaborator Contribution This collaboration is being negotiated (with contracts team). Microsoft Research UK plan to provide sophisticated machine learning algorithms, suitable for being trained within a secure NHS environment on de-identified NHS clinical data, to predict outcomes in serious mental illness. The aim is that the trained algorithms can then be deployed elsewhere, without any direct transfer of data derived from NHS clinical records.
Impact This collaboration is being negotiated (with contracts team).
Start Year 2018
Title CRATE: Clinical Records Anonymisation and Text Extraction 
Description The CRATE package (1) de-identifies clinical records; (2) manages a natural language processing (NLP) pipeline, provides its own NLP tools to extract structured information from free text, and manages third-party NLP tools; (3) provides a research interface to arbitrary databases derived from clinical records; (4) provides a computerized consent-to-contact process operational within the NHS. In 2018 it was extended to support an NLP API that we designed in support of a national NHS NLP service (collaboration with KCL; q.v.). 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Implementation of a research database across an NHS Trust (Cambridgeshire & Peterborough NHS FT; CPFT). Use elsewhere (e.g. Canada). Patients recruited to studies via its consent-to-contact methods in CPFT. 
URL https://crateanon.readthedocs.io/
Title CamCOPS: the Cambridge Cognitive and Psychiatric Assessment Kit 
Description CamCOPS is an open-source application for capturing information relevant for cognitive and psychiatric assessment, on tablets, laptops, and desktops. It offers simple questionnaires and more complex tasks, and sends its data securely to your server. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact In use for preclinical and clinical research locally. About to launch in CPFT Perinatal Psychiatry service (due 2019). 
URL https://camcops.readthedocs.io/
Description "Brainworks" public event, 1 Nov 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Presentation on research databases (and their use for recruitment to research studies) derived from de-identified NHS mental health records at Cambridge Biomedical Research Centre event, "Brainworks".
Year(s) Of Engagement Activity 2018
Description Patient engagement including development of a Research Advisory Group for NHS mental health data research in Cambridgeshire & Peterborough NHS FT 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Patients, carers and/or patient groups
Results and Impact Ongoing work to create a local advisory group prior to national consultation.
Year(s) Of Engagement Activity 2019