INDICATE: AI-enabled data curation, quality and fact-checking for medical documents
Lead Research Organisation:
Imperial College London
Department Name: Surgery and Cancer
Abstract
Clinicians, patients and policy makers lack access to accurate, real-time information on new cancer treatments. This is because such a large amount of information is continuously generated, and it is too complex to be analysed manually in a timely fashion. This is sometimes referred to as a health 'infodemic'. Syntheses of clinical evidence (known as systematic reviews) quickly go out of date, and national bodies responsible for appraising new treatments, such as the National Institute for Health and Care Excellence (NICE), are unable to keep up. It is increasingly hard to detect misinformation published within the medical literature, and an increasing number of papers have to be withdrawn after publication. INDICATE is a deep learning tool for the autonomous generation of systematic reports and analysis of both structured and unstructured data from published literature on cancer. It has been developed through a collaboration between Imperial College London, Amazon Web Services, NICE and the British Medical Journal (BMJ). The aim is to develop a methodology for the real-time analysis of healthcare infodemics that can be used to autonomously create clinical guidance and identify misinformation. This project will build on previous work to develop AI methodologies that automate how we search for medical literature, and it will intelligently support peer reviewers as they appraise and assess the quality of research papers. This work has three main goals: 1. To develop a tool for detecting research fraud. 2. To assess whether our AI tools can speed up the creation of NICE guidance. 3. To develop autonomous summary reports of clinical evidence on breast cancer treatment that could be used by medical publishers. The study group will work with clinicians, researchers and NICE to define and prioritise critical questions that require answering and to refine the user interface for the system.
Moreover, we will prospectively validate the system to determine the accuracy of its reporting mechanism. The validated data generated by this study will form the basis of a phase II study that increases the number of cancer types covered and trials the technology in a real-world clinical environment.
Publications
Freedman G
(2024)
Detecting Scientific Fraud Using Argument Mining
Kinross J
(2024)
The creation of an AI taskforce for colorectal surgery in the United Kingdom and Ireland.
in Colorectal disease : the official journal of the Association of Coloproctology of Great Britain and Ireland
Marcus HJ
(2024)
The IDEAL framework for surgical robotics: development, comparative evaluation and long-term monitoring.
in Nature medicine
Stackhouse A
(2023)
Knowledge Attainment and Engagement Among Medical Students: A Comparison of Three Forms of Online Learning
in Advances in Medical Education and Practice
Yiu A
(2024)
Adoption of routine surgical video recording: a nationwide freedom of information act request across England and Wales.
in EClinicalMedicine
| Description | This project has developed and validated a minimum viable AI technology that uses a combination of artificial intelligence and clinicians to accurately "peer review" medical text. We have specifically designed and built algorithms that can detect research fraud and improve the quality of data that is used by large language models to provide summarisation of medical information to clinicians and patients. To do this we built and indexed a corpus of over 200 million medical research papers and documents. We have tested this with our collaborators at NICE and demonstrated that we can reduce their systematic search process from 6 weeks to a few minutes. We have also worked with our partners at the BMJ to demonstrate that the summarisation of medical knowledge with this technology can be used to create relevant, up-to-date medical information that can be used by doctors. |
| Exploitation Route | We are now spinning out this technology into a company. The aim is to scale this into a product that all patients and clinicians can use to help improve the quality of the health choices they make. We launched the company in 2025, and we hope to be in hospitals this year. |
| Sectors | Healthcare |
| URL | https://www.theevidencecompany.com |
| Description | We have spun out a commercial vehicle for this technology (www.theevidencecompany.com) through which we aim to have social and commercial impact. As part of this work, one of the Co-Is (Dr. Uddhav Vaghela) has left medicine to work full time on the project. We are already deploying this technology into hospitals through this venture. We have had significant social engagement. We have presented and discussed our work on BBC Radio 4 (https://www.bbc.co.uk/programmes/m0026vt4) and through podcasts as part of our public engagement strategy (Royal Institution: https://www.youtube.com/watch?v=pLQVzwI5VYs). Four surgical trainees have worked with us on the project, and we have developed open data sets and software for the scientific community. Through the presentation of our work at international conferences, the PI (Kinross) has been asked to chair a national surgical AI Taskforce. |
| First Year Of Impact | 2024 |
| Sector | Healthcare |
| Impact Types | Cultural, Societal, Economic, Policy & public services |
| Description | ACPGBI AI Taskforce chair |
| Geographic Reach | Europe |
| Policy Influence Type | Participation in a guidance/advisory committee |
| Impact | Led to a strategic shift in the organisation as to how clinicians will work with genAI tools in education, research and direct clinical care. |
| URL | https://pubmed.ncbi.nlm.nih.gov/39558582/ |
| Description | ACPGBI robotic subcommittee |
| Geographic Reach | National |
| Policy Influence Type | Participation in a guidance/advisory committee |
| URL | https://www.acpgbi.org.uk/about/committees/17/robotic_subcommittee/public |
| Description | AI round table member at the 2024 WHO summit |
| Geographic Reach | Europe |
| Policy Influence Type | Participation in a guidance/advisory committee |
| Impact | Focused on the prevention of misinformation in healthcare messaging and public health |
| URL | https://www.conference.worldhealthsummit.org |
| Title | REDASA COVID-19 Open Data |
| Description | The REaltime DAta Synthesis and Analysis (REDASA) COVID-19 snapshot contains the output of the curation protocol produced by our curator community. A detailed description can be found in our paper. The first S3 bucket listed in Resources contains a large collection of medical documents in text format extracted from the CORD-19 dataset, plus other sources deemed relevant by the REDASA consortium. The second S3 bucket contains a series of documents surfaced by Amazon Kendra that were considered relevant for each medical question asked. The final S3 bucket contains the GroundTruth annotations created by our curator community. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2021 |
| Provided To Others? | Yes |
| Impact | Research collaboration |
| URL | https://registry.opendata.aws/redasa-covid-data/ |
| Description | AWS |
| Organisation | Amazon.com |
| Department | Amazon Web Services |
| Country | United States |
| Sector | Private |
| PI Contribution | Our team has provided the academic and research strategy, and funding. |
| Collaborator Contribution | AWS provides cloud services and bioinformatic expertise to the R-CANCER funded project. |
| Impact | Publications: https://www.jmir.org/2021/5/e25714 Open data sets: https://registry.opendata.aws/redasa-covid-data/ |
| Start Year | 2021 |
| Description | BMJ |
| Organisation | BMJ Learning |
| Country | United Kingdom |
| Sector | Private |
| PI Contribution | We have developed an application with the BMJ best evidence team, so that the R-CANCER framework can be leveraged to determine best practice guidelines. |
| Collaborator Contribution | We have worked with them to develop these tools as part of an EPSRC Health AI application for collaborative funding. |
| Impact | An EPSRC health AI application for funding. |
| Start Year | 2022 |
| Description | National Institute for Health and Care Excellence (NICE) |
| Organisation | National Institute for Health and Care Excellence (NICE) |
| Department | NICE International |
| Country | United Kingdom |
| Sector | Public |
| PI Contribution | We have worked with NICE to adapt the methodology developed in this grant so that it can be used as part of their assessments of future and emerging healthcare technologies in cancer care. |
| Collaborator Contribution | We have developed an EPSRC health AI application award that will further develop this application. |
| Impact | EPSRC health AI award application |
| Start Year | 2023 |
| Title | CoAI |
| Description | CoAI is a project that aims to provide a solution for annotating medical data. It allows users to annotate medical images, documents, and other types of data with relevant information for research, diagnosis, and treatment purposes. |
| Type Of Technology | New/Improved Technique/Technology |
| Year Produced | 2024 |
| Open Source License? | Yes |
| Impact | Open source software for all academics |
| URL | https://github.com/PanSurg/coai-platform |
| Title | Curation platform |
| Description | As a response to the infodemic of cancer related publications, we're developing the world's first real-time systematic review tool (R-CANCER) by combining cutting edge machine learning and a global collective of dedicated curators. |
| Type Of Technology | Webtool/Application |
| Year Produced | 2021 |
| Open Source License? | Yes |
| Impact | Ongoing |
| URL | https://curadr.com/ |
| Title | Pub-Guard-LLM |
| Description | Pub-Guard-LLM is an LLM designed specifically to detect fraudulent papers in academic publications. Pub-Guard-LLM consistently surpasses the performance of various baselines and provides more reliable explanations. |
| Type Of Technology | New/Improved Technique/Technology |
| Year Produced | 2025 |
| Open Source License? | Yes |
| Impact | Open source software for the detection of research fraud |
| URL | https://arxiv.org/abs/2502.15429v2 |
| Company Name | The Evidence Company Ltd |
| Description | |
| Year Established | 2024 |
| Impact | The company was formally spun out from Imperial in 2025 and it is now securing its initial funding and seed round. |
| Description | Chinese Pacific Rim Surgical Technology Conference - AI and surgical foundational data sets |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | An invited talk on the future of AI and surgery |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://lps.eqxiul.com/ls/4njqrXE3?bt=yxy&eip=true&share_level=4&from_user=20230911ed073c4d&from_id=... |
| Description | Royal Institution AI podcast |
| Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Public/other audiences |
| Results and Impact | Videocast / podcast - extended interview on AI and INDICATE. For publication 2024. https://podcasters.spotify.com/pod/show/ri-science-podcast |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://podcasters.spotify.com/pod/show/ri-science-podcast |