Using Text Mining to Aid Cancer Risk Assessment

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

The amount of evidence regarding environmental and occupational contributions to cancer justifies urgent acceleration of policy efforts to prevent carcinogenic exposures. Without such precautionary action, the protection of public health and the environment is at risk. Adequate risk management depends on reliable risk assessment. Risk assessment involves evaluating existing scientific knowledge to establish the relationship between exposure to a substance and the likelihood of developing cancer from that exposure. The resulting assessments determine which substances are associated with a high risk of cancer and which carry little or no risk. They give the authorities the key evidence needed to set exposure limits for environmental contaminants.
Performed manually, risk assessment is a challenging and time-consuming task that requires combining scientific expertise with extensive literature search and review. The data required to assess a single carcinogen may be scattered across thousands of journal articles. With the exponentially growing body of literature, the rapid development of molecular biology techniques, and increasing knowledge of the mechanisms involved in cancer development, the task is becoming too large to manage manually.
We propose to develop a practical tool to assist risk assessors in their work. The tool will make use of text mining technology from computer science. Text mining aims to automatically extract and discover novel information from written resources. It could support several steps of risk assessment, including finding the journal articles relevant to a given assessment, locating the relevant information within those articles, and identifying and gathering information on the specific carcinogenic properties of chemicals. The basic techniques required for text mining are now available, but they need to be tailored and extended for real-world tasks. In this project we will adapt existing text mining technology and develop novel techniques for the needs of cancer risk assessment. We will integrate the best of the resulting technology into a practical tool that risk assessors can use in their work. We hope the tool will greatly help risk assessors manage large volumes of text, increase their productivity, and support knowledge discovery.

Technical Summary

The body of scientific evidence showing a strong link between environmental chemicals and cancer calls for urgent efforts to protect public health by setting exposure limits on harmful chemicals. The critical tool that authorities use when deciding on exposure limits is risk assessment. Performed by teams of experts, risk assessment is a demanding scientific exercise that involves examining published evidence to determine the relationship between exposure to a substance and the likelihood of developing cancer from that exposure. The task is costly because it is very time-consuming: it involves manually searching for, locating and interpreting the relevant information in repositories of peer-reviewed scientific journal articles. Given the exponentially growing volume of published literature, the rapid development of molecular biology techniques, and increasing knowledge of the mechanisms involved in cancer development, the task is now becoming too large to manage manually.
We propose to investigate a novel, more effective approach to cancer risk assessment which is based on text mining. Text mining is a growing field of computer science which involves automatic discovery of new information from written resources. Recently, biomedical text mining has become increasingly popular due to the need to provide access to the tremendous body of data in biomedical sciences. Considerable progress has been made in the development of basic techniques in this area. If these techniques were extended and specifically tailored for the needs of cancer risk assessment, text mining could greatly assist risk assessors with the management of large textual data, increase their productivity, and aid knowledge discovery.
In this project, we will adapt existing text mining technology and develop novel techniques for the needs of cancer risk assessment. The best of the resulting technology will be integrated into a practical tool that we will build to assist risk assessors in their work. The usefulness of the technology will be evaluated both directly and in the context of this tool.
The project capitalises on the complementary expertise of two European institutions. The University of Cambridge Computer Laboratory contributes expertise in text mining of scientific literature. The Karolinska Institutet (Sweden) provides expertise in cancer research and risk assessment. The proposed research is innovative and strongly interdisciplinary. It will provide an important case study for the integration of text mining services into critical activities in biomedicine.
