Automated Chemical Structure Extraction

Lead Research Organisation: University of Nottingham
Department Name: School of Chemistry

Abstract

This project aims to automate the extraction of chemical structures from literature sources, such as PDF documents, journal articles and images, into a machine-readable format.

The aim is to extract a wealth of data from a resource that currently has to be mined by hand, so that it can be used at scale to tackle unoptimised processes within the chemical industry. It also offers a potentially general solution to the currently unsolved problem of reliably converting images of molecular structures into other formats using image recognition. Automating access to this untapped resource opens opportunities for more data-driven optimisation of chemical reaction and mechanism steps.

Proposed solution and methodology

Use image recognition to automatically segment images from journal articles and other resources, collecting a large number of machine-readable instances of molecules. The currently proposed methodology is an end-to-end deep learning pipeline covering every stage from segmentation to conversion.
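
The two stages named above, segmentation followed by conversion, can be illustrated with a minimal sketch. It is not the project's implementation: the learned segmentation model is replaced here by a plain connected-component search over a toy binary image, and the conversion model is a caller-supplied function. The names `segment`, `crop`, `extract` and `describe` are hypothetical, and the string returned by the converter merely stands in for a machine-readable structure format.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy binary image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def segment(image: Image) -> List[Box]:
    """Stand-in for the learned segmentation stage.

    Returns the bounding box of every 4-connected group of ink pixels,
    in row-major order of each group's first pixel.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one component, growing its bounding box as we go.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by an inclusive bounding box out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def extract(image: Image, convert: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Run segmentation, then apply the conversion stage to every region."""
    return [(box, convert(crop(image, box))) for box in segment(image)]


def describe(region: Image) -> str:
    """Placeholder converter (assumption): reports the region size only.

    A real pipeline would call a trained image-to-structure model here.
    """
    return f"{len(region)}x{len(region[0])}"


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    for box, label in extract(page, describe):
        print(box, label)
```

Keeping the converter as a plain function argument means either stage can be replaced by a trained model without changing the other, which is one way to read "end-to-end" as a chain of swappable stages.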

Planned Impact

This CDT will deliver impact aligned to the following agendas:

People
A2P will provide over 60 PhD graduates with the skill sets required to deliver innovative sustainable products and processes into the UK chemicals manufacturing industry. A2P will inspire and develop leaders who will:
- understand the needs of industrial end-users;
- embed sustainability across a range of sectors; and
- catalyse the transition to a more productive and resilient UK economy.

Economy
A2P will promote a step change towards a circular economy that embraces resilience and efficiency in terms of atoms and energy. The benefits of adopting more sustainable design principles and smarter production are clear. For example, the global production of active pharmaceutical ingredients (APIs) has been estimated at 65,000-100,000 tonnes per annum. The scale of associated waste is > 10 million tonnes per annum with a disposal cost of more than £15 billion. Consequently, even a modest efficiency increase by applying new, more sustainable chemical processes would deliver substantial economic savings and environmental wins. A2P will seek and deliver systematic gains across all sectors of the chemicals manufacturing industry. Our goals of providing cross-scale training in chemical sciences with economic and life-cycle awareness will drive uptake of sustainable best practice in UK industry, leading to improved economic competitiveness.

Knowledge
This CDT will deliver significant new knowledge in the development of more sustainable processes and products. It will integrate the philosophy of sustainability with catalysis, synthetic methodology, process engineering, and scale-up. Critical concepts such as energy/resource efficiency, life cycle analysis, recycling, and sustainability metrics will become seamlessly joined to what is considered a 'normal' approach to new molecular products. This knowledge and experience will be shared through publications, conferences and other engagement activities. A2P partners will provide efficient routes to market, ensuring that new technologies are translated and transferred effectively and that impact is achieved.

Society
The chemistry-using industries manufacture a rich portfolio of products that are critical in maintaining a high quality of life in the UK. A2P will provide highly trained people and new knowledge to develop smarter, better products, whilst increasing the efficiency and sustainability of chemicals manufacture.
To amplify the impacts of our CDT, effective public engagement and technology transfer will be crucially important. In general, 'sustainability'-styled research is often regarded in a positive light by society; however, the science that underpins its effective implementation is often poorly appreciated. The University of Nottingham has developed an effective communication portfolio (with dedicated outreach staff) to tackle this issue. In addition to more traditional routes of scientific communication and dissemination, A2P will develop a portfolio of engagement and outreach activities including blogs, webpages, public outreach events, and contribution of material to our award-winning YouTube channel, www.periodicvideos.com.

A2P will build on our successful Sustainable Chemicals and Processes Industry Forum (SCIF), which will provide entry to networks with a wide range of chemical science end-users (spanning multinationals through to speciality SMEs), policy makers and regulators. We will share new scientific developments and best practice with leaders in these areas, to help realise the full impact of our CDT. Annual showcase events will provide a forum where knowledge may be disseminated to partners. We will broaden these events to include participants from thematically linked CDTs from across the UK, building on our track record of delivering high-impact inter-CDT events with complementary centres hosted by the Universities of Bath and Bristol.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S022236/1                                   01/10/2019  31/03/2028
2285004            Studentship   EP/S022236/1  01/10/2019  31/08/2021  Tevyn Allen