Bio-renewable Formulation Information and Knowledge Management System

Lead Research Organisation: University of Sheffield
Department Name: Information School

Abstract

The project will build a demonstration information and knowledge management system (IKMS) to facilitate innovation with
new and replacement chemical materials from renewable biomass in formulated products. The IKMS will enable functional
ingredients from simple transformations of feedstocks to be identified more quickly and recommend the best feedstocks for
a particular function. If successful, it will repair a disconnection in the supply chain for exploitation of bio-based and
renewable materials as functional ingredients in formulated products, creating significant business benefit to the commercial partners and, following dissemination and further development, to the UK bio-based materials sector and
formulated products businesses as a whole. The demonstrator will focus on a search for bio-surfactant innovations, and will
be innovative in itself both by integrating several IT tools for the first time in a radical approach to formulated product design
and by being the first of its kind to be applied across a chemical-using industry supply chain.
The ambition of the system is that it will collate and manage existing data with new data recovered from the experimental
measurements and use this to update the models applied by the search tools. An automated data-driven modelling tool will
be developed and integrated into the system for this purpose. As data is added and as models are improved, the
performance of the selection algorithms will improve along with the chances that the selected ingredient and formulation
candidates will meet downstream commercialisation criteria. It is important to note that modelling methods used here are
quite different from, but complementary to, those to be developed under the TSB-funded ICT project 101508, which are physics-based
rather than data-driven, and will provide powerful capability for fast selection of novel chemistries against a subset of
filter criteria and provide mechanistic insights to sharpen these filters for better precision and better experimental assay
design.
To achieve its objectives, the project will extend the 101508 information model and add a repository to store formulation
information (composition and assembly) and property data (experimental and computed) to complement the feedstock and
transformation repositories. The information model and repository will need to be chemically intelligent, use readily
extensible RDF and triple store technologies, and incorporate semantic search capabilities to facilitate integration.
Modelling tools will be adapted and implemented using modern machine learning methods to find the mathematical
relationships between ingredient structure and properties, and between formulation composition and assembly with
application performance. The models will be built on data created during the project and added to the 101508 model
repository. The 101508 tools for enumerating ingredient options (from feedstocks and chemical transformation processes)
will be extended to enumerating formulations (from ingredients and assembly processes). The enumeration tools will be
coupled to a global many-objective search tool using diversity or chemical structure/formulation composition/assembly-property
models for efficient exploration of the combinatorial ingredient/formulation space.
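
The enumeration step described above (ingredient options generated from feedstocks and chemical transformation processes) can be illustrated with a minimal Python sketch. It is not the project's implementation: the feedstock names, the transformation table and the function enumerate_ingredients are hypothetical and chosen only to show the idea of repeatedly applying every transformation whose inputs are already available.

```python
# Hypothetical toy data: names and rules are illustrative only, not project content.
FEEDSTOCKS = {"glucose", "lauric acid"}

# Each transformation maps a tuple of required inputs to a single product name.
TRANSFORMATIONS = {
    "esterification": (("glucose", "lauric acid"), "glucose laurate"),
    "reduction": (("glucose",), "sorbitol"),
    "esterification_2": (("sorbitol", "lauric acid"), "sorbitan laurate"),
}


def enumerate_ingredients(feedstocks, transformations, max_rounds=3):
    """Repeatedly apply every transformation whose inputs are all available.

    Returns the set of products reachable within max_rounds rounds,
    excluding the starting feedstocks themselves.
    """
    available = set(feedstocks)
    for _ in range(max_rounds):
        new = {
            prod
            for inputs, prod in transformations.values()
            if set(inputs) <= available and prod not in available
        }
        if not new:  # nothing further can be made, so stop early
            break
        available |= new
    return available - set(feedstocks)


candidates = enumerate_ingredients(FEEDSTOCKS, TRANSFORMATIONS)
print(sorted(candidates))
# → ['glucose laurate', 'sorbitan laurate', 'sorbitol']
```

In a realistic setting the number of reachable products grows combinatorially with each round, which is why the text couples enumeration to a search tool rather than listing every option.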
We will also develop tools to help maintain and grow the IKMS with minimal overhead to future projects. These include
tools for semantic search and semi-automated extraction of appropriate data from literature and other available resources, and for
ontological integration and semi-autonomous building of ontologies where these do not exist.
In order to demonstrate how this system will work in practice, novel bio-surfactants identified in 101508 will be made and
their properties measured, a selected sub-set formulated and evaluated, and the data and derived models used to drive
another cycle of bio-surfactant selection and formulation optimisation.

Planned Impact

The main impacts of this work are on the environment and on manufacturing business. Underlying the proposal is the aim of
reducing reliance on oil-based ingredients and making greater use of waste materials from plants. Meeting such goals will
help address the green agenda of both the manufacturing partners and the UK government. The manufacturing sector
benefits from reduced costs, reduced reliance on oil for its ingredients, and access to a greater variety of
functional formulated products available from plant-based feedstocks. The producers of such feedstocks and formulated
products benefit from a greater market potential for their products. Inside the consortium the minimal ECONOMIC benefit to
Unilever is the estimated low scenario NPV of £13.3m from a 2016 launch to 2020 for a bio-surfactant exploited in a
laundry liquid detergent. It is anticipated that these benefits will be much greater from the globalisation of bio-surfactant
containing products and other functional ingredients in other product types. Croda and British Sugar should also benefit
where they participate in these supply chains over all these scenarios and in other supply chains for chemical-using
sectors other than home and personal care. Unilever publicly declared an ambition to reduce GHG by 50% in 2020. Innovation
is key to achieving this and the IKMS will support this environmental benefit explicitly by consideration of sustainable
materials & energy consumption in sourcing and processing, and utilisation of waste (and therefore avoiding its disposal).
Outside the consortium it is anticipated that similar and/or greater economic benefit will flow to other chemical-using
industries from the use of the IKMS to select functional ingredients for other applications. It is also anticipated that the
academic advances in information management and analysis and many-criteria search and optimisation could be applied to
many ECONOMIC (e.g. more effective & efficient innovation in engineering), SOCIAL (e.g. diagnosis and treatment in
health care) and environmental (e.g. more effective and efficient innovation in water and energy management) domains.
Outside the consortium environmental and social gains will be derived through reduced road transport (congestion, noise
and emissions; see TSB Project TP/ZEE/7/N0036A) because reduced amounts of bio-surfactant compared with existing
materials will be used, resulting also in decreased impact on water treatment facilities. Reduced energy and solvent use
will also benefit the environment and reduce energy costs to the industrial partners within the consortium, while also
decreasing dependence on non-renewable petroleum-derived materials. Cybula could expect economic benefit from
commercial use of the IKMS and increased commercial use of its YouShare platform. The establishment of this IKMS in the
local research ecosystem provides SOCIAL benefit in further demonstrating Unilever, Croda and AB Sugar's commitment
to supporting R&D activity in the UK with its concomitant benefits to society and the economy. SOCIAL benefits will come
from improved consumer products, with additional functionality that these materials offer to the consumer, such as mildness.

Publications

 
Description This EPSRC award funds Sheffield's contribution to two larger projects funded through INNOVATE UK (N8 BioHub IKMS TSB Ref No: 101508; and N8 Bio-Derived FIKMS TSB 101717), which involve additional partners including Unilever, Croda, British Sugar, Cybula, the University of Manchester and the University of Liverpool.

The original goal of the projects was to create and demonstrate an Information and Knowledge Management System (IKMS) that provided a solution to two significant barriers to innovation with sustainable ingredients in formulated products. (1) Ingredient innovation itself is time consuming and costly because potentially huge numbers of options must be evaluated against many design criteria. Traditionally this leads to inefficient serial evaluation processes and conservative sampling of the option space. Opportunities are missed and market share and margin suffer because of impact on time to market. (2) A very useful route to sustainable ingredients involves application of green chemical transformations to co-products of existing sustainable feedstocks. The problem with these opportunities is that the knowledge of a complex feedstock/transformation/end-use application supply chain has to be coalesced to realise them. Success for this project is the demonstration of the application of the IKMS for design of sustainable bio-derived surfactants and a business model for the further development and exploitation of the IKMS itself.

To achieve the goal, the project has integrated some modern semantic knowledge management components (ontologies and RDF triple store repositories) with advanced model-based Many-Objective Search Tools (MOST) using the YouShare software as a service integration platform. Users access the system through an interface dynamically populated with available feedstocks, transformations and application criteria. They can then run a process that evolves, from their selections of feedstocks and transformations, lists of candidates that are Pareto-optimal in terms of their application criteria. A key insight from the project is that the most appropriate representation of an ingredient design is its whole genesis tree (several ingredient constituents being assembled by a number of transformations in a particular sequence). The very large number of possible trees is searched using efficient genetic programming methods that manipulate only a tractably small sample of that large space. Both the characteristics of the trees (e.g. cost, complexity) and the properties of the product node (e.g. predicted interfacial properties using a physical chemical model) are evaluated in parallel to select the trees that are propagated by genetic operations over successive selection iterations, until a set of trees that can't be bettered is obtained. As far as we know, the application of these methods to this application domain is completely novel. The uncertainty and risk involved mean that this work would never have been done without the benefit of leveraged funding and cooperative effort.
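
As an illustration of what "Pareto-optimal in terms of their application criteria" means, the following minimal Python sketch filters a pool of candidates down to the non-dominated set. It is a sketch under stated assumptions, not the MOST implementation: the Candidate fields (cost, complexity, performance), their directions of improvement and the example values are all hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str            # hypothetical label for a genesis tree
    cost: float          # lower is better (assumed criterion)
    complexity: int      # lower is better, e.g. number of transformation steps
    performance: float   # higher is better, e.g. a predicted property score


def dominates(a, b):
    """True if a is at least as good as b on every criterion and strictly better on one."""
    at_least_as_good = (
        a.cost <= b.cost
        and a.complexity <= b.complexity
        and a.performance >= b.performance
    )
    strictly_better = (
        a.cost < b.cost
        or a.complexity < b.complexity
        or a.performance > b.performance
    )
    return at_least_as_good and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up values for illustration only.
pool = [
    Candidate("A", 1.0, 2, 0.60),
    Candidate("B", 2.0, 3, 0.90),
    Candidate("C", 2.5, 3, 0.80),  # dominated by B: costlier and worse performing
    Candidate("D", 0.8, 1, 0.40),
]
print([c.name for c in pareto_front(pool)])
# → ['A', 'B', 'D']
```

The surviving candidates are the "set that can't be bettered": improving any one of them on one criterion would require accepting a worse value on another.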

The IKMS components have been developed and supplied by Cybula and the Universities of Sheffield and Manchester; the content (feedstocks / transformations / end-use application) has been supplied by Unilever, Croda and British Sugar.
The demonstration IKMS has been delivered in full and the interface is operating.
Exploitation Route An issue with many knowledge management systems is they "die" once their development project funding ends. An effort has been made to reduce the overhead on development and ownership for a persistent business model. We have ensured and demonstrated it is easy to extend the content to other feedstocks, transformations and end use applications, and plug and play the various components through a web service approach. As a consequence of this, and the realisation that the tree representation of the design space is equally applicable to formulation, the IKMS has passed the first gate to adoption and development by the National Formulation Centre (Centre for Process Innovation and High Value Manufacturing Catapult).

This is the first time that the academic partners in the project have been involved in porting their skills to the much-neglected informatics of complex formulated materials. The proposed follow-on project through the National Formulation Centre initiative would open up new research areas in this industry sector.
Sectors Chemicals, Manufacturing, including Industrial Biotechnology

 
Description As the critical demonstration of impact, the project has identified a number of bio-derived surfactant leads of sufficient practical interest to be further developed. One set is at a quite advanced stage of evaluation by the project partner, Unilever. The advantage gained in using this system is not only that new opportunities can be identified efficiently, but also that they are pre-screened for the likelihood that they will pass experimental criteria evaluations in the downstream innovation processes, thereby improving both the quality and speed of innovation.
First Year Of Impact 2015
Sector Chemicals, Manufacturing, including Industrial Biotechnology
Impact Types Economic

 
Description BBSRC National Productivity Investment Fund (NPIF)
Amount £95,000 (GBP)
Funding ID BB/R505821/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2017 
End 09/2021
 
Description Studentship
Amount £30,000 (GBP)
Organisation Evotec 
Sector Private
Country Germany
Start 01/2017 
End 01/2020
 
Description N8 BioHub 
Organisation Associated British Foods PLC
Department British Sugar
Country United Kingdom 
Sector Private 
PI Contribution The Sheffield contribution has been to develop the multi- and many-objective search software tools that take a set of feedstocks as input and evolve candidate product molecules through the iterative application of transformations to those feedstocks. This process generates a very large number of potential products, and a Pareto-optimal subset is selected according to an input set of design criteria.
Collaborator Contribution The University of Manchester has developed semantic knowledge components to enable ingredients and transformations to be extracted from the literature and stored within a repository. The Cybula team has integrated the different software components into a service integration platform. The University of Liverpool contribution is to synthesise candidate molecules. The industrial partners have provided extensive knowledge about all areas of the problem domain; access to ingredients; modelling software to enable appropriate criteria to be modelled; and the resources to synthesise and test the resulting candidate compounds.
Impact The outputs to date have been listed elsewhere in the form.
Start Year 2014
 
Description N8 BioHub 
Organisation Croda International
Country United Kingdom 
Sector Private 
PI Contribution The Sheffield contribution has been to develop the multi- and many-objective search software tools that take a set of feedstocks as input and evolve candidate product molecules through the iterative application of transformations to those feedstocks. This process generates a very large number of potential products, and a Pareto-optimal subset is selected according to an input set of design criteria.
Collaborator Contribution The University of Manchester has developed semantic knowledge components to enable ingredients and transformations to be extracted from the literature and stored within a repository. The Cybula team has integrated the different software components into a service integration platform. The University of Liverpool contribution is to synthesise candidate molecules. The industrial partners have provided extensive knowledge about all areas of the problem domain; access to ingredients; modelling software to enable appropriate criteria to be modelled; and the resources to synthesise and test the resulting candidate compounds.
Impact The outputs to date have been listed elsewhere in the form.
Start Year 2014
 
Description N8 BioHub 
Organisation Cybula
Country United Kingdom 
Sector Private 
PI Contribution The Sheffield contribution has been to develop the multi- and many-objective search software tools that take a set of feedstocks as input and evolve candidate product molecules through the iterative application of transformations to those feedstocks. This process generates a very large number of potential products, and a Pareto-optimal subset is selected according to an input set of design criteria.
Collaborator Contribution The University of Manchester has developed semantic knowledge components to enable ingredients and transformations to be extracted from the literature and stored within a repository. The Cybula team has integrated the different software components into a service integration platform. The University of Liverpool contribution is to synthesise candidate molecules. The industrial partners have provided extensive knowledge about all areas of the problem domain; access to ingredients; modelling software to enable appropriate criteria to be modelled; and the resources to synthesise and test the resulting candidate compounds.
Impact The outputs to date have been listed elsewhere in the form.
Start Year 2014
 
Description N8 BioHub 
Organisation Unilever
Department Unilever Research and Development
Country United Kingdom 
Sector Private 
PI Contribution The Sheffield contribution has been to develop the multi- and many-objective search software tools that take a set of feedstocks as input and evolve candidate product molecules through the iterative application of transformations to those feedstocks. This process generates a very large number of potential products, and a Pareto-optimal subset is selected according to an input set of design criteria.
Collaborator Contribution The University of Manchester has developed semantic knowledge components to enable ingredients and transformations to be extracted from the literature and stored within a repository. The Cybula team has integrated the different software components into a service integration platform. The University of Liverpool contribution is to synthesise candidate molecules. The industrial partners have provided extensive knowledge about all areas of the problem domain; access to ingredients; modelling software to enable appropriate criteria to be modelled; and the resources to synthesise and test the resulting candidate compounds.
Impact The outputs to date have been listed elsewhere in the form.
Start Year 2014
 
Description N8 BioHub 
Organisation University of Liverpool
Country United Kingdom 
Sector Academic/University 
PI Contribution The Sheffield contribution has been to develop the multi- and many-objective search software tools that take a set of feedstocks as input and evolve candidate product molecules through the iterative application of transformations to those feedstocks. This process generates a very large number of potential products, and a Pareto-optimal subset is selected according to an input set of design criteria.
Collaborator Contribution The University of Manchester has developed semantic knowledge components to enable ingredients and transformations to be extracted from the literature and stored within a repository. The Cybula team has integrated the different software components into a service integration platform. The University of Liverpool contribution is to synthesise candidate molecules. The industrial partners have provided extensive knowledge about all areas of the problem domain; access to ingredients; modelling software to enable appropriate criteria to be modelled; and the resources to synthesise and test the resulting candidate compounds.
Impact The outputs to date have been listed elsewhere in the form.
Start Year 2014
 
Description N8 BioHub 
Organisation University of Manchester
Country United Kingdom 
Sector Academic/University 
PI Contribution The Sheffield contribution has been to develop the multi- and many-objective search software tools that take a set of feedstocks as input and evolve candidate product molecules through the iterative application of transformations to those feedstocks. This process generates a very large number of potential products, and a Pareto-optimal subset is selected according to an input set of design criteria.
Collaborator Contribution The University of Manchester has developed semantic knowledge components to enable ingredients and transformations to be extracted from the literature and stored within a repository. The Cybula team has integrated the different software components into a service integration platform. The University of Liverpool contribution is to synthesise candidate molecules. The industrial partners have provided extensive knowledge about all areas of the problem domain; access to ingredients; modelling software to enable appropriate criteria to be modelled; and the resources to synthesise and test the resulting candidate compounds.
Impact The outputs to date have been listed elsewhere in the form.
Start Year 2014
 
Title De novo design using genetic programming 
Description A novel approach for the de novo design of chemical compounds has been developed based on genetic programming. The method has advantages over alternative algorithms in that it naturally allows building block components to be exchanged, thereby allowing a greater exploration of chemical space.
Type Of Technology New/Improved Technique/Technology 
Year Produced 2016 
Impact It has been used by project partner, Unilever, in the design of novel surfactants.
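
The exchange of building block components mentioned in this record corresponds to the standard subtree crossover operator of genetic programming. The sketch below is a generic illustration, not the project's code: the tuple-based tree encoding, the function names (subtrees, replace_at, crossover, leaves) and the example feedstock and transformation names are assumptions made here for clarity.

```python
import random

# Assumed encoding: a tree is either a leaf (a feedstock name, str) or a tuple
# (transformation_name, child, child, ...). All names below are hypothetical.


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one randomly chosen subtree of b."""
    path_a, sub_a = rng.choice(list(subtrees(a)))
    path_b, sub_b = rng.choice(list(subtrees(b)))
    return replace_at(a, path_a, sub_b), replace_at(b, path_b, sub_a)


def leaves(tree):
    """List the feedstock leaves of a tree, left to right."""
    if isinstance(tree, tuple):
        return [leaf for child in tree[1:] for leaf in leaves(child)]
    return [tree]


parent_a = ("esterify", ("reduce", "glucose"), "lauric acid")
parent_b = ("amidate", "glycine", ("hydrolyse", "coconut oil"))

child_a, child_b = crossover(parent_a, parent_b, random.Random(0))
print(child_a)
print(child_b)
```

Because whole subtrees are exchanged, every building block present in the parents is still present in the two children taken together; only its position in a genesis tree changes. A real system would also need to check that each resulting tree is chemically valid, which this sketch does not attempt.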