Array Design for Lead Optimisation in Pharmaceutical Research

Lead Research Organisation: University of Sheffield

Department Name: Information Studies

Abstract

Lead optimisation is the name given to that part of a drug discovery project in which molecules are synthesised and tested to ensure that they not only have some required pharmacological activity, e.g., reducing blood pressure or shrinking a tumour, but also exhibit a range of additional desirable properties, e.g., being readily soluble, easy to synthesise and non-toxic. Medicinal chemists seeking to discover new drugs have traditionally synthesised potential drug molecules one at a time in an iterative design-synthesise-test cycle. However, the recent introduction of combinatorial chemistry technologies enables them to synthesise arrays containing hundreds or even thousands of compounds simultaneously. While this is a very efficient way of synthesising new molecules, and thus of exploring the range of possible compounds, the chemist still faces difficult design decisions, especially when seeking the optimum combination of several properties. This project will develop computer tools that assist medicinal chemists in designing new arrays of compounds, with the aim of expediting the discovery of new drugs. The project will draw on a large archive of arrays generated in past lead optimisation programmes at GlaxoSmithKline. A key phase will involve analysing the archive to determine how arrays have been used to explore relationships between the structures of molecules and their properties, and in particular how these properties have been improved over the course of a successful optimisation. This knowledge extraction phase will then provide the input to the second phase of the project: the construction of a decision support system, a computer system that will give the chemist guidelines on which array or arrays to make next at each stage of an optimisation project.
The development of a chemist-friendly system for array design should be of enormous benefit to experimental chemists, not just in the pharmaceutical and agrochemical sectors, but also in other industrial sectors where array methods are starting to be used, such as materials science and catalyst design.

Funded Value:

£235,681

Funded Period:

Oct 06 - Jan 11

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/E020410/1

Principal Investigator:

Valerie Gillet

Research Subject:

Chemical synthesis (100%)

Research Topic:

Biological & Medicinal Chem. (25%)

Combinatorial Chemistry (75%)

People
Valerie Gillet (Principal Investigator)
Peter Willett (Co-Investigator)
Visakan Kadirkamanathan (Co-Investigator)

Publications

Papadatos G (2009) Analysis of neighborhood behavior in lead optimization and array design. in Journal of Chemical Information and Modeling

Papadatos G (2010) Lead optimization using matched molecular pairs: inclusion of contextual information for enhanced prediction of hERG inhibition, solubility, and lipophilicity. in Journal of Chemical Information and Modeling

Cooper A (2008) Neighbourhood Behaviour Studies for Lead Optimisation

Papadatos G (2010) Enhancing Matched Molecular Pair Analysis during Lead Optimisation

Papadatos G (2013) Chemoinformatics for Drug Discovery

Papadatos G (2012) Bioisosteres in Medicinal Chemistry

Alkarouri M (2010) Application of Novel Data Mining Techniques to Improve Decisions during Lead Optimisation

Key Findings
Impact Summary


Description	The project has focused on the development of tools to assist medicinal chemists in the design of compound arrays during the lead optimisation stage of drug discovery. Lead optimisation is a complex, time-consuming task in which chemists seek a promising balance among potency, off-target interactions, toxicity and pharmacokinetic behaviour, in order to identify a candidate molecule to progress to clinical trials. The focus has been on inverse QSAR, that is, determining the structural change, ΔS, necessary to achieve a desired change in property, ΔP. This has been approached through retrospective studies of lead optimisation projects within the GSK archive and through the development of computational tools that can be applied in prospective array design to inform decision making by chemists. The retrospective analysis was achieved by developing data mining tools that attempt to reconstruct the temporal progress of historical projects by characterising compounds by array and by progress towards the project design objectives. For example, ΔS-ΔP plots can be generated that provide insights into the extent of chemical space exploration and the existence of activity cliffs within an array. However, this analysis demonstrated that a project cannot always be completely resolved into arrays because of incomplete data capture during project development. Further analyses were therefore carried out at the project level. Novel methods developed for visualising and assessing project progress over time proved very informative in analysing the success of a project. For example, multi-objective scoring methods were used to rank compounds against acceptance criteria and showed that, while lead optimisation did indeed result in better overall compounds, the proposed drug candidate was not the optimal solution. The relative merits of two computational methods for quantifying the neighbourhood behaviour of descriptors were compared for lead optimisation.
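The ΔS-ΔP analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's tooling: each compound is assumed to be a set of fingerprint bit indices plus one property value, ΔS is taken as Tanimoto distance, ΔP as the absolute property difference, and pairs that are structurally close but far apart in property are flagged as candidate activity cliffs. The function names, the thresholds `max_ds` and `min_dp`, and the example data are all hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def delta_s_delta_p(compounds, max_ds=0.3, min_dp=1.0):
    """Return (id_i, id_j, dS, dP, is_cliff) for every pair of compounds.

    compounds maps a compound id to (fingerprint bit set, property value).
    Assumption: dS is Tanimoto distance (1 - similarity) and dP is the
    absolute property difference; max_ds and min_dp are illustrative
    thresholds, not values taken from the project.
    """
    rows = []
    for (id_i, (fp_i, p_i)), (id_j, (fp_j, p_j)) in combinations(compounds.items(), 2):
        ds = 1.0 - tanimoto(fp_i, fp_j)
        dp = abs(p_i - p_j)
        # Candidate activity cliff: structurally close but property far apart.
        is_cliff = ds <= max_ds and dp >= min_dp
        rows.append((id_i, id_j, round(ds, 3), round(dp, 3), is_cliff))
    return rows


# Hypothetical three-compound array: fingerprints and pIC50 values are made up.
array = {
    "A1": (frozenset(range(1, 10)), 6.0),
    "A2": (frozenset(range(1, 11)), 8.1),
    "A3": (frozenset({10, 11, 12}), 6.2),
}
for row in delta_s_delta_p(array):
    print(row)
```

If ΔS is plotted on the x-axis and ΔP on the y-axis, the flagged pairs sit in the upper-left region of the plot, which is where activity cliffs within an array become visible.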
The neighbourhood behaviour comparison was based on chemical array data spanning multiple chemotypes, tested against multiple properties including potency, metabolic stability, permeability and lipophilicity, and considered a wide range of descriptors. The optimality criterion method was found to be an effective way of selecting descriptors for the systematic exploration of chemical space during array-based lead optimisation. Moreover, circular fingerprints were found to be the most effective descriptors for exploring ΔS-ΔP relationships. The final part of the project focused on explicit structural transformations between pairs of molecules and their impact on different ADMET end points, in what has become known as matched molecular pair (MMP) analysis. Previous MMP analyses assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs). However, experiments with large sets of hERG, solubility and lipophilicity data demonstrated that the inclusion of contextual information can enhance the predictive power of MMP analyses, with significant trends (both positive and negative) identified that are not apparent when using conventional, context-independent approaches. The context-sensitive observations identified within the GSK data were confirmed on a publicly available hERG dataset, indicating that the results are transferable and can therefore be used in prospective design. Tools developed during the project have been incorporated into a prototype decision support system called DejaChem for use by project chemists. Given a seed compound for an array and a property to be optimised, the system returns structural transformations that have produced the desired change in that property in contexts similar to that of the seed, thus allowing historical medicinal chemistry knowledge to be applied to new situations.
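The difference between context-independent and context-dependent MMP summaries can be illustrated with a small Python sketch. It assumes matched pairs have already been identified and that each is recorded as a transformation label (such as `"H>>F"`), a label for the local context of the change, and the observed ΔP. The labels, the made-up ΔP values, the `min_pairs` cut-off and the `suggest` query are hypothetical; this is not the DejaChem implementation, whose internals are not described here.

```python
from collections import defaultdict
from statistics import mean


def summarise(pairs, use_context, min_pairs=2):
    """Mean ΔP and pair count per transformation, optionally split by context.

    pairs: iterable of (transformation, context, delta_p) tuples.
    Groups supported by fewer than min_pairs pairs are dropped.
    """
    groups = defaultdict(list)
    for transformation, context, delta_p in pairs:
        key = (transformation, context) if use_context else (transformation,)
        groups[key].append(delta_p)
    return {key: (round(mean(vals), 2), len(vals))
            for key, vals in groups.items() if len(vals) >= min_pairs}


def suggest(pairs, seed_context, want_increase=True, min_pairs=2):
    """Hypothetical query: transformations observed in the seed's context whose
    mean ΔP moves the property in the desired direction, best first."""
    stats = summarise(pairs, use_context=True, min_pairs=min_pairs)
    hits = [(t, m) for (t, c), (m, _) in stats.items()
            if c == seed_context and (m > 0 if want_increase else m < 0)]
    return sorted(hits, key=lambda hit: hit[1], reverse=want_increase)


# Made-up matched pairs: (transformation, local context, ΔP).
pairs = [
    ("H>>F", "aromatic", 0.3),
    ("H>>F", "aromatic", 0.5),
    ("H>>F", "aliphatic", -0.4),
    ("H>>F", "aliphatic", -0.2),
    ("H>>OMe", "aromatic", -0.1),
    ("H>>OMe", "aromatic", -0.3),
]
print(summarise(pairs, use_context=False))  # context-independent view
print(summarise(pairs, use_context=True))   # context-dependent view
print(suggest(pairs, "aromatic", want_increase=True))
```

With these made-up numbers the context-independent mean ΔP for `H>>F` is 0.05, which hides a consistent increase in the aromatic context (0.4) and a decrease in the aliphatic one (-0.3); this is the kind of trend that only appears once context is taken into account.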
Exploitation Route	This work has focused on the development of tools to assist medicinal chemists in the design of compound arrays during the lead optimisation phase of drug discovery. The findings have had an impact on work practices at GSK and have the potential to benefit other organisations that design compound arrays. For example, our work on including context within matched molecular pair analyses is of significant benefit for the interpretation of results, providing evidence-based visualisation that correlates with the knowledge of experienced medicinal chemists. Our paper describing this work has been widely cited (47 citations in Web of Science as of 1/3/2016).
Sectors	Chemicals, Pharmaceuticals and Medical Biotechnology


Description	The project has focused on the development of tools to assist medicinal chemists at GSK in the design of compound arrays during the lead optimisation stage of drug discovery. A key aspect in the success of this project was the incorporation of medicinal chemist (customer) knowledge and perspective into data evaluation and tool design. The initial phase of the project laid the groundwork for understanding the relationship between chemical descriptors, similarity and the goal of the array. Data mining and the retrospective analysis of historical projects highlighted the incomplete nature of data generation in discovery programmes. The data visualisation techniques demonstrated a wide heterogeneity of array design goals and relevant measures of success. The interdisciplinary nature of this project highlighted the need to document array goals proactively, to maximise the benefits for data mining and subsequent array design. The later extension of the project to matched molecular pair data provided a methodology for prospective array design. Adding local contextual information brings significant benefit for the interpretation of results, providing evidence-based visualisation that correlates with the knowledge of experienced medicinal chemists. The implementation of this methodology should provide considerable benefits for array design.
First Year of Impact	2009
Sector	Chemicals, Pharmaceuticals and Medical Biotechnology
Impact Types	Economic
