Artificial Intelligence Tools For Automatic Single Molecule Analysis

Lead Research Organisation: University of Liverpool
Department Name: Institute of Ageing and Chronic Disease

Abstract

*What will we do?*
Develop and distribute an "artificial intelligence" application that allows fellow scientists to analyse a type of data ("single molecule data") that is difficult to analyse with existing methods.

*Machines can learn, but they take a lot of training*
Machines using artificial intelligence ("AI") can now recognise speech, in devices from Amazon's "Alexa" to call centres. As an example, just ask Alexa what "AI" is, and she will tell you. This has only become possible recently, as computers have become sufficiently powerful. The technologies that do it are collectively called "machine learning". One advantage of such "machines" is that they can answer questions unsupervised by people. One limitation is that they need enormous labelled datasets, called "training data", to "learn" in the first place. We have devised a "trick" to generate massive datasets: we play simulated data into the recording apparatus and then record back the resulting signal. Because we control the entire process, the data are inherently "labelled" in the way necessary to train intelligent machines. With this technique, together with Google Brain's freely available "TensorFlow" AI library, we can create applications that analyse data for us.
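
As a minimal sketch of why simulated data come "pre-labelled", the Python example below generates a two-state (closed/open) record and its labels together. It is an illustration only: the function name, the transition probabilities and the Gaussian noise (standing in for playing the signal through real recording apparatus) are all assumptions, not the project's actual method.

```python
import random


def simulate_labelled_record(n_samples, p_open=0.05, p_close=0.10,
                             open_current=1.0, noise_sd=0.3, seed=0):
    """Simulate a noisy two-state channel record together with its labels.

    Hypothetical helper for illustration. Returns (signal, labels), where
    labels[i] is 0 (closed) or 1 (open) and signal[i] is the ideal current
    at sample i plus Gaussian noise.
    """
    rng = random.Random(seed)
    state = 0  # start in the closed state
    signal, labels = [], []
    for _ in range(n_samples):
        # Two-state Markov chain: switch with a fixed probability per sample.
        if state == 0 and rng.random() < p_open:
            state = 1
        elif state == 1 and rng.random() < p_close:
            state = 0
        # The label is known exactly because we generated the state ourselves.
        labels.append(state)
        # Assumption: Gaussian noise stands in for the real recording apparatus.
        signal.append(state * open_current + rng.gauss(0.0, noise_sd))
    return signal, labels


signal, labels = simulate_labelled_record(1000)
```

Every noisy sample in `signal` has a matching entry in `labels`, so no human annotation is needed before training.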

*Why "Single Molecules"?*
Many molecules found in animal cells behave as switches. Their individual "on" or "off" state can be measured as pulses of either light or electrical current, giving real-time mechanistic insight. They are important throughout biology: the estimated global market for drugs targeting just one family of these molecular switches (ion channels) is $11.5bn. The flip side is that experiments measuring these "switches" generate big datasets that are difficult and laborious to analyse.

*Could single molecule biology contribute to tackling diseases of age, climate change, anti-bacterial resistance and global terrorism?*
In short, "Yes". Ion channel malfunction, in particular, is responsible for many age-related diseases. Interest from Pharma is enormous because ion channels are targets for many drugs, from sedatives to heart medicines. Certain insecticides act via ion channels, but these are toxic to people too. One such agent, the nerve toxin "VX", hit the news when it was used in the assassination of Kim Jong-Nam. Even the relatively safe insect repellent citronella repels mosquitoes by activating ion channels. Permethrin-resistant mosquitoes are resistant because they carry a specific mutation in an ion channel, creating a real problem in malaria control. Plants, too, express a range of ion channels with critical roles, including salt regulation. Since climate change is increasing the salination of many agricultural regions, there is keen interest in whether biological modification of root ion channels could promote the survival of crops in salt-rich soils. Ion channels have also been studied in synthetic biology because they can be activated by chemicals at concentrations far lower than those required by other sensors. Many uses have been proposed for such sensors, including detection of explosives, biological weapons, narcotics or certain diseases. A recent discovery is that bacteria also "talk" to each other through an ion channel-dependent biofilm communication network that is necessary for their survival. This has raised the possibility that ion channel-blocking drugs could constitute a new generation of antibacterials that are less susceptible to resistance. So single molecule biology could indeed contribute to the study of several grand challenges in society. In each case, a limiting factor is currently the time required to analyse the large datasets generated by these molecules.

*How can we help?*
We will create a simple-to-use, AI-based application to allow rapid analysis of these data. It will be especially useful to industrial partners, who produce large datasets during drug development but have few tools available to analyse them fully.

Technical Summary

Despite recent advances in the analysis of 'omic data, there have been few equivalent advances in the analysis of the large time-series datasets created by functional single molecule experiments. We will address this by developing machine learning tools that allow a novel "Big Data" approach to the analysis of complex single-molecule FRET (smFRET) and patch-clamp data from any cell type. Whilst the technologies for recording such data improve year by year, the procedures for analysis have not changed significantly over the past 20 years. Such analysis is slow and laborious, requiring full expert supervision. In native cells, the data are difficult to interpret objectively and take many times longer to analyse than to collect. Furthermore, high-throughput recording machines have recently been developed, but no software solution fully copes with the volume of data they produce. Big Data analytics is the key to solving this problem. Deep learning, a recent machine learning development, has brought what has been referred to as the 'artificial intelligence (AI) spring'. Deep learning can solve complex and ambiguous problems and is well suited to the data analysis challenges posed by single-molecule research. In this project, we will produce two different families of deep learning kernels for single molecule analysis. The first will use a long short-term memory (LSTM) architecture, a recurrent neural network with back-propagation through hidden layers (to retain sequence information); the second will be an adaptation of the convolutional neural network models that we currently use for image analysis. We will train the networks initially with semi-synthetic data acquired through regular acquisition hardware and finish training with small expert-labelled training sets. We will produce, validate and distribute a GUI-fronted, AI-based package to analyse single molecule data of either smFRET or patch-clamp origin, reading popular data formats directly.
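
Sequence models such as those described above are usually trained on fixed-length segments rather than on a whole recording. The Python sketch below shows one common way to cut a labelled time series into overlapping windows. The function name, window length and step size are assumptions made for illustration; the summary above does not specify how records will be segmented.

```python
def make_windows(signal, labels, window, step):
    """Cut a labelled record into (samples, labels) pairs of equal length.

    Hypothetical helper for illustration. Each pair covers `window`
    consecutive samples, and successive windows start `step` samples apart.
    """
    if len(signal) != len(labels):
        raise ValueError("signal and labels must have the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    pairs = []
    # Only full windows are kept; a trailing partial window is dropped.
    for start in range(0, len(signal) - window + 1, step):
        stop = start + window
        pairs.append((signal[start:stop], labels[start:stop]))
    return pairs


# Toy record: 10 samples with a single "open" event in the middle.
toy_signal = [0.1, 0.0, 0.9, 1.1, 1.0, 0.2, 0.0, 0.1, 0.0, 0.1]
toy_labels = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]

# 10 samples, window 4, step 2: windows start at samples 0, 2, 4 and 6.
windows = make_windows(toy_signal, toy_labels, window=4, step=2)
```

The same function could be applied unchanged to semi-synthetic and expert-labelled records, so both stages of training would see data in the same shape.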

Planned Impact

Single molecule research and AI are of great strategic relevance to industry. There have been numerous reports, including the Economist's article **"Artificial intelligence: Million-dollar babies"**, discussing a series of huge recent investments in, and buy-ups of, AI technologies and a brain drain to Silicon Valley. Furthermore, the Global Market Report (from Reuters) puts the ion channel modulator market alone at $11.5bn/year. This project will boost this strategic priority area by creating a UK bridgehead for Artificial Intelligence approaches to drug discovery.
Timelines for all the following activities are included in the Full Impact Statement attachment.

*Training*
The BBSRC's (politely termed) "Vulnerable Capabilities Report" recently highlighted that the UK has an alarming and increasing lack of skills in 5 areas amongst many of our young biologists. These included "(ii) Maths, statistics and computation" skills. This may follow from the hard bits being removed from UG programmes as courses try to increase student marks and feedback scores. The predominance of kit-based science may also have contributed to other vulnerable skill deficits identified in the report, such as lack of "(iii) Physiology [and pathology]" and "(i) Interdisciplinary" skills. This project will therefore address the BBSRC's top three "vulnerable skills" by embedding either a young mathematician/computer scientist or a young biologist into a multidisciplinary group. A biologist employed to lead this project would receive specific training in (1) MATLAB programming, (2) Python programming and (3) machine learning. A mathematician who already has those skills would receive theoretical training in single molecule research and would interact weekly in group meetings with active single molecule researchers.

*Industry*
The strategic relevance of AI approaches to biotechnology ("Data-Driven Biology") is clear, and the focus of our industrial enterprise is software support for high-throughput (HTS) ion channel drug discovery technology. Hardware system outputs now greatly exceed the ability of existing software to cope. We have started working with Molecular Devices and Nanion to develop a suite appropriate for HTS. Our ultimate goal is for this software to be shipped with the hardware. We have Letters of Support from the largest manufacturers of HTS hardware in both Europe and the USA, from Pharma and from the major UK software developer CED.

*Animal Welfare*
In the short term, this project will pilot technology without the need for the animal tissue that would normally be used in this type of study, and by releasing models initially trained on synthetic data we will reduce the need for others to use animals. Furthermore, the AI approach will recover molecular events more efficiently than the non-AI approach, so the number of animals needed for any of these studies will be reduced.

*Dissemination*
We will disseminate through the following routes, in order of perceived importance: (i) A workshop hosted at the Physiological Society "main" meeting (summer 2019). (ii) A link with the Physionet website (hosted by MIT-USA) to share our data, together with a Kaggle.org competition for informaticians to extract idealized records from our most complex synthetic data. The cash prize offered by this competition will entice non-biologists and hobbyists to become engaged too. (iii) Our social media outlets (several thousand followers). Additionally, as with all projects, our early progress will be disseminated at US Biophysical/Experimental Biology and international AI conferences and by continued publication in high-impact international journals.

*Outreach and Public Engagement*
RBJ set up the first Meet-The-Scientists event in Liverpool a few years ago, and its success has led to a regular series of events at the Liverpool World Museum. This project team will pioneer the use of illustrative cartoon examples of machine learning specifically designed for kids.
 
Description We have developed the first ever deep learning network which detects physiological ion channel events in stretches of noisy data using techniques similar to human vision. This is exciting and has been accepted for publication, both in short and long forms (separately).
Exploitation Route This project is still on-going and I will seek follow-on funding to make a commercially viable version.
Sectors Agriculture, Food and Drink,Education,Pharmaceuticals and Medical Biotechnology

URL https://protocolsmethods.springernature.com/users/340340-richard-barrett-jolley/posts/57971-deep-channel-taking-the-donkey-work-out-of-single-channel-analyses
 
Description Discussions with Google/Kaggle about sharing through their network.
First Year Of Impact 2019
Sector Education,Healthcare
Impact Types Societal

 
Title Deep-Channel Git 
Description Model identifies ion channel events in noisy recordings
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact Interest on social media, Kaggle etc.
URL https://github.com/RichardBJ/Deep-Channel
 
Description Twitter posts 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Several posts about our Deep Channel model on Twitter were seen by over a thousand people.
Year(s) Of Engagement Activity 2019