Enhancing automated, reproducible analysis workflows and data curation for extracellular neural recordings with SpikeInterface

Lead Research Organisation: University of Edinburgh
Department Name: School of Informatics

Abstract

How our brains give rise to cognition, thought and behaviour is one of the great unanswered questions in science today. Answering it will impact fields ranging from medicine to artificial intelligence. The last decade has brought major innovations in technologies to precisely record the activity of large numbers of neurons in the brain. These advances allow, for the first time, the systematic study of the role of interactions between neurons in neural circuits, and between brain areas, at a large scale. However, the size and complexity of the resulting data sets are considerable and require new approaches and methods for their analysis and interpretation.

As in other research fields such as physics and genomics, the availability of new data has led to a community effort to tackle data analysis. Numerous algorithms and software tools are available to researchers, yet their effective use in neuroscience labs still requires technical expertise that is not always easily available. Moreover, labs typically implement hand-crafted analysis workflows that are not portable and may be difficult to maintain once the original developers move on. This, in turn, makes it difficult to integrate new developments and can limit reproducibility, as results may depend on specific but undocumented elements of analysis workflows. To address this, we developed SpikeInterface, a software framework that unifies access to data and to the major existing algorithms and tools for data processing and analysis. This successful open-source project has been used to support analysis in data-intensive research studies and has received numerous contributions from the research community.
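A minimal sketch of what this unified access looks like in practice, assuming SpikeInterface's documented high-level Python API (module layout and argument names vary somewhat between versions):

    # Read a recording in one of the many supported formats (here: SpikeGLX),
    # then run two different spike sorters through the same interface.
    import spikeinterface.extractors as se
    import spikeinterface.sorters as ss

    recording = se.read_spikeglx("/path/to/spikeglx_folder")

    # Any installed sorter can be swapped in by name; the result comes back
    # as a common Sorting object regardless of the backend.
    sorting_ks = ss.run_sorter("kilosort3", recording)
    sorting_ms = ss.run_sorter("mountainsort5", recording)

Because every sorter is called through the same run_sorter entry point, downstream steps such as quality metrics or curation do not need to know which algorithm produced the output.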

Here we propose to build on SpikeInterface to address two major open problems: 1) how to automate data curation; and 2) how to abstract workflows so that they are reproducible and can be designed without specialist programming expertise. Data curation is an essential part of analysis workflows because existing algorithms, for instance those that isolate the activity of single neurons, are imperfect. Curation is usually a time-consuming, manual process that severely limits the data volume a lab can realistically work with. To improve throughput and reproducibility, we will develop novel machine-learning approaches to automate it. To improve the reproducibility and accessibility of these and the many other methods in SpikeInterface, we will add functionality to handle abstract representations of workflows (illustrated in the sketch below). This will allow the full provenance of analysis workflows to be documented easily, and will enable analyses to be created and run without writing program code. This, in turn, will be the basis for a web browser-based user interface to design, test and execute workflows. Combined with the flexible data access and pipelining SpikeInterface already implements, this will allow decentralised solutions: an analysis may be run locally, on a remote machine, or on a cloud-based service. Furthermore, the project team will support the research community in adopting and using these tools, and will continue to maintain and improve the SpikeInterface software. Together, this effort will make cutting-edge analysis methods available to the thousands of neuroscience labs now adopting large-scale recording technologies, enable collaborative analysis of large data sets, and simplify the sharing and re-use of valuable data sets for further discovery.
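To convey the idea, a workflow abstraction of this kind could be serialised as a simple declarative description, for example as a Python dictionary or JSON document. The schema below is purely hypothetical, invented for illustration; it is not an existing SpikeInterface feature:

    # Hypothetical declarative workflow description: each step names an
    # operation and its parameters, so the full provenance of an analysis
    # is captured in a single, portable document.
    workflow = {
        "input": {"format": "spikeglx", "path": "/path/to/data"},
        "steps": [
            {"op": "bandpass_filter", "params": {"freq_min": 300, "freq_max": 6000}},
            {"op": "run_sorter", "params": {"sorter": "kilosort3"}},
            {"op": "compute_quality_metrics", "params": {"metrics": ["snr", "isi_violation"]}},
            {"op": "auto_curation", "params": {"model": "isolation_classifier"}},
        ],
        "execution": {"target": "local"},  # or a remote machine / cloud service
    }

A description of this kind can be stored alongside the data, re-executed elsewhere, or edited through a graphical interface without touching program code.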

Technical Summary

Technologies to record electrical activity from many neurons, across multiple brain areas and during behaviour, are advancing rapidly and pose a formidable challenge for data analysis and interpretation. This project focuses on the extraction of single-neuron activity through spike sorting, the starting point of most interpretative analysis. Various computational tools already exist for such tasks, but they often lack automation, standardisation and interoperability. Here we will create new tools that address (1) data curation, a major analysis bottleneck because it requires manual intervention, and (2) the challenges of reproducibility and the difficulty labs without technical expertise face in adopting cutting-edge analysis methods. We will build on the open-source SpikeInterface project, which unifies data formats and access to existing solutions in a single Python software framework and is already in use in many labs worldwide. To address the curation bottleneck, we will develop machine-learning algorithms that predict single-neuron isolation quality from a combination of complementary metrics developed for this purpose (see the sketch below). In a second approach, we will develop spike-sorting algorithms that directly report uncertainty estimates. Both will allow users to accept or reject sorted neurons without manual intervention. Combining such methods with spike sorters and data processing into workflows currently requires programming expertise, and the resulting workflows can be challenging to reproduce between labs. By adding a workflow abstraction layer, we will remove this requirement and build a user interface operated through a web browser. This will enable the flexible construction of reproducible workflows, their execution on different computers or on a cloud-based service, and collaborative work on the same data. Together, this will remove major roadblocks labs face when adopting large-scale recording technologies, and enhance reproducible and collaborative research and data sharing.
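The curation idea can be made concrete with a schematic sketch: compute per-unit quality metrics and train a standard classifier on them against manual accept/reject labels. The example below uses scikit-learn and synthetic data in place of real metrics tables and curation labels; it illustrates the general approach only, not the algorithms proposed here:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_units = 200
    # Stand-in for a per-unit quality-metrics table, e.g. as produced by
    # spikeinterface.qualitymetrics.compute_quality_metrics (the column
    # names follow real SpikeInterface metrics; the values are synthetic).
    metrics = pd.DataFrame({
        "snr": rng.gamma(4.0, 2.0, n_units),
        "isi_violations_ratio": rng.exponential(0.05, n_units),
        "presence_ratio": rng.uniform(0.5, 1.0, n_units),
    })
    # Hypothetical accept/reject labels from past manual curation sessions.
    labels = (metrics["snr"] > 6) & (metrics["isi_violations_ratio"] < 0.1)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, metrics.values, labels.values, cv=5)
    print(f"held-out accuracy: {scores.mean():.2f}")

Trained on enough curated sessions, such a classifier could accept or reject newly sorted units automatically, with uncertain cases flagged for human review.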
