Genestorian: a web application to document and trace genetic modifications in model organism and cell line collections.

Lead Research Organisation: UNIVERSITY COLLEGE LONDON

Department Name: Genetics Evolution and Environment

Abstract

Currently, no open standards exist to unambiguously describe cloning strategies, genotypes and allele inheritance. Consequently, laboratories often store their plasmids, cell lines and strains in spreadsheets or text-based systems, which are necessarily inconsistent and differ between collections. Therefore, for curators or even members of a laboratory, it can be time-consuming or impossible to know the sequence and provenance of a plasmid or allele. Since biological knowledge bases (UniProt, model organism databases, etc.) rely on links between gene variants and phenotypes to annotate functions to gene products, the current situation limits the reusability of biological resources and the broad impact of research.

I propose to develop Genestorian, a web application to manage collections of oligonucleotides, plasmids, strains and cell lines where sequences can be traced through cloning steps up to their entry into the collection. Researchers will plan the generation of new resources from existing ones, with new sequences generated by in silico cloning. Consequently, data standardisation will occur at the planning stage, and will not be a burden at submission stages. It will be easy to query the collection, access the sequence, ancestry and progeny of resources, and export this information for the methods section of a paper or in a standard format. Standardisation will enable information exchange between laboratory collections, journals, knowledge bases and resource repositories. Therefore, Genestorian aligns with the European Union commitment to Open Science and will promote resource reusability and maximise the impact of genetic research, facilitating its reproducibility and interpretation by tracing results to specific DNA sequences.
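
As an illustration of the ancestry and progeny queries described above, the following is a minimal sketch, not the Genestorian data model. It assumes each resource simply records the resources it was made from; the `Resource` class, the `ancestry` and `progeny` helpers, and all resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical collection entry: an oligo, plasmid, strain or cell line."""
    name: str
    kind: str
    parents: list = field(default_factory=list)  # resources this one was made from


def ancestry(resource):
    """Return the names of all ancestors, nearest first, without duplicates."""
    seen = []
    queue = list(resource.parents)
    while queue:
        current = queue.pop(0)
        if current.name not in seen:
            seen.append(current.name)
            queue.extend(current.parents)
    return seen


def progeny(resource, collection):
    """Return the names of all entries in the collection that descend from resource."""
    return [r.name for r in collection if resource.name in ancestry(r)]


# Made-up example: a plasmid built from a backbone and an oligo,
# then used to engineer a new strain from a wild-type strain.
backbone = Resource("pBackbone", "plasmid")
oligo = Resource("oligo_fwd", "oligo")
plasmid = Resource("pNew", "plasmid", [backbone, oligo])
wild_type = Resource("strain_wt", "strain")
strain = Resource("strain_new", "strain", [wild_type, plasmid])
collection = [backbone, oligo, plasmid, wild_type, strain]

print(ancestry(strain))
print(progeny(backbone, collection))
```

Storing only the direct parents of each resource is enough to recover both directions of the history, which is why planning a new resource from existing ones can double as its documentation.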

Funded Value:

£200,511

Funded Period:

Sep 23 - Sep 25

Funder:

Horizon Europe Guarantee

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/Y024591/1

Principal Investigator:

Jurg Bahler

Research Subject:

Info. & commun. Technol. (32%)

Tools, technologies & methods (64%)

Research Topic:

Bioinformatics (36%)

Information & Knowledge Mgmt (28%)

Software Engineering (4%)

Tools for the biosciences (28%)

Organisations

UNIVERSITY COLLEGE LONDON (Lead Research Organisation)

People	ORCID iD
Jurg Bahler (Principal Investigator)
Manuel Lera Ramirez (Fellow)	http://orcid.org/0000-0002-8666-9746

Research Databases and Models
Software and Technical Products
Engagement Activities


Title	A database of annotated plasmids in the iGEM 2024 distribution
Description	The 2024 iGEM plasmid distribution provides teams with essential genetic parts for synthetic biology projects. However, these plasmids are distributed as raw DNA sequences without detailed sequence annotations, which identify functional elements such as genes, regulatory regions, and cloning features. To address this, we created a repository that annotates these plasmids using pLannotate, a tool for automated sequence annotation. By making these annotated plasmids available, we help researchers and iGEM teams quickly interpret plasmid functions, design experiments more efficiently, and reduce errors in genetic engineering workflows.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	This resource enhances the accessibility and usability of the 2024 iGEM distribution for the synthetic biology community. In addition, the annotated plasmids can be accessed directly from the web application funded by this grant.
URL	https://github.com/manulera/annotated-igem-distribution


Title	A database of plasmids containing gateway cloning sites
Description	Data mining software project in which AddGene plasmids containing Gateway cloning sites were downloaded and categorised, producing a searchable database. The data was then used to produce consensus sites for each type of Gateway site.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	Consensus sites produced from this data are used to simulate Gateway cloning in the web application funded by this grant. In addition, the site offers a portal to explore the dataset and find plasmids based on the features present in them.
URL	https://github.com/manulera/GateWayMine
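
To illustrate what producing a consensus site involves, here is a minimal sketch that takes the most common base at each position of pre-aligned, equal-length sequences. It is not the method used in GateWayMine, which the record does not describe; the `consensus` function, the tie rule (ties become `N`) and the example sequences are assumptions made for illustration.

```python
from collections import Counter


def consensus(aligned_sites):
    """Return the most common base at each column of equal-length sequences."""
    if not aligned_sites:
        raise ValueError("need at least one sequence")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("sequences must be pre-aligned to the same length")

    result = []
    for column in zip(*aligned_sites):
        counts = Counter(base.upper() for base in column)
        top_count = max(counts.values())
        winners = [base for base, count in counts.items() if count == top_count]
        # Assumption: a tie is reported as N rather than resolved arbitrarily.
        result.append(winners[0] if len(winners) == 1 else "N")
    return "".join(result)


# Made-up sequences for illustration; these are not real Gateway sites.
example_sites = ["ACGTAC", "ACGTTC", "ACCTAC", "TCGTAC"]
print(consensus(example_sites))
```

A real pipeline would also need to align the sites first and would probably report degenerate positions with IUPAC ambiguity codes instead of a plain `N`.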


Title	OpenCloning, a web application to plan and document cloning strategies
Description	OpenCloning is an open-source web application to plan and document cloning. Users can: 1. Import plasmid sequences from AddGene and gene sequences from NCBI. 2. Load their own sequence files. 3. Plan cloning and design primers using common techniques (Gibson, Golden Gate, Gateway, etc.). 4. Plan strain and cell line engineering via CRISPR and homologous recombination, with use cases not supported by SnapGene or Benchling. 5. Automate repetitive cloning and primer design using scripts or web forms. 6. Download final constructs as GenBank or FASTA files. 7. Archive the entire cloning history in an open format and load it later. 8. Create reusable cloning templates for cloning kits.
Type Of Technology	Webtool/Application
Year Produced	2024
Open Source License?	Yes
Impact	This web application is the main output planned from this funding. It is already available for researchers to plan and document their experiments. It currently supports most cloning methods supported by proprietary alternatives, even including some methods not supported by proprietary tools. It also allows users to export the plan of their experiment in an open format, which is not supported by proprietary software.
URL	https://github.com/manulera/OpenCloning


Title	pLannotate Web API and Docker Integration
Description	Sequence annotation is a critical step in synthetic biology, helping researchers identify functional elements within DNA sequences. pLannotate is a powerful tool for automated sequence annotation, but using it currently requires local installation and command-line expertise, and it is not easy to integrate into a pipeline or use in a production-level web application. To make pLannotate more accessible, I developed a web API that allows other applications to integrate its functionality seamlessly. Additionally, we created a containerized Docker version, ensuring easy deployment and reproducibility across different computing environments. This work lowers the barrier for researchers and developers, enabling broader adoption of automated sequence annotation in synthetic biology workflows.
Type Of Technology	Software
Year Produced	2024
Impact	This enabled the web application funded by this grant to integrate with this existing software package.
URL	https://github.com/manulera/pLannotate-api-docker


Description	Lead role in organisation of Synthetic Biology afterwork events
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	Along with 4 other colleagues from UCL and Imperial College, I started a monthly seminar series, "London SynBio Network". The format is an afterwork event consisting of two 20-minute talks followed by networking with drinks and snacks. The main audience is Early Career Researchers and industry members interested in Synthetic Biology. So far, we have organised 6 events with an average registration of 80 people. These events have helped me meet prospective users of the web application that this grant funds.
Year(s) Of Engagement Activity	2024,2025
URL	https://events.humanitix.com/copy-of-london-synbio-network-6


Description	Lead role in organisation of python library hackathon and monthly meetings
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	As part of my work on the web application supported by this grant, I take part in the maintenance of the Python library pydna. Pydna is a Python package that provides human-readable formal descriptions of cloning and genetic assembly strategies for simulation and verification, and it can be used as executable documentation for cloning. I have taken a leading role in activating the community of users by organising monthly meetings with pro-users and developers, and by organising a one-day pydna "hackathon". The typical attendance of the monthly meetings is 6 people, and 12 people participated in the hackathon. These activities have created a small community of maintainers and users of the library who know each other, and have resulted in an overall improvement of the library, including bug fixes, documentation and better software development practices.
Year(s) Of Engagement Activity	2024
URL	https://github.com/pydna-group/pydna
