SyntheticDNAStore. Synthetic biology innovation around design DNA molecules

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Wellcome Trust Genome Campus

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

This is a high-risk synthetic biology proposal to further the aim of using DNA as a viable long-term digital storage medium.
This builds upon previously published work. We have four specific aims: (1) to explore the copying behaviour of a number of different schemes (PCR-based, rolling-circle and RepA-based schemes), quantifying both per-base error rates and (more importantly for this proposal) per-fragment replication rates; (2) to design information coding schemes for DNA which maximise the amount of information encoded in DNA whilst also being optimised for low error rate and even copying behaviour; (3) to develop a physical indexing scheme for digital DNA storage, allowing separate compartments to be independently retrieved; and (4) to create a mid-scale (~terabyte) store of culturally important digital works.
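
As an illustration of aim (2), the sketch below shows one simple way a coding scheme can avoid runs of identical bases (homopolymers), which are commonly regarded as error-prone in synthesis and sequencing. It is a minimal sketch, not the scheme developed in this project: the function names, the fixed six-trits-per-byte conversion and the lookup order are all assumptions made for illustration.

```python
BASES = "ACGT"


def bytes_to_trits(data):
    """Write each byte as 6 base-3 digits (3**6 = 729 >= 256), most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def trits_to_dna(trits, prev="A"):
    """Each trit picks one of the three bases that differ from the previous base."""
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError("trits must be 0, 1 or 2")
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, prev="A"):
    """Inverse of trits_to_dna; raises ValueError if a base repeats its predecessor."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


if __name__ == "__main__":
    dna = trits_to_dna(bytes_to_trits(b"DNA"))
    print(dna)
    print(trits_to_bytes(dna_to_trits(dna)))
```

Because every base differs from its predecessor by construction, no two adjacent bases in the output are equal; the cost is that each base carries log2(3) bits rather than 2 bits.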

Planned Impact

This proposal has impacts on Academic, Economic and Societal levels.


The Academic impacts include: (1) the development of a compelling piece of synthetic biology design, showcasing the blend between engineering and biological components. (2) the extensive exploration of the error rate and amplification rate of DNA copying enzymes at a level of precision suitable for engineering design. We will publish this work in appropriate journals and provide all the data backing up this information and (3) the training of two individuals in synthetic biology, one from the experimental side and one from the theoretical/electrical engineering side.

The Economic impact in the long term will be the provision of a robust, scalable, long-horizon digital archiving mechanism. This will allow both commercial and public sector organisations to archive information effectively, and with low management costs, over multiple decades (the theoretical horizon of this information is in millennia, but the commercially important component is on the 20-100 year time scale). To achieve this we will need to commercialise this technology, either via a technology-orientated small company or via a forward-thinking large company; in either case this will attract R&D investment into the UK.

The Societal impact will be via a compelling piece of public engagement. We propose to archive a selection of culturally important digital works, and the technology truly has a multi-millennia horizon in terms of longevity. To choose the culturally important digital works we will engage with Arts and Humanities colleagues and the general public, potentially via high-profile TV, radio or festival components. This will contribute to the public understanding of science.

Publications

 
Title Exhibition Sample for Historisches Museum Frankfurt 
Description As a result of the considerable interest this project received, the Historisches Museum Frankfurt (Germany) approached us to inquire as to whether we could provide a sample which they could display as part of an upcoming exhibition entitled "Forgetting - Why we don't remember everything". We prepared and provided them with a copy of the DNA sequences which formed the original DNA storage library published in the 2013 Goldman et al., Nature publication which led to this award. The exhibition will run from 07/03/2019 until 14/07/2019. 
Type Of Art Artistic/Creative Exhibition 
Year Produced 2019 
Impact This exhibition has yet to open. Once opened to the public we expect it will further promote interest in science. 
URL https://historisches-museum-frankfurt.de/de/vergessen
 
Title Music of the Spheres 
Description Charlotte Jarvis is a UK-based artist interested in modern science, interaction with scientists, and engaging the public in controversial/radical ideas drawn from this science. We have an ongoing collaboration in Charlotte's "Music of the Spheres" project. During the course of this BBSRC award we completed the development stages of the performance and installation versions of the project. The opening (with live musical performance, oral presentation of the artistic and scientific inspirations, static installation of artefacts relating to the design of the project) took place in London in June 2015. 
Type Of Art Artistic/Creative Exhibition 
Year Produced 2015 
Impact Music of the Spheres has had a number of live performances and installations since June 2015. These are listed in more detail at the URL listed below and at http://www.artforeating.co.uk/restaurant/index.php?/project/mots-events-schedule/ Of particular note was the performance at the Festival of Genomics (http://www.festivalofgenomicslondon.com/) on 20/1/16. Update, March 2019: "Music of the Spheres" continues to be a successful project for Charlotte Jarvis. Since March 2018 it has been exhibited in China and Argentina, and later in 2019 it will be displayed in Hartlepool, UK. 
URL http://www.artforeating.co.uk/restaurant/index.php?/project/music-of-the-spheres/
 
Description Our 2018 publication in F1000Research documents what we learnt about the difficulties of cross-discipline communication between molecular biologists, information theoreticians, etc. We are now distributing this paper to colleagues, other researchers, outreach events, funders etc. in order to permit simpler communication between scientists.
Exploitation Route The F1000Research paper includes a public domain glossary of terminology that we have left open to comments and additions from the scientific community. We hope it will be widely accepted in the DNA-storage community and further developed and extended. It may in future form the basis for defining standards for scientific communication, in the same way the MIAME standards have in microarray analysis, etc. Once published, our findings regarding noise in the DNA synthesis, amplification and sequencing channel will provide other researchers with information on how best to adapt encoding methodology to appropriately optimise and protect data storage in DNA. Throughout the project we have performed numerous public engagement activities, which have inspired new audiences to become interested in science.
Sectors Chemicals,Digital/Communication/Information Technologies (including Software),Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections

 
Description There is continued interest in the commercial possibilities of NAM (nucleic acid memory) and DNA-storage. At present this is concentrated on the IP created before the start of this award, but as that develops (currently underway) the grant awardees' institutions (Wellcome Trust Sanger Institute, EMBL-European Bioinformatics Institute) will follow their established procedures for any commercialization of knowledge created under this award. UPDATE Feb 2018: We are still receiving enquiries from interested investors. We have not yet been able to establish investment, but continue to pursue leads when appropriate; for example, in Mar 2018 we will start discussions with a global (China-based) telecoms company. UPDATE March 2019: Our own attempts to found a company have not been successful, and we are not actively pursuing this at the moment. Numerous other companies are now pursuing DNA-storage targets, particularly around Cambridge, UK (e.g. Nuclera, Evonetix work on DNA synthesis), Boston, USA (e.g. CATALOG work on start-to-finish DNA-storage equipment; Conagen Inc.), and the west coast of the USA (e.g. Twist Bioscience now have a dedicated DNA-storage team; Ansa Technologies in Berkeley, CA, are working on DNA synthesis). UPDATE March 2020: Our original patent (pre-grant award) has been licensed, and we are now in discussions with the licensee, who may set up a small group near Hinxton or elsewhere to develop a team to exploit this IP. We hope to collaborate or consult with this team when it is established. This grant has been important in building the necessary skills and training people in these areas.
First Year Of Impact 2015
Sector Digital/Communication/Information Technologies (including Software),Other
Impact Types Economic

 
Description Organisation of major meeting on DNA Storage - Banbury Center, USA
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
URL https://www.cshl.edu/wp-content/uploads/2019/03/DNA19_Agenda_revised.pdf
 
Description Organisation of major meeting on DNA Storage - Banbury Center, USA
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
URL https://www.cshl.edu/wp-content/uploads/2018/10/DNA18_Program.pdf
 
Title Coding and information theory for the DNA storage channel 
Description Develop and refine mathematical tools to measure the limits of data storage on DNA based on the measured model parameters. Develop specific coding techniques that reap the full benefits of the storage medium, maximising the amount of data that can be stored while protecting against packet (oligo) loss and lower level synthesis, amplification and sequencing errors that cause recovered sequences to differ from those originally stored. Update February 2018: a two-stage coding strategy has been adopted. Reed-Solomon codes have been designed and adapted to act as the outer code in order to guarantee the required error rates by making up for erased (lost) DNA molecules. For the inner code, there is ongoing work to develop novel codes dealing with insertions and deletions in synthesis and sequencing and recover some data integrity on affected DNA molecules. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? No  
Impact Our communication on this method (mostly in the form of invited and contributed talks) has sparked a new interest in the information theory of DNA storage which has inspired several research groups to enter this new research area. 
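
To make the role of the outer code concrete, the sketch below shows how a Reed-Solomon code can rebuild a message when whole symbols (standing in for lost DNA molecules) are erased. It is a minimal sketch under stated assumptions, not the code designed in this project: it uses the evaluation form of Reed-Solomon over the prime field GF(257), and the names `rs_encode` and `rs_recover` are hypothetical. Any k surviving symbols out of n are enough to recover the k message symbols.

```python
P = 257  # prime field size, chosen for illustration so that byte values fit


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def rs_encode(message, n):
    """Systematic encoding: the first k symbols are the message, the rest are parity."""
    k = len(message)
    if not k <= n <= P:
        raise ValueError("need len(message) <= n <= P")
    if any(not 0 <= s < P for s in message):
        raise ValueError("symbols must lie in 0..P-1")
    points = list(enumerate(message))
    return list(message) + [_interpolate_at(points, x) for x in range(k, n)]


def rs_recover(received, k):
    """Recover the k message symbols; erased symbols are marked as None."""
    known = [(x, y) for x, y in enumerate(received) if y is not None]
    if len(known) < k:
        raise ValueError("too many erasures to recover the message")
    points = known[:k]
    return [_interpolate_at(points, x) for x in range(k)]


if __name__ == "__main__":
    message = [72, 101, 108, 108, 111]
    codeword = rs_encode(message, 8)
    damaged = list(codeword)
    damaged[0] = damaged[2] = damaged[6] = None  # three lost symbols
    print(rs_recover(damaged, len(message)) == message)
```

This handles erasures only. Insertions and deletions inside a molecule, which the description assigns to the inner code, are not addressed here.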
 
Title DoNAld 
Description Software for generating short DNA sequences, used for the controlled generation of test designs for measuring the probabilistic properties of the storage medium. 
Type Of Material Technology assay or reagent 
Provided To Others? No  
Impact This has been used to generate test sequences, which have been shared with synthesis companies. We are currently in the evaluation phase of the resulting measurements. 
 
Title Modelling noise in the DNA synthesis - amplification - sequencing channel 
Description To directly address the challenges of errors in DNA storage applications we are in the process of modelling noise in the DNA synthesis > amplification > sequencing channel through which the DNA physically passes prior to decoding. This is necessary to tailor encoding, such as error correction, appropriately based on the error profiles. To achieve this we have designed synthetic DNA sequences and standardised wet lab experiments which will help define what can/can't be synthesised, amplified and sequenced. This will be precisely defined using novel analysis/alignment tools which we are currently developing. Update February 2018: the results of this measurement are in the process of being analysed and are currently summarised in the form of an internal report that is being updated as further insights are gained. So far, we have observed that error rates vary by location on the synthesis chip and by location within a sequence, and we are in the process of quantifying their probability and bursty characteristics. These figures will play a major role in the design of coding methods and inform the overall data density and reliability achievable in DNA storage systems. 
Type Of Material Technology assay or reagent 
Year Produced 2017 
Provided To Others? No  
Impact Using our understanding of the channel noise we will then tailor our coding strategies to maximise information storage in DNA whilst minimising and protecting against predicted error rates. Of interest to the wider synthetic biology community are our future plans to use our expertise to profile a number of DNA polymerases and isothermal DNA amplification techniques. Update 2018: Analysis is nearing completion. The findings, which will be detailed in a publication prepared for submission in 2019, will be used to tailor the design of coding methods and inform the overall data density and reliability achievable in DNA storage systems. 
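
The comparison of sequenced reads against their designed sequences can be sketched as follows. This is a minimal illustration, not the analysis/alignment tooling developed in the project: it uses a plain edit-distance (Levenshtein) alignment with unit costs, and the function names are hypothetical. Each read is aligned to its design, every difference is classified as a substitution, insertion or deletion, and the events are tallied by position within the designed sequence.

```python
from collections import Counter


def align_errors(designed, read):
    """Return a list of (position_in_designed, kind) with kind in {'sub', 'ins', 'del'}."""
    n, m = len(designed), len(read)
    # dp[i][j] = edit distance between designed[:i] and read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if designed[i - 1] == read[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion from the design
                           dp[i][j - 1] + 1)         # insertion into the read
    # Trace back one optimal alignment, recording each error event.
    events = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and designed[i - 1] != read[j - 1]
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (1 if mismatch else 0):
            if mismatch:
                events.append((i - 1, "sub"))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            events.append((i - 1, "del"))
            i -= 1
        else:
            events.append((i, "ins"))  # extra base appears before design index i
            j -= 1
    events.reverse()
    return events


def positional_error_counts(designed, reads):
    """Tally error events over many reads of the same designed sequence."""
    counts = Counter()
    for read in reads:
        counts.update(align_errors(designed, read))
    return counts


if __name__ == "__main__":
    design = "ACGTACGT"
    reads = ["ACGAACGT", "ACGACGT", "ACGTACGT"]
    for key, count in sorted(positional_error_counts(design, reads).items()):
        print(key, count)
```

When several alignments are equally good, for example inside a repeated run of bases, the position reported for an insertion or deletion is one valid choice among several. Any real profile of positional or bursty errors would need a consistent convention for such ties.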
 
Title Molecular Random Access Memory (RAM) 
Description This aspect of the project addresses the need to implement innovative scalable molecular random-access memory (RAM). This will allow the extraction and sequencing of select DNA sequences which together decode to a file(s). Our approach to RAM will be to use a 'priming region' containing multiple overlapping PCR primer sequences, and restriction enzymes (RE) which will facilitate library preparation. Prior to the end of the award in August 2018 we designed and began to perform wet lab experiments which, as of March 2019, have demonstrated proof of principle. In 2019 this work will continue and a small scale realistic experiment will be performed prior to a large scale experiment. This will feature multiple encoded files and extraction and will be detailed in a publication for submission in 2019. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? No  
Impact No impact, still in progress. 
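
The idea of retrieving one file from a mixed pool by its primers can be sketched in software. This is a heavily simplified illustration, not the project's design: it assumes one exact-match forward and reverse primer pair per file, ignores the overlapping primers and restriction sites mentioned above, and all names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_oligo(fwd_primer, payload, rev_primer):
    """Flank a payload with a forward primer site and a reverse primer binding site."""
    return fwd_primer + payload + reverse_complement(rev_primer)


def select_by_primers(pool, fwd_primer, rev_primer):
    """Idealised 'PCR': keep payloads of oligos flanked by exactly this primer pair."""
    tail = reverse_complement(rev_primer)
    selected = []
    for oligo in pool:
        if oligo.startswith(fwd_primer) and oligo.endswith(tail):
            selected.append(oligo[len(fwd_primer):len(oligo) - len(tail)])
    return selected


if __name__ == "__main__":
    file_a = ("ACGTAC", "TTGGCA")  # hypothetical primer pair addressing file A
    file_b = ("GGATCC", "TTGGCA")  # hypothetical primer pair addressing file B
    pool = [
        make_oligo(file_a[0], "AAAA", file_a[1]),
        make_oligo(file_b[0], "CCCC", file_b[1]),
        make_oligo(file_a[0], "GGGG", file_a[1]),
    ]
    print(select_by_primers(pool, *file_a))
```

Real amplification tolerates some mismatches and can produce off-target products, so exact string matching is only a rough model of how primers address molecules in a physical pool.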
 
Title Profiling mesophilic and thermophilic DNA polymerases 
Description To highlight the utility of synthetic biology based approaches for defining amplification error/bias of interest to the wider community, we are applying our expertise to broadly characterise a number of mesophilic (isothermal) and thermophilic DNA polymerases. The DNA polymerases which will be characterised have been selected and optimised for use in our experimental set-up. We are awaiting further development of our analysis approach, as detailed in 'Modelling noise in DNA synthesis, amplification, sequencing channel' in the Research Tools and Methods section, before proceeding. This is to ensure that our current experimental set-up will allow detection of errors/bias as we predict. Update 2018: It is unlikely that we will be able to determine the error profiles of the selected DNA polymerases in sufficient detail. The error rates of many polymerases used for DNA sequencing preparation are extremely low, making their detection difficult. During our analysis of the experimental work designed to model channel noise in DNA synthesis, amplification and sequencing we struggled to detect and distinguish errors introduced by PCR; once this analysis is complete we will make a decision as to whether we are able to proceed with the work to profile DNA polymerases. Update, March 2019: The position is unchanged from the 2018 update. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? No  
Impact Still in progress. No impact as of yet. 
 
Description Collaboration with Agilent Technologies 
Organisation Agilent Technologies
Country United States 
Sector Private 
PI Contribution We are beginning to explore, both theoretically and experimentally, how synthesis may be modified specifically for DNA information storage applications. We are in talks with Agilent regarding experiments which will mimic large-scale low-fidelity DNA synthesis on their proprietary platform to see if we can characterise the resulting errors in detail and effectively deal with them.
Collaborator Contribution Agilent will synthesise the DNA which will be shipped to us for subsequent analysis. Agilent will help us to design DNA sequences which will further explore what can/can't be synthesised using their proprietary platform.
Impact Update March 2019: This collaboration was not followed up as, after consideration, it was decided that the findings would be similar to those being explored in our collaboration with Twist Bioscience.
Start Year 2017
 
Description Collaboration with CustomArray 
Organisation CustomArray
Country United States 
Sector Private 
PI Contribution Investigating how CustomArray's proprietary DNA synthesis platform could potentially be modified to reduce the cost of DNA synthesis, making DNA storage applications more feasible in the future.
Collaborator Contribution CustomArray suggested modifications which could be made to their DNA synthesis chemistry which could either reduce costs or increase yields of error-prone DNA (we are currently working on methods to deal with errors; see other sections). CustomArray will synthesise the DNA which will be shipped to us for subsequent analysis.
Impact Update March 2019: This collaboration was not followed up as CustomArray was acquired by another company.
Start Year 2016
 
Description Collaboration with Twist Bioscience 
Organisation Twist Bioscience
Country United States 
Sector Private 
PI Contribution We are investigating the error rates of Twist Bioscience's proprietary DNA synthesis methods, including analysing how these change with different process parameters. The target is to understand how to optimise the amount of information that can be reliably stored in DNA relative to the cost of synthesis. We are still devising the precise experiment at the time of writing (29/2/16). We will be performing DNA sequencing and data analysis, primarily the comparison of DNA fragments read with those designed. Update, March 2017: The experiment has been devised, and DNA fragments have been designed, synthesized (by Twist) and shipped. Initial lab work has started at the Sanger Institute in our partner lab, and initial sequencing results are under analysis at EBI. Update, February 2018: DNA supplied by Twist has now been sequenced under multiple conditions at Wellcome Sanger Institute. Resulting data is currently undergoing analysis as described under "Modelling noise in DNA synthesis, amplification, sequencing channel" in the Research Tools and Methods section. Update, March 2019: Lab work completed; several sequencing runs performed. Analysis of data revealed unexpected DNA synthesis artefacts. These are now well understood and we are proceeding with analyses and write-up. First presentation of this work at a conference (took place after end of grant) was well received and a paper will follow in Q2-Q3 2019.
Collaborator Contribution Twist Bioscience will be synthesising DNA fragments according to our designs and shipping them to us. They will also perform basic QC checks, and will in addition sequence the fragments themselves, for their own purposes and to give us a baseline comparison. Update, March 2017: Twist Bioscience synthesised our DNA designs (part of our method detailed in the research tools & methods outcome) at three modes of fidelity. The synthesised DNA was shipped to us for subsequent analysis. Update, March 2019: To assist with analysis, Twist Bioscience provided information regarding the conditions used to synthesise the DNA and the synthesis chip layout.
Impact The synthesised DNA was used to standardise wet lab experiments, and sequencing data is currently being used to model noise in the DNA synthesis, amplification and sequencing channel as detailed in the Research Tools & Methods outcome. A presentation on this work was made (by Nick Goldman) at the 2019 Banbury Center meeting on DNA Storage, and was well received. We were encouraged by multiple attendees to submit a paper on this.
Start Year 2013
 
Description BBC Science feature on 'How DNA can be used to store computer data' discussing information storage in future 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Nick Goldman was interviewed for a BBC Science feature entitled "How DNA can be used to store computer data", available online on March 23 2018. The program appears to have been well-received by those who heard it and discussed it with Nick Goldman. Nick Goldman was invited to appear on BBC Breakfast and BBC Look East (TV) following the interest this feature generated. The BBC Science piece was also featured on "Gogglebox" (TV) at the time (reaching an audience of c. 3.5 million).
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/news/av/science-environment-43395686/how-dna-can-be-used-to-store-computer-dat...
 
Description BBC radio appearance on Inside Science 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Interview used on BBC R4 programme "Inside Science", reporting the "Bitcoin challenge" of a Bitcoin encoded in DNA, which was won (less than 1 week before the competition ended) by a Belgian PhD student. I was the person who set the challenge, encoding the Bitcoin and distributing samples at presentations and other events.
Year(s) Of Engagement Activity 2018
URL http://www.bbc.co.uk/programmes/b09nrsfv
 
Description BBC radio appearance on programme 'The Far Future', discussing information storage into the far future 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact I was interviewed and the recording used as part of the program "The Far Future", broadcast on BBC R4 on 2 Jan 2018. The program appears to have been well-received by those who heard it and discussed it with me.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/b09k6jdj
 
Description Banbury Center meeting on DNA-storage 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I was co-organiser of a Banbury Center (science think tank based at Cold Spring Harbor Labs, USA) workshop on DNA-storage. We have brought together experts in relevant scientific fields, journal editors, funders etc., and will use this to discuss future directions in DNA-storage.
Year(s) Of Engagement Activity 2018
URL https://www.cshl.edu/education/banbury/
 
Description Masterclass at Robinson College, Cambridge, for Y10-13 school children 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact A presentation of DNA storage to secondary school pupils who are considering studying engineering, where the mechanism of DNA storage was illustrated in an activity involving Lego.
Year(s) Of Engagement Activity 2017
 
Description Naked Scientist Journalist Interview 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Podcast and article entitled "Using DNA to store big data" for the general public interested in science. The podcast and article were part of a "Big Data, Big Deal?" feature: http://www.thenakedscientists.com/HTML/podcasts/naked-scientists/show/20151117/
Year(s) Of Engagement Activity 2015
URL http://www.thenakedscientists.com/HTML/interviews/interview/1001543/
 
Description Pint of Science - presentation and activity 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Pint of Science is a series of presentations of science for the general public that takes place in pubs across the country. Goldman and Sayir presented the principles of DNA storage in a panel discussion format, followed by a guided activity where participants, in small groups, were led through the steps of encoding information in DNA and retrieving it, using pretend DNA made out of Lego cubes.
Year(s) Of Engagement Activity 2017
URL https://pintofscience.co.uk/event/technology-and-the-world-around-us
 
Description Short CNN television feature 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact CNN filmed a short television feature on the potential of using DNA as a storage medium. Team members were interviewed and filmed. Entitled 'Could your future hard-drive be made of DNA', the feature aired on CNN television within the programme 'Make Create Innovate'. The feature is also available to watch online.

This stimulated further interest in the project from the general public.
Year(s) Of Engagement Activity 2016
URL http://edition.cnn.com/videos/tech/2016/09/26/make-create-innovate-data-storage-in-dna.cnn
 
Description TV piece on Faszination Wissen on German TV 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact TV piece entitled 'Data storage of the future' on Faszination Wissen which aired on German TV.
Year(s) Of Engagement Activity 2017
URL http://br.de/s/2fgo7XR
 
Description presentation at BCX Disrupt conference, Johannesburg 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presentation of DNA-storage project to business management professionals, primarily based around Johannesburg, RSA. Q&A session afterwards, as well as numerous informal discussions in the breaks between talk sessions. Favourable feedback regarding social media impact also. Topic and presentation were covered in a 10-minute piece on a national TV current affairs/investigative journalism programme. Other speakers at the meeting included will.i.am and Malcolm Gladwell.
Year(s) Of Engagement Activity 2017
URL https://bcxdisrupt.com/