DNA digital data storage

DNA digital data storage refers to any scheme for storing digital data in the base sequence of DNA. The technology uses artificial DNA made with commercially available oligonucleotide synthesis machines for storage and DNA sequencing machines for retrieval. Because of the data density of DNA, this type of storage system is more compact than current magnetic tape or hard drive storage systems. It also offers longevity, provided the DNA is held in cold, dry and dark conditions, as shown by the study of woolly mammoth DNA from up to 60,000 years ago, and resistance to obsolescence, as DNA is a universal and fundamental data storage mechanism in biology. These features have led researchers involved in its development to call this method of data storage "apocalypse-proof" because "after a hypothetical global disaster, future generations might eventually find the stores and be able to read them."[1] Retrieval is slow, however, because the DNA must be sequenced to read the data, so the method is intended for uses with a low access rate, such as long-term archival of large amounts of scientific data.[1][2]

History

The idea of recording, storing and retrieving information on DNA molecules, together with general considerations about its feasibility, was originally put forward by Mikhail Neiman and published in 1964–65 in the journal Radiotekhnika (USSR). The technology may therefore be referred to as MNeimONics, and the storage device as MNeimON (Mikhail Neiman OligoNucleotides).[3]

On August 16, 2012, the journal Science published research by George Church and colleagues at Harvard University in which DNA was encoded with digital information that included an HTML draft of a 53,400-word book written by the lead researcher, eleven JPG images and one JavaScript program. Multiple copies were added for redundancy, and 5.5 petabits can be stored in each cubic millimeter of DNA.[4] The researchers used a simple code in which each bit was mapped to a single base. Its shortcoming was that it produced long runs of the same base, which are error-prone to sequence. The result showed that, besides its other functions, DNA can also serve as a storage medium, like hard drives and magnetic tape.[1]
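The weakness of a one-bit-per-base code can be illustrated with a minimal Python sketch. The fixed mapping below (0 to A, 1 to G) and all function names are assumptions made for illustration only; they are not taken from the published scheme.

```python
from itertools import groupby

# Hypothetical one-bit-per-base mapping, chosen only for illustration.
BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn each bit of the input into exactly one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def decode(strand: str) -> bytes:
    """Invert encode(): one base back to one bit, eight bits to a byte."""
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_run(strand: str) -> int:
    """Length of the longest stretch of one repeated base."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


if __name__ == "__main__":
    payload = b"\x00\xffDNA"
    strand = encode(payload)
    print(strand)
    # A zero byte alone already yields eight identical bases in a row.
    print("longest run of one base:", longest_run(strand))
    print("round trip ok:", decode(strand) == payload)
```

Any input containing runs of identical bits, which is common in real files, translates directly into runs of identical bases under such a mapping. This is the kind of sequence that the text describes as error-prone to sequence.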

An improved system was reported in the journal Nature in January 2013, in an article led by researchers from the European Bioinformatics Institute (EBI) and submitted at around the same time as the paper of Church and colleagues. More than five million bits of data, consisting of text files and audio files and appearing to the researchers as a speck of dust, were successfully stored and then retrieved and reproduced. The encoded information consisted of all 154 of Shakespeare's sonnets, a twenty-six-second audio clip of the "I Have a Dream" speech by Martin Luther King, the well-known paper on the structure of DNA by James Watson and Francis Crick, a photograph of EBI headquarters in Hinxton, United Kingdom, and a file describing the methods used to convert the data. All the DNA files reproduced the information with between 99.99% and 100% accuracy.[2] The main innovations in this research were the use of an error-correcting encoding scheme to ensure the extremely low data-loss rate, and the idea of encoding the data in a series of overlapping short oligonucleotides identifiable through a sequence-based indexing scheme.[1] The sequences of the individual strands of DNA overlapped in such a way that each region of data was repeated four times to avoid errors. Two of these four strands were constructed backwards, also with the goal of eliminating errors.[2] The costs per megabyte were estimated at $12,400 to encode data and $220 to retrieve it. However, it was noted that the exponential decrease in DNA synthesis and sequencing costs, if it continues, should make the technology cost-effective for long-term data storage within about ten years.[1]
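The fourfold overlap described above can be sketched as follows. This is an illustration under stated assumptions, not the published encoder: the fragment length of 100 bases, the step of 25 bases and the function names are hypothetical, and "constructed backwards" is modeled here as taking the reverse complement of every second fragment. Indexing information and the error-correcting code are not modeled.

```python
# Translation table for complementing bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlapping_fragments(seq: str, length: int = 100, step: int = 25):
    """Cut seq into fragments of `length` bases that start every `step` bases.

    With length == 4 * step, each interior base falls into four fragments.
    Every second fragment is reverse-complemented (an assumption of this
    sketch), so two of the four copies of a region run "backwards".
    """
    fragments = []
    for index, start in enumerate(range(0, len(seq) - length + 1, step)):
        fragment = seq[start:start + length]
        if index % 2 == 1:
            fragment = reverse_complement(fragment)
        fragments.append((index, fragment))
    return fragments


def coverage(seq_len: int, length: int = 100, step: int = 25):
    """Count how many fragments cover each position of the sequence."""
    counts = [0] * seq_len
    for start in range(0, seq_len - length + 1, step):
        for pos in range(start, start + length):
            counts[pos] += 1
    return counts


if __name__ == "__main__":
    data = "ACGT" * 100  # placeholder 400-base sequence
    fragments = overlapping_fragments(data)
    counts = coverage(len(data))
    print("fragments:", len(fragments))
    print("coverage of an interior base:", counts[200])
    print("coverage of the first base:", counts[0])
```

In this sketch only interior positions reach fourfold coverage; the first and last bases of the sequence are covered fewer times, so a practical scheme would need to pad or otherwise handle the ends.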

The long-term stability of data encoded in DNA was reported in February 2015, in an article by researchers from ETH Zurich. By adding redundancy via Reed–Solomon error correction coding and by encapsulating the DNA within silica glass spheres via sol-gel chemistry, the researchers predict error-free information recovery after up to 1 million years at −18 °C and 2000 years if stored at 10 °C.[5][6] Because the scheme can tolerate errors, the research team could reduce the cost of DNA synthesis to about $500/MB by choosing a more error-prone DNA synthesis method. In a news article in New Scientist, the team stated that if they are able to decrease the cost further, they would store an archive version of Wikipedia in DNA.

A group of researchers led by Boise State University is also working toward a better way to store digital information using nucleic acid memory (NAM). They note that the global flash memory market is predicted to reach $30.2 billion this year, potentially growing to $80.3 billion by 2025. They estimate that by 2040 the global demand for memory will exceed the projected supply of silicon (the raw material used to make flash memory), and that nucleic acid memory has a retention time far exceeding that of electronic memory. They discussed the longevity of DNA materials using first-principles theoretical calculations, published as a commentary article.[7] According to their claims, "With information retention times that range from thousands to millions of years, volumetric density 10³ times greater than flash memory and energy of operation 10⁸ times less, we believe that DNA used as a memory-storage material in nucleic acid memory (NAM) products promises a viable and compelling alternative to electronic memory." and "Given exponentially increasing demands for safeguarded information worldwide, and the long retention times for DNA (ranging from thousands to millions of years), NAM can store the world's information for future generations using far less space and energy. NAM could thus be used as a time capsule for massive, infrequently accessed records in scientific, financial, governmental, historical, genealogical, personal and genetic domains."[7]

The above methods of DNA storage had the disadvantage that the whole strand of synthetic DNA has to be sequenced in order to retrieve only one of several data sets that were previously encoded. In April 2016, researchers at the University of Washington published an encoding, storage, retrieval and decoding method that enables random access to any one of the data sets.[8]
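One way to picture random access is sketched below. It assumes that every strand of a data set carries a short address tag unique to that data set, so that only strands with the requested tag need to be read. The tags, payloads and function names are hypothetical, and the text above does not describe the actual mechanism used by the researchers.

```python
import random


def build_pool(datasets: dict) -> list:
    """Mix the tagged strands of all data sets into one unordered pool.

    `datasets` maps a hypothetical address tag to a list of payload
    sequences. Tags are assumed to be distinct and of equal length.
    """
    pool = [tag + payload
            for tag, payloads in datasets.items()
            for payload in payloads]
    random.shuffle(pool)  # a physical pool of molecules has no order
    return pool


def random_access(pool: list, tag: str) -> list:
    """Return only the payloads whose strand begins with `tag`.

    Filtering by prefix stands in for a laboratory step that selects the
    strands of one data set so the rest of the pool need not be sequenced.
    """
    return sorted(strand[len(tag):] for strand in pool if strand.startswith(tag))


if __name__ == "__main__":
    datasets = {
        "ACGTAC": ["GGATTC", "TTAGCA"],  # hypothetical data set 1
        "TGCATG": ["CCATGA"],            # hypothetical data set 2
    }
    pool = build_pool(datasets)
    print(random_access(pool, "TGCATG"))
```

The design point is that the cost of a read then scales with the size of the requested data set rather than with the size of the whole pool.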

References

  1. Yong, E. (2013). "Synthetic double-helix faithfully stores Shakespeare's sonnets". Nature. doi:10.1038/nature.2013.12279.
  2. Goldman, N.; Bertone, P.; Chen, S.; Dessimoz, C.; Leproust, E. M.; Sipos, B.; Birney, E. (2013). "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA". Nature. 494 (7435): 77–80. doi:10.1038/nature11875. PMC 3672958. PMID 23354052.
  3. https://sites.google.com/site/msneiman1905/eng
  4. Church, G. M.; Gao, Y.; Kosuri, S. (2012). "Next-Generation Digital Information Storage in DNA". Science. 337 (6102): 1628. doi:10.1126/science.1226355. PMID 22903519.
  5. Grass, R. N.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. J. (2015). "Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes". Angewandte Chemie International Edition. 54 (8): 2552. doi:10.1002/anie.201411378.
  6. Jacobs, Angelika (February 13, 2015). "Data-storage for eternity". Eidgenössische Technische Hochschule (ETH) Zürich. Archived from the original on March 15, 2015. Retrieved March 15, 2015.
  7. Zhirnov, V.; Zadegan, R. M.; Sandhu, G. S.; Church, G. M.; Hughes, W. L. (2016). "Nucleic acid memory". Nature Materials. 15 (4): 366–370. doi:10.1038/nmat4594.
  8. "A DNA-Based Archival Storage System". http://doi.acm.org/10.1145/2872362.2872397

This article is issued from Wikipedia - version of the 12/4/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.