Thursday, January 24, 2013
New technique stores terabytes of data on DNA with 100% accuracy
As technology advances and we incorporate digital activities into our daily routine more frequently, we require an ever-increasing amount of storage space to host the data we collect. Storage is evolving at a reasonable pace, but conventional technologies have their limits and researchers are constantly looking for more powerful alternatives. One such storage medium involves holding enormous amounts of data on DNA. Now researchers have pioneered a new technique to store data on, and access data from, DNA molecules.
Last year, Harvard scientists managed to stuff 5.5 petabits (around 700 terabytes) of data onto a single gram of DNA. As we previously explained, the method used to store the data on the gram of DNA is similar to how it is stored on a standard storage device. Strands of DNA that each held 96 bits of binary data were synthesized, and the data could then be read back using a standard DNA sequencing process.
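As a quick sanity check on those figures, the conversion from petabits to terabytes can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, terabytes are taken as decimal (10^12 bytes), and the strand count ignores any addressing information or duplicate copies a real experiment would need.

```python
BITS_PER_GRAM = 5_500_000_000_000_000  # 5.5 petabits, the figure quoted above
BITS_PER_STRAND = 96                    # data bits per strand, as quoted above


def bits_to_terabytes(bits):
    """Convert a bit count to decimal terabytes (1 TB = 10**12 bytes)."""
    return bits / 8 / 10**12


def strands_needed(bits, bits_per_strand=BITS_PER_STRAND):
    """Strands required to hold `bits` of data, rounded up (no overhead counted)."""
    return -(-bits // bits_per_strand)


print(bits_to_terabytes(BITS_PER_GRAM), "TB")
print(strands_needed(BITS_PER_GRAM), "strands of 96 bits")
```

The conversion comes out at 687.5 terabytes, which is where the "around 700 terabytes" figure comes from.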
However, a couple of hurdles stand in the way of writing data to and reading it from DNA. First, writing and reading errors are common, and are often caused by runs of the same letter repeated along a strand of DNA. The other prominent issue is that, currently, scientists can only create short strands of DNA, limiting the overall space with which to work.
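To make the "repeating letters" problem concrete, here is a small Python sketch that measures the longest run of identical letters in a strand. The function names and the default limit of one are assumptions chosen for illustration, not something taken from the research.

```python
from itertools import groupby


def longest_run(sequence):
    """Length of the longest stretch of one repeated letter (0 for an empty strand)."""
    return max((len(list(group)) for _, group in groupby(sequence)), default=0)


def has_repeats(sequence, limit=1):
    """True if any letter appears more than `limit` times in a row."""
    return longest_run(sequence) > limit


for strand in ("ACGTACGT", "ACGGGTCA"):
    print(strand, longest_run(strand), has_repeats(strand))
```

A strand such as ACGGGTCA would be flagged because of its three consecutive Gs, while ACGTACGT would pass.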
The new method for storing data on and reading it from DNA, created by the Bioinformatics Institute (BI), breaks the data up into many small fragments, each a string of 117 letters, that overlap one another and alternate in direction, so that each part of the data appears in more than one fragment. Along with that arrangement, each fragment carries indexing information that dictates where it fits into the overall data. The technique also required a new coding method that reduces the possibility of repeating letters.
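The ideas in the paragraph above can be sketched in Python. This is a simplified illustration, not the researchers' actual scheme: it assumes each byte is written as six base-3 digits, that every digit picks one of the three letters that differ from the previous letter (so no letter ever repeats), and that fragments are 100-letter windows taken every 25 letters, with every other fragment reversed and an index attached. The starting letter, window size, step, plain reversal, and all function names are assumptions made for the example.

```python
BASES = "ACGT"


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def trits_to_dna(trits, previous="A"):
    """Each trit selects one of the three letters that differ from the last one."""
    letters = []
    for trit in trits:
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        letters.append(previous)
    return "".join(letters)


def dna_to_trits(dna, previous="A"):
    """Inverse of trits_to_dna, given the same assumed starting letter."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits


def fragment(dna, length=100, step=25):
    """Cut overlapping windows; reverse every other one and tag each with an index."""
    pieces = []
    for index, start in enumerate(range(0, len(dna), step)):
        piece = dna[start:start + length]  # trailing windows may be shorter
        pieces.append((index, piece[::-1] if index % 2 else piece))
    return pieces


def reassemble(pieces, step=25):
    """Use each index to put its fragment back in place, undoing the reversal."""
    letters = {}
    for index, piece in pieces:
        if index % 2:
            piece = piece[::-1]
        for offset, base in enumerate(piece):
            letters.setdefault(index * step + offset, base)
    return "".join(letters[i] for i in range(len(letters)))


message = b"Shall I compare thee to a summer's day?"
dna = trits_to_dna(bytes_to_trits(message))
pieces = fragment(dna)
restored = trits_to_bytes(dna_to_trits(reassemble(pieces)))
print(dna[:40], len(pieces), restored == message)
```

Because fragments can arrive in any order, the index is what allows reassembly, and because the windows overlap, most letters are present in several fragments. A real decoder would compare those overlapping copies to catch errors; this sketch simply keeps the first copy it sees.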
To test the new technique, California-based Agilent Technologies offered to write the data onto strings of DNA. BI encoded several files using the error-reducing method described above and sent them to the Agilent team: a .txt file of all of Shakespeare’s sonnets, a 26-second clip of Martin Luther King Jr.’s “I Have a Dream” speech, a .jpeg of the Bioinformatics Institute, a .pdf of Watson and Crick’s paper that detailed DNA structure, and a file that explains the actual encoding process being used.
Agilent downloaded those files from the internet and put the information on hundreds of thousands of strings of DNA, which resulted in something the size of a rather small piece of dust. Agilent then sent the encoded dust-like strings back to BI, where researchers managed to sequence and reconstruct the files without error.
BI researcher Nick Goldman notes that the coding technique results in a storage medium that can last for ten thousand years or more, and can be read by anyone so long as they have access to a machine that can read DNA and what is essentially the cipher needed to reverse the coding method.
Obviously, DNA USB drives aren’t right around the corner, as various practical issues have to be overcome first, such as, you know, not having two different research labs and all of the appropriate equipment involved in the encoding and reconstruction process. However, considering DNA will most likely never become outdated, and it has already been shown to store massive amounts of data, we can only hope significant advances in the field will be made quickly enough for us to see a DNA drive in our lifetime.
Research paper: Towards practical, high-capacity, low-maintenance information storage in synthesized DNA