New twist on DNA data storage lets users preview stored files
Researchers from North Carolina State University have turned a longstanding challenge in DNA data storage into a tool, using it to offer users previews of stored data files, such as thumbnail versions of image files.
DNA data storage is an attractive technology because it can store a tremendous amount of data in a small package, preserve that data for a long time, and do so in an energy-efficient way. However, until now, it wasn’t possible to preview the data in a file stored as DNA. If you wanted to know what a file was, you had to “open” the entire file.
“The advantage to our technique is that it is more efficient in terms of time and money,” says Kyle Tomek, lead author of a paper on the work and a Ph.D. student at NC State. “If you are not sure which file has the data you want, you don’t have to sequence all of the DNA in all of the potential files. Instead, you can sequence much smaller portions of the DNA files to serve as previews.”
Here’s a quick overview of how this works.
Users “name” their data files by attaching sequences of DNA called primer-binding sequences to the ends of the DNA strands that store the information. To identify and extract a given file, most systems use polymerase chain reaction (PCR). Specifically, they use a small DNA primer that matches the corresponding primer-binding sequence to pick out the DNA strands containing the file the user wants. PCR then makes many copies of those strands, and the entire sample is sequenced. Because the targeted strands have been copied so many times, their signal is stronger than that of the rest of the sample, making it possible to identify the targeted DNA sequence and read the file.
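The retrieval step described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the researchers' method: the 8-base "names", the payload labels, and the functions `amplify` and `read_file` are all hypothetical, each strand is assumed to double once per PCR cycle on an exact primer match, and unmatched strands are assumed to stay at a single copy.

```python
from collections import Counter

# Hypothetical pool: (primer-binding "file name", payload label).
# Real primer-binding sequences are longer; 8 bases keeps the toy readable.
POOL = [
    ("ACGTACGT", "payload-A1"),
    ("ACGTACGT", "payload-A2"),
    ("TTGGCCAA", "payload-B1"),
    ("GATCGATC", "payload-C1"),
]

def amplify(pool, primer, cycles=10):
    """Return copy counts per strand after idealized PCR."""
    counts = Counter()
    for name, payload in pool:
        # Assumption: a strand doubles every cycle only on an exact match.
        counts[(name, payload)] = 2 ** cycles if name == primer else 1
    return counts

def read_file(pool, primer, cycles=10):
    """'Sequence' the whole sample and keep strands above background."""
    counts = amplify(pool, primer, cycles)
    # Background strands sit at one copy; amplified strands stand out.
    return sorted(payload for (_, payload), n in counts.items() if n > 1)

print(read_file(POOL, "ACGTACGT"))  # ['payload-A1', 'payload-A2']
```

After ten idealized cycles each targeted strand is present at 1,024 copies against a background of one, which is the "stronger signal" that lets the file be read out of the full sample.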
However, one challenge that DNA data storage researchers have grappled with is that if two or more files have similar file names, the PCR will inadvertently copy pieces of multiple data files. As a result, users have to give files very distinct names to avoid getting messy data.
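To see why similar file names cause trouble, consider a rough sketch in which a primer is assumed to bind any name that differs from it by at most a fixed number of bases. The mismatch threshold, the example names, and the helper functions `hamming` and `files_amplified` are illustrative assumptions; real primer binding depends on temperature, sequence chemistry, and other conditions that this toy rule ignores.

```python
def hamming(a, b):
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def files_amplified(primer, file_names, max_mismatches=1):
    """Return the file names the primer would bind under the toy rule."""
    return [name for name in file_names
            if hamming(primer, name) <= max_mismatches]

# Hypothetical names: the first two differ by a single base.
names = ["ACGTACGT", "ACGTACGA", "TTGGCCAA"]

print(files_amplified("ACGTACGT", names))
# ['ACGTACGT', 'ACGTACGA']
```

Under this rule the primer for the first file also picks up the second, so the sequenced sample would mix strands from both files. Choosing names that differ at many positions keeps every unintended name outside the mismatch tolerance.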