Microsoft, Illumina, Twist ally to make big data small by weaving it into DNA archives

Genetic code has served as the basis of the human story for millennia. Now, a group of tech and medtech companies aims to spin digital data into DNA itself to archive information in the arrangement of its molecules.

Microsoft will work with several leaders in the field—including sequencing giant Illumina, synthetic gene weaver Twist Bioscience and computer memory mainstay Western Digital, among others—to establish a road map for the industry toward the wider use of DNA data storage.

The alliance, announced at the virtual Flash Memory Summit, aims to set technology and formatting standards with the goal of building interoperable commercial systems capable of housing the exponentially growing amounts of data expected to be generated in the future.

“DNA is an incredible molecule that, by its very nature, provides ultra-high-density storage for thousands of years,” Twist co-founder and CEO Emily Leproust said. “By joining with other technology leaders to develop a common framework for commercial implementation, we drive a shared vision to build this new market solution for digital storage.”

To store data in DNA, a file is converted from its binary sequence of ones and zeros into a sequence of the four nucleotide bases that make up our genome, labeled A, C, T and G. The resulting sequence is then broken up into short segments of 200 to 300 bases, each of which is tagged with an index, synthesized and stored.
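The conversion described above can be sketched in a few lines of Python. This is a minimal illustration and not any alliance member's actual scheme: the two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T), the 8-base index and the 192-base payload are all assumptions chosen to land on 200-base segments, and every function name is hypothetical. Production systems typically also add error correction and avoid hard-to-synthesize sequences, which this sketch ignores.

```python
BASES = "ACGT"        # assumed mapping: 00->A, 01->C, 10->G, 11->T
INDEX_BASES = 8       # assumed index width: 4**8 = 65,536 addressable segments
PAYLOAD_BASES = 192   # assumed payload width, giving 200 bases per full segment


def bytes_to_bases(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def int_to_bases(n: int, width: int) -> str:
    """Write a non-negative integer as a fixed-width base-4 string of bases."""
    digits = []
    for _ in range(width):
        digits.append(BASES[n % 4])
        n //= 4
    return "".join(reversed(digits))


def bases_to_int(seq: str) -> int:
    """Read a base-4 string of bases back into an integer."""
    n = 0
    for base in seq:
        n = n * 4 + BASES.index(base)
    return n


def encode(data: bytes) -> list:
    """Split the base sequence into short segments, each prefixed by its index."""
    seq = bytes_to_bases(data)
    starts = range(0, len(seq), PAYLOAD_BASES)
    if len(starts) > 4 ** INDEX_BASES:
        raise ValueError("too much data for the assumed index width")
    return [
        int_to_bases(idx, INDEX_BASES) + seq[start:start + PAYLOAD_BASES]
        for idx, start in enumerate(starts)
    ]


def decode(segments: list) -> bytes:
    """Reorder segments by their index prefix and rebuild the original bytes."""
    ordered = sorted(segments, key=lambda s: bases_to_int(s[:INDEX_BASES]))
    return bases_to_bytes("".join(s[INDEX_BASES:] for s in ordered))
```

Under these assumptions, `bytes_to_bases(b"Hi")` returns `"CAGACGGC"`, and `decode` recovers the original bytes even when the segments come back in a different order, which is the job the index tag does in a pool of unordered molecules.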

The companies estimate that 10 full-length movies could be written into DNA molecules and packed into a volume the size of a grain of salt. Stored under the proper conditions in capsules or glass beads, they could last far longer than anyone who may want to watch them.

In addition, by mirroring biological processes, DNA data can be cheaply and quickly duplicated and read.

“At Microsoft Research, we proactively address the future challenges of technology, with sustainability in mind,” said Karin Strauss, a senior principal research manager at Microsoft. 

“In collaboration with University of Washington, we have demonstrated a fully automated end-to-end system capable of storing and retrieving data from DNA, and we have separately stored 1GB of data in DNA synthesized by Twist and recovered data from it,” Strauss said. “We’re encouraged by the potential for more sustainable data storage with DNA and look forward to collaborating with others in the industry to explore early commercialization of this technology.”

Twist Bioscience, Illumina, Western Digital and Microsoft will serve as founding members of the DNA Data Storage Alliance. They are joined by Ansa Biotechnologies, Catalog, DNA Script, Imec, Iridia, Molecular Assemblies and the Molecular Information Systems Lab at the University of Washington, as well as the Claude Nobs Foundation, which works to preserve audiovisual recordings through the Montreux Jazz Digital Project alongside the Swiss Federal Institute of Technology’s locations in Lausanne and Zurich.

“There is an unmet need for a new long-term archival storage medium that keeps up with the rate of digital data growth,” said Steffen Hellmold, Western Digital’s vice president of corporate strategic initiatives. 

“We estimate that almost half of the data storage solutions shipped in 2030 will be used to archive data as the overall temperature of data is cooling down,” said Hellmold, referring to how often the data is accessed for use. “We are committed to providing a full portfolio of storage solutions addressing the demand for hot, warm and cold storage.”