The falling cost of sequencing has created a new problem for research institutions: where to put all the data. At the Wellcome Trust Sanger Institute, 30 DNA sequencers generate a terabyte of data a day, and the institute's 22 petabytes of storage are filling up. The time has come to rethink how it stores data.
Sanger Institute has enlisted DataDirect Networks (DDN) to help with its storage needs. DDN will provide the U.K.-based genomic research nonprofit with 10 petabytes of storage built on its SFA 12K storage engine. Sanger Institute already uses the previous-generation SFA 10K engine for some of its storage, but with data accruing at a rapid rate, it is upgrading both the capacity and the capability of its system. The institute expects its current 22 petabytes of storage, which is more than the capacity of all the hard drives manufactured in 1995, to double over the next three to five years.
The ever-expanding pool of data places demands on how storage is organized, not just on raw capacity. Sanger Institute engineers had implemented their own Lustre file system, but they have now adopted DDN's EXAScaler parallel file system appliance for half their data to cut the time they spend writing software. "High-performance storage these days is not just hardware. There's a lot of software involved, and the entire stack needs to be understood, preferably by a single organization so that you get a unified support model," Sanger Institute's acting head of scientific computing Tim Cutts told Bio-IT World.
Following adoption of the new system, Sanger Institute believes it has the scalable infrastructure to manage future leaps in its storage requirements. Having opened in 1993 to work on the Human Genome Project, the institute has had to navigate huge changes in sequencing and storage. It now runs the Cancer Genome Project, several malaria programs and a host of other initiatives, and performs more sequencing in one hour than it did in its first 10 years.