Amazon hungers for genomics data after NIH pact

Amazon's ($AMZN) late March deal with Uncle Sam to host a huge dataset of genomes has made waves in the technology world, as it marks yet another advance in the use of cloud computing to corral massive amounts of scientific data. In a Wired article, players in the scientific computing game point out the multiple ways the genomics revolution and the emergence of cloud tech complement each other.

Amazon has agreed to use its S3 storage service to host sequencing data on more than 1,700 genomes from the global 1000 Genomes Project, a dataset that gobbles up some 200 terabytes of storage, Wired reports. Amazon plans to offer the data to genomics researchers with S3 accounts at no additional charge. However, it could make money from R&D groups that opt to run research applications on Amazon's public cloud service and bank their own data with the company.

As noted previously, some labs lack the infrastructure to house and manage major genomic datasets, and the analysis required to understand the huge stockpiles of DNA data can overwhelm their own computers. Cloud providers such as Amazon and Google ($GOOG) have jumped at opportunities to host such datasets as they pursue new business from life sciences customers. The companies have had some success: Eli Lilly ($LLY) has tapped Amazon's cloud, while Roche ($RHHBY) plans to adopt Google's cloud-based apps company-wide.

For the NIH, partnering with Amazon supports the agency's efforts to make the data publicly available. Scientists can pick up the information and run experiments with it. The hope is that streamlining access will aid the translation of the data into new therapies and diagnostics.

