Amazon ($AMZN) has made The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) PanCancer data sets freely available through its cloud platform. The addition of the data sets continues Amazon's push to differentiate its cloud offering through the availability of data.
TCGA and ICGC join the 1000 Genomes Project on the list of genomics initiatives in the Amazon Web Services (AWS) Public Data Sets program. The new additions give AWS users access to genomic, transcriptomic and epigenomic data from thousands of people, potentially opening up opportunities for researchers with no links to TCGA or ICGC to turn their resources into advances in our knowledge of cancer. And by having AWS take care of the back-end tasks, the team behind the project thinks it can free researchers from some of the drudgery involved in analyzing genome data.
"Compiling the data, organizing the data, analyzing the data, making the data available to all researchers--these are fundamental to making further progress in cancer genome research," Peter Campbell, head of cancer genetics and genomics at the Wellcome Trust Sanger Institute, said in a statement. Campbell, who is helping to lead the ICGC-TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project, sees value in shifting these tasks to the cloud. "We are excited at the possibilities of working with innovative cloud-based computing systems to achieve these advances," he said.
For Amazon, the addition of the data sets marks another advance in its attempt to become the go-to cloud platform in genomics. With the falling cost of data storage leaving Amazon little room to undercut rivals such as Google ($GOOG), the availability of landmark data sets is one way in which it can differentiate its platform, or at least stop its competitors from gaining an edge. Amazon and Google both host multiple public data sets, some of which are available through both platforms.