Amazon, Baylor and DNAnexus team up for huge sequencing project

The scale of genomic sequencing projects continues to grow rapidly. Just as the 1000 Genomes Project once dwarfed the Human Genome Project, an initiative involving Amazon ($AMZN), Baylor College of Medicine and DNAnexus has now surpassed both, generating 430 terabytes of data in the process.

Amazon, Baylor and DNAnexus came together to contribute to the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, an initiative to understand how human genetics relates to heart disease and aging. More than 300 researchers at five institutions are working on CHARGE. In total, the collaborators have sequenced 3,751 whole genomes and 10,771 exomes. Doing so required 2.4 million core-hours of computational time and 860 terabytes of storage. The researchers can now perform downstream analyses on the 430 terabytes of results.

Probing the massive pile of data could give new insights into the genetic causes of aging--an area Google ($GOOG) is reportedly investigating through its Calico spinoff--but the scale of the project creates its own challenges. Working in the cloud helped resolve some of them. "Having access to this much data was very unique. Many institutions do not have the local compute resources and infrastructure to support large scale analysis projects like this one, so we were lucky to come together with DNAnexus and Amazon Web Services to make this project possible," Jeffrey Reid, who led Baylor's participation in the project, said.

Baylor used DNAnexus' cloud platform to power its semi-automated, modular tools for analyzing next-generation sequencing data. The project--which the collaborators claim is the largest-ever cloud-based genomic analysis initiative--was hosted by Amazon Web Services. Hosting the project in the cloud and giving researchers access to the algorithms and industry-recognized tools on DNAnexus' platform lessened the infrastructure requirements at individual institutions.

- read Baylor's release
- check out DNAnexus' release
- here's GEN's take