DNAnexus Enables Seamless Assembly of PacBio Data for J. Craig Venter Genome

Cloud Platform Facilitates and Speeds Reference-Quality De Novo Whole Genome Sequencing


DNAnexus Inc., a pioneer in cloud-based genome informatics and data management, today announced that it now offers whole genome assembly using long-read sequencing data generated by the PacBio RS II. To demonstrate the new capability, DNAnexus and PacBio researchers used Daligner, written by Gene Myers, and PacBio's FALCON genome assembler to perform de novo assembly of the complete genome of J. Craig Venter, a pioneer in genome sequencing, on the DNAnexus cloud platform.

The first human reference genome was published in 2001 after a multiyear effort with a price tag in the billions. Venter's genome, published in 2007, was the first reference genome of a single individual. DNAnexus and PacBio were able to create a higher-resolution version of his genome reference, capturing important structural variant information, at a fraction of the original time and cost. More details on this announcement can be found here.

De novo genome assembly is a data- and compute-intensive task. Researchers performing de novo assembly often find that their local infrastructure lacks the resources or expertise needed for complex assemblies. The DNAnexus platform gives scientists access to massive computational resources on a cost-effective, on-demand basis, and it packages the FALCON assembler so that it can be run without complicated installation.

The cost to assemble a genome depends on the genome size, complexity, and chosen coverage. A human genome assembly at 50-80x coverage can be completed in less than 48 hours and is priced between $5,000 and $10,000 on the DNAnexus platform. Democratizing reference genomes via de novo assembly gives research labs of all sizes the ability to create their own reference genomes, lowers local cluster infrastructure requirements, and shortens time to results.

The ability to create a new de novo reference genome, whether for more accurate remapping of human populations, as in the Asian Reference Genome Project, or for crops with no existing reference, has the potential to advance scientific discovery.

"The DNAnexus platform brings the capability to perform full de novo assembly of complex genomes into the reach of more scientists," said Michael Hunkapiller, Chief Executive Officer of Pacific Biosciences. "It is essential to have full and accurate de novo human genome assemblies to facilitate our understanding of disease, and it's terrific that the DNAnexus platform allows researchers to leverage PacBio tools and massive datasets to conduct these types of pioneering research projects in the cloud."

During PacBio's Friday lunchtime workshop at the 16th annual Advances in Genome Biology and Technology meeting, Dr. W. Richard McCombie from Cold Spring Harbor Laboratory announced that the fastest human genome assembly in history had been completed that day on DNAnexus. DNAnexus ran FALCON on the genome of a breast cancer cell line from MSKCC, and the assembly completed in less than 21 hours. DNAnexus is collaborating with PacBio to develop new features and improve current PacBio assembly technology in the cloud. FALCON, along with full datasets for Drosophila, Arabidopsis, CHM13, and the MSKCC cell line, can be found on the DNAnexus platform under "featured projects." Venter's PacBio data will become available pending publication.

"We are thrilled to support PacBio to further improve powerful long-read technology," said Richard Daly, CEO, DNAnexus. "We are seeing a growing presence and pace of PacBio adoption across genomic centers worldwide and are proud that the DNAnexus platform provides a high-speed data management solution for PacBio data and tools."

