Global Collaboration Creates First Publicly Available Illumina HiSeq X Ten DNA Sequence Dataset

Australia's Garvan Institute, DNAnexus, and AllSeq sponsor sharing of sample dataset to educate the scientific community about the potential of the world's most powerful DNA sequencing platform

August 07, 2014 01:21 PM Eastern Daylight Time
SYDNEY & MOUNTAIN VIEW, Calif. & LA JOLLA, Calif.--(BUSINESS WIRE)--The Garvan Institute of Medical Research, DNAnexus, and AllSeq today announced that they are sponsoring free access to the world's first publicly available datasets generated using the Illumina HiSeq X Ten DNA sequencing platform. The goal of this project is to provide researchers with sample data that will allow them to gain a deeper understanding of what this technological advancement might mean for their work today and in the future.

The Garvan Institute's Kinghorn Centre for Clinical Genomics in Sydney, Australia, was one of the first three organizations in the world to acquire the Illumina HiSeq X Ten sequencing system. To enable the scientific community to assess data quality from an independent laboratory, the Centre has made reference datasets available for a world-first HiSeq X Ten data sharing project.

DNAnexus, an enterprise solution for genome informatics and data management, has sponsored the data storage and downloading support. The company also ran analyses to produce quality metrics to help the scientific community understand the results. AllSeq, which created the Sequencing Marketplace for matching DNA researchers and their needs with next generation sequencing service providers, arranged this data sharing endeavor as a part of its effort to educate scientists about different sequencing technologies and what they are suitable for.

"The tremendous advances in both volume and cost of whole genome sequencing using the Illumina HiSeq X Ten platform provides an exciting and practical avenue that moves us closer to the clinical translation of genomics," said Associate Professor Marcel Dinger, Head of Clinical Genomics and Genome Informatics at the Garvan Institute. "Advancing genomic medicine remains an international and highly collaborative effort and the Kinghorn Centre for Clinical Genomics is pleased to be working with DNAnexus and AllSeq to make sample data available to clinicians and researchers so that they can gain a deeper understanding of how this powerful technology may impact their work."

To develop the sample datasets, Garvan scientists used the Coriell Cell Repository NA12878 reference sample, which has been extensively analyzed by the Genome in a Bottle Consortium. Two different, high-quality datasets are provided (NA12878D and NA12878J). Each was sequenced on a single lane of an Illumina HiSeq X patterned flow cell in just 2.8 days, achieving over 120 Gb of yield with more than 87 percent of bases at a quality greater than Q30. Each dataset meets the minimum coverage and quality guaranteed by Illumina and is indicative of the potential of the Illumina HiSeq X Ten sequencing system.
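For readers who want to check figures like these against the downloaded FASTQ files, the following is a minimal sketch rather than the method used by Garvan or DNAnexus. It assumes Phred+33 quality encoding, counts bases at Q30 or above (the usual reporting convention, which may differ slightly from the "greater than Q30" wording above), and uses an approximate human genome size of 3.1 Gb to turn yield into raw mean coverage. The function names are hypothetical.

```python
import io

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def q30_fraction(fastq_handle, threshold=30):
    """Return the fraction of bases whose Phred score is >= threshold.

    Expects plain four-line FASTQ records (header, sequence, '+', qualities).
    """
    total = 0
    passing = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 3:  # every fourth line holds the quality string
            quals = line.rstrip("\n")
            total += len(quals)
            passing += sum(
                1 for ch in quals if ord(ch) - PHRED_OFFSET >= threshold
            )
    return passing / total if total else 0.0


def mean_coverage(yield_bases, genome_size=3.1e9):
    """Raw mean coverage: sequenced bases divided by genome size.

    genome_size is an approximation; aligned, de-duplicated coverage
    will be lower than this raw estimate.
    """
    return yield_bases / genome_size


# Tiny made-up example: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
toy_fastq = io.StringIO("@read1\nACGT\n+\nII5#\n@read2\nACGT\n+\nIIII\n")
toy_q30 = q30_fraction(toy_fastq)

# Raw coverage implied by a 120 Gb lane, under the genome-size assumption.
lane_coverage = mean_coverage(120e9)
```

On real data the handle would come from opening the (possibly gzip-compressed) FASTQ file in text mode; the toy records here are invented purely to exercise the function.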

As the study and application of genomic data expand and proliferate, their true promise hinges on the genomics community's collective ability to manage all these data securely, collaboratively, and efficiently. At full capacity, the Illumina HiSeq X Ten platform generates one genome every 25 minutes. According to internal data, transferring the two test genomes used in this project from the Garvan Institute in Australia into the DNAnexus system took less than 50 minutes, demonstrating that the DNAnexus platform in conjunction with Amazon's AWS Cloud can keep up with the pace of genomics. The runs were then analyzed and instantly shared with Garvan's scientific team for review. Within hours, these data were made available to the global research community to view and download.
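The arithmetic behind the "keeping pace" claim can be sketched as follows. This is an illustration only: it takes the 25-minute interval and the 50-minute upper bound from the figures quoted above, and the function names are hypothetical.

```python
GENOME_INTERVAL_MIN = 25.0  # one genome every 25 minutes at full capacity


def genomes_per_day(interval_min=GENOME_INTERVAL_MIN):
    """Genomes produced in 24 hours at the stated production interval."""
    return 24 * 60 / interval_min


def transfer_keeps_pace(n_genomes, transfer_min,
                        interval_min=GENOME_INTERVAL_MIN):
    """True if the average transfer time per genome does not exceed
    the time the sequencer needs to produce one genome."""
    return transfer_min / n_genomes <= interval_min


daily_output = genomes_per_day()

# Two genomes in at most 50 minutes is at most 25 minutes per genome.
keeps_pace = transfer_keeps_pace(n_genomes=2, transfer_min=50)
```

Because the reported transfer took less than 50 minutes, the per-genome transfer time falls under the 25-minute production interval, which is the basis for the comparison in the paragraph above.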

"The DNAnexus platform was designed to be a complete solution for genomics analysis and data management, helping to accelerate basic science and clinical breakthroughs by bringing diverse teams together around ever-growing genomic datasets. The HiSeq X Ten presents a further challenge and opportunity for genomics in the production and management of genomic data, and Garvan has shown that the cloud can rise to this challenge," said Richard Daly, CEO of DNAnexus. "With the rise of consortia in the genomics community, the inevitability of the cloud and its ability to leverage scientific collaboration is near term. We are pleased to be working with the Garvan Institute and AllSeq to host the world's first HiSeq X Ten data sharing project so the scientific community can take a closer look at results from the '$1000 genome'."

The original FASTQ files, as well as analysis results (BAM and VCF files) and quality metrics calculated using FastQC and Picard tools (e.g., MarkDuplicates, CollectInsertSizeMetrics, and CollectWgsMetrics), are available at http://allseq.com/x-ten-test-data.

Those with DNAnexus accounts can also access these data via the DNAnexus platform, where users are able to copy any of the files to their own DNAnexus projects for further downstream analysis. For more information, visit https://dnanexus.com.

About the Garvan Institute of Medical Research and its Kinghorn Centre for Clinical Genomics

The Garvan Institute of Medical Research is one of Australia's largest medical research institutions, with more than 600 scientists, students and support staff. Garvan's main research areas are: Cancer, Diabetes & Metabolism, Immunology and Inflammation, Osteoporosis and Bone Biology, and Neuroscience. Garvan's mission is to make significant contributions to medical science that will change the directions of science and medicine and have major impacts on human health. The outcome of Garvan's discoveries is the development of better methods of diagnosis, treatment, and ultimately, prevention of disease. In 2012, Garvan established Australia's first purpose-built facility for undertaking clinical-grade genome sequencing and large-scale research projects. With support from the Kinghorn Foundation, Garvan acquired an Illumina HiSeq X Ten Sequencing System in January 2014. Researchers at the Kinghorn Centre for Clinical Genomics (KCCG) undertake collaborative projects and genome-based studies to improve genome interpretation, with the ultimate aim of advancing the use of genomic information in patient care. KCCG is seeking accreditation that would ultimately allow clinicians to sequence genomes for diagnostic and therapeutic purposes. For more information please visit: http://www.garvan.org.au/

About DNAnexus

DNAnexus is powering the genomics revolution with an enterprise-level solution that combines cloud computing with advanced bioinformatics. The DNAnexus team is made up of experts in software, computational biology, and genetics who are on a mission to establish DNAnexus at the center of a growing ecosystem of scientific and clinical research, and diagnostic efforts in personalized medicine. For more information please visit: https://dnanexus.com.

About AllSeq

AllSeq has created the world's first true Sequencing Marketplace, which helps researchers fulfill their sequencing needs by matching them with the appropriate pool of sequencing providers. The company provides free online tools for describing sequencing projects in an easy and systematic way, ensuring estimates are easy to compare and allowing researchers to pick the best provider (based on price, technology, turnaround time, etc.). AllSeq also maintains the NGS Knowledge Bank, a neutral source of information on sequencing technologies, platforms and applications. For more information, please visit http://allseq.com.

Contacts
for DNAnexus
Colin Sanford, 203-918-4347
[email protected]