Complete Genomics Adds 29 High-Coverage, Complete Human Genome Sequencing Datasets to its Public Genomic Repository
Global research community gains access to sequencing data from a three-generation pedigree and additional ethnically diverse samples.
MOUNTAIN VIEW, Calif. - April 6, 2011 - Complete Genomics Inc. (NASDAQ: GNOM), a complete human genome sequencing company that has developed and commercialized an innovative DNA sequencing platform, announced today that it has added 29 high-coverage, complete human genome sequences to its public genomic repository. Combined with the 40 genome datasets that Complete Genomics released on Feb. 3, 2011, this provides the research community with a public data repository of 69 complete human genome datasets.
The 69 genomes included in this public dataset were drawn from two resources housed at the Coriell Institute for Medical Research: the National Institute of General Medical Sciences (NIGMS) Human Genetic Repository and the NHGRI Sample Repository for Human Genetic Research. These 29 newly available datasets include the previously announced Puerto Rican trio from the NHGRI Repository and a 17-member, three-generation CEPH pedigree from the NIGMS Repository, as well as nine ethnically diverse samples from the NHGRI Repository, representing Tuscans from Italy, Gujarati Indians from Houston, Texas, and Maasai from Kinyawa, Kenya. Most of these samples have been previously analyzed as part of the International HapMap Project or the 1000 Genomes Project.
"We are delighted to see how quickly our public multi-genome repository has been embraced by the global research community," said Dr. Clifford Reid, chairman, president and CEO of Complete Genomics. "Since we launched this initiative on Feb. 3 at Advances in Genome Biology and Technology (AGBT), 30 terabytes of data have been downloaded from our website by more than 550 unique IP addresses. As this data is also available on Bionimbus and DNAnexus mirror sites, the total number of downloads is likely much higher."
This new dataset continues to meet Complete Genomics' extremely high quality data standards. Across the genomes sequenced in this dataset, the median genome call rate is 97.1 percent and the median exome call rate is 96.3 percent.
"We are impressed with the completeness and quality of Complete Genomics' human genome sequencing data and have used a portion of it in our research," said Atul Butte, chief, Division of Systems Medicine, Department of Pediatrics at Stanford University School of Medicine. "This is certainly a practical and valuable way in which Complete Genomics has shared its expertise with the community. Indeed, my students are already learning how to analyze human genome sequences through these data releases."
Because many researchers are interested in learning how Complete Genomics' data compares with other publicly available datasets, the company has performed the following comparisons. Complete Genomics was able to make a call at a median of 99.34 percent of the HapMap 1 SNP loci genotyped using the Illumina Infinium assay; of these calls, the SNP concordance rate was 99.94 percent. Complete Genomics was also able to make a call at a median of 99.45 percent of the HapMap 3 SNP loci; of these, 99.73 percent were called concordantly. Finally, Complete Genomics was able to make a call at a median of 97.28 percent of the 1000 Genomes Project Pilot 2 (high-depth sequencing of trios) SNP loci; of these, 99.70 percent were called concordantly.
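For readers who want to reproduce this kind of comparison, the two metrics quoted above can be sketched as follows. This is a minimal illustration and not Complete Genomics' actual pipeline: the function name, the dictionary-based data layout, the use of None for a no-call, and the toy genotypes are all assumptions made for the example. The call rate is the fraction of reference loci at which a genotype was called, and the concordance is the fraction of those called loci whose genotype matches the comparison dataset.

```python
from statistics import median

NO_CALL = None  # hypothetical marker for a locus where no genotype was called


def call_rate_and_concordance(calls, reference):
    """Return (call_rate, concordance) for one sample.

    calls:     dict mapping locus ID -> genotype string, or NO_CALL
    reference: dict mapping locus ID -> genotype from the comparison dataset

    Assumes genotypes are already normalized (e.g. "AG", never "GA").
    """
    total = len(reference)
    # Loci missing from `calls` are treated the same as explicit no-calls.
    called = [locus for locus in reference if calls.get(locus) is not NO_CALL]
    matching = [locus for locus in called if calls[locus] == reference[locus]]
    call_rate = len(called) / total
    concordance = len(matching) / len(called) if called else 0.0
    return call_rate, concordance


# Toy data (invented for illustration only).
reference = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AA"}
samples = {
    "sample_a": {"rs1": "AG", "rs2": "CC", "rs3": NO_CALL, "rs4": "AA"},
    "sample_b": {"rs1": "AG", "rs2": "CT", "rs3": "TT", "rs4": "AA"},
    "sample_c": {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AA"},
}

per_sample = {name: call_rate_and_concordance(calls, reference)
              for name, calls in samples.items()}

# Summarize across samples with the median, as the figures above do.
median_call_rate = median(rate for rate, _ in per_sample.values())
median_concordance = median(conc for _, conc in per_sample.values())
```

Note that concordance is computed only over loci where a call was made, which is why a sample can have a lower call rate and still be fully concordant.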
Complete Genomics is posting variant reports for each sample, which include SNPs, insertions/deletions, copy number variations and structural variations. The company is also providing the read alignments supporting those calls, as well as coverage information and quality scores. This dataset is available on both human genome reference builds 36 and 37.
This new data is available for download from either the Complete Genomics website at http://www.completegenomics.com/sequence-data/download-data or the Bionimbus mirror site at http://www.bionimbus.org under "public data". DNAnexus customers can also gain access to this dataset through that cloud-based platform.
About Complete Genomics
Complete Genomics is a complete human genome sequencing company that has developed and commercialized an innovative DNA sequencing platform. The Complete Genomics Analysis Platform (CGA™ Platform) combines Complete Genomics' proprietary human genome sequencing technology with our advanced informatics and data management software. We offer this solution as an innovative, end-to-end, outsourced service, CGA™ Service, and provide customers with data that is immediately ready to be used for genome-based research. Additional information can be found at http://www.completegenomics.com.
Forward-Looking Statements
Certain statements in this press release, including statements relating to our expectations regarding the growth in downloads from our public genomics repository, are forward-looking statements that are subject to risks and uncertainties. Readers are cautioned that these forward-looking statements are based on management's current expectations, and actual results may differ materially from those projected. The following factors, without limitation, could cause actual results to differ materially from those in the forward-looking statements: our limited operating history and any failure of the demand for complete human genome sequencing data to grow. More information on potential factors that could affect our monthly genome sequencing capacity is included in our Securities and Exchange Commission filings and reports, including the risks identified under the section captioned "Risk Factors" in our Quarterly Report on Form 10-Q filed on December 22, 2010. We disclaim any obligation to update information contained in these forward-looking statements, whether as a result of new information, future events or otherwise.