U.S. team taps supercomputer to accelerate genome data analysis

As the cost of sequencing a whole human genome has edged downward toward the fabled $1,000 mark, some observers have become increasingly concerned about how much time and money it will take to analyze the data. To clear the potential bottleneck, U.S. researchers have applied a supercomputer to the task.

Writing in the journal Bioinformatics, a U.S. National Institutes of Health-funded team based at the University of Chicago describes its use of the Beagle supercomputer to align and analyze raw sequencing data. Beagle is a 150-teraflops, 18,000-core supercomputer with 600 terabytes of disk storage. Applying this computing grunt to sequencing data allowed the researchers to align and call variants on 240 whole genomes in 50 hours. The throughput was achieved by adapting the supercomputer to analyze multiple genomes concurrently.
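To put those figures in context, the short Python sketch below converts the reported run (240 genomes in 50 hours) into an hourly rate and a rough weekly capacity, then compares it with the 340 genomes a week that this article attributes to Illumina's HiSeq X Ten. It is back-of-the-envelope arithmetic, not a result from the paper: it assumes the machine is dedicated to genome analysis around the clock and that throughput holds steady from one batch to the next. The function and constant names are illustrative only.

```python
# Figures reported in the article.
GENOMES_ANALYZED = 240            # whole genomes aligned and variant-called
WALL_CLOCK_HOURS = 50             # elapsed time for the whole batch
SEQUENCER_GENOMES_PER_WEEK = 340  # HiSeq X Ten output cited in the article

HOURS_PER_WEEK = 7 * 24


def genomes_per_hour(genomes: int, hours: float) -> float:
    """Average analysis rate over the batch."""
    return genomes / hours


def weekly_capacity(genomes: int, hours: float) -> float:
    """Extrapolated weekly capacity.

    Assumption (not from the paper): the machine runs genome analysis
    continuously and the batch rate stays constant.
    """
    return genomes_per_hour(genomes, hours) * HOURS_PER_WEEK


rate = genomes_per_hour(GENOMES_ANALYZED, WALL_CLOCK_HOURS)
capacity = weekly_capacity(GENOMES_ANALYZED, WALL_CLOCK_HOURS)
headroom = capacity / SEQUENCER_GENOMES_PER_WEEK

print(f"Analysis rate: {rate:.1f} genomes per hour")
print(f"Extrapolated capacity: {capacity:.0f} genomes per week")
print(f"Capacity relative to sequencer output: {headroom:.1f}x")
```

Under these assumptions the rate works out to about 4.8 genomes an hour, or roughly 800 a week, which is more than twice the quoted output of a single HiSeq X Ten installation. In practice a shared supercomputer would not be given over to one workload, so the real margin would be smaller.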

The authors predict the approach will save time and money. "Improving analysis through both speed and accuracy reduces the price per genome. With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of genome. New technology promises to bring the costs of sequencing down to around $1,000 per genome. Our goal is to get the cost of analysis down into that range," study author and University of Chicago professor Elizabeth McNally said in a statement.

Illumina ($ILMN) recently claimed to have passed the $1,000 genome sequencing barrier with its HiSeq X Ten system. The output of this technology, which can sequence 340 whole genomes a week, underlines the need for high-throughput analysis. Beagle is one way to enable such analysis, but even more powerful computers are now available. In November 2010, Beagle was the world's 54th most powerful supercomputer. Within three years it had slipped to 455th.

Beagle's rapid slide down the rankings shows how quickly supercomputing is advancing. The top-ranked computer as of November 2013 is a 54,900-teraflops, 3.1 million-core machine at China's National Supercomputer Center. While such machines represent the cutting edge, far less powerful computers are still useful for life science research. The Centre for Development of Advanced Computing in India this week unveiled plans to use a 10-teraflops system for research into malaria, breast cancer and other diseases.

- here's the Bioinformatics abstract
- and accompanying press release
- check out the India news
