J&J collaboration with San Diego Supercomputer Center speeds genome analysis

Rheumatoid arthritis has become a key therapeutic area for Johnson & Johnson ($JNJ). Last year, Remicade and Simponi, which treat the disease and other autoimmune disorders, generated combined sales of $7.6 billion. To maintain its position, J&J subsidiary Janssen is working with the San Diego Supercomputer Center (SDSC) on whole genome sequence analysis.

Janssen turned to SDSC to better understand rheumatoid arthritis and the genetic factors that determine whether a patient responds to one of its biologics. The collaborators, joined by a third partner, the Scripps Translational Science Institute (STSI), sequenced the genomes of 438 people with rheumatoid arthritis but then faced the challenge of analyzing the data quickly. SDSC hit a bottleneck in the "sort" step of the read mapping stage.

The sort step required relatively few computing cores, but it needed fast access to several terabytes of data. Conventional hard drives were too slow, so SDSC turned to its supercomputer's flash memory. "Several terabytes of flash were aggregated into what we call 'BigFlash' nodes, which significantly reduced the I/O (input/output) bottleneck in this step and contributed to helping researchers meet the project's timelines," SDSC's Wayne Pfeiffer said.

Using this approach, SDSC completed the analysis within six weeks. The center views the results as evidence that, while raw computing power is important in genomics, memory and I/O operations per second play a bigger role. With more organizations turning to supercomputers for analysis (the New York Genome Center and a Hong Kong hospital both added capabilities this week), there is a growing need to understand how to get the most out of these machines.

- Read SDSC's press release