St. Jude, Microsoft and DNAnexus launch pediatric cancer genomics platform for researchers

St. Jude, Microsoft and DNAnexus have created a data-sharing and analysis platform to help accelerate pediatric cancer research. (St. Jude Children's Research Hospital)

After scientists at St. Jude Children’s Research Hospital started making anonymized pediatric cancer patient data freely available to the public in 2010, they soon realized that the volume of data was simply too large for easy access. So they explored technical solutions and began working with Microsoft and DNAnexus. Now, a cloud-based platform created by the partnership is up and running.

Meet St. Jude Cloud, which the collaborators said is the world’s largest public repository of pediatric cancer genomics data. For now, it stores over 5,000 whole-genome, 5,000 whole-exome and 1,200 RNA-Seq datasets generated from three St. Jude-supported genomics initiatives.

It’s also more than just a data storehouse. The platform provides a suite of analysis tools and visualization capabilities that aim to help researchers develop new treatments for pediatric diseases.

“St. Jude Cloud is a powerful resource to drive global research and discovery forward,” said Jinghui Zhang, Ph.D., chair of the St. Jude Department of Computational Biology and co-leader of the St. Jude Cloud project. “Providing genomic sequencing data to the global research community and making complex computational analysis pipelines easily accessible will lead to progress in eradicating childhood cancer.”

Zhang and her team at St. Jude worked with Microsoft and DNAnexus to develop a genome alignment and variant calling pipeline, an analytical technique that identifies where genomes differ. The pipeline is the key component of the Microsoft Genomics service, which the tech giant recently launched for genomics research. Data analyzed through the pipeline also became the foundation for St. Jude Cloud.

All the data on St. Jude Cloud lives on Microsoft Azure, which can handle large-scale datasets such as population-level genomic information. On top of that, DNAnexus provides the interface, a secure online ecosystem where researchers can access the data and tools.

Besides three basic ways to browse existing data (by disease, by publication and by curated dataset), the platform also allows more advanced sample selection, such as by gene mutation or expression level. Researchers can also upload their own data and analyze it with the platform's bioinformatics tools.

Because the data and the analyses both live in the cloud, backed by fast computing resources, researchers do not need to download anything and can move their projects along much faster. The hospital said a St. Jude scientist was able to replicate, within a few days, experimental findings from a B-cell leukemia study that had originally taken the team more than two years to produce.

The rationale is that the more genomic data researchers can access and compare, the more accurately they can rule out the biological noise and pinpoint the real genetic factors behind tough diseases like cancer. By 2019, St. Jude expects to have 10,000 whole-genome sequences on St. Jude Cloud.
