The National Cancer Institute (NCI) faces a growing data problem. As its projects grow in scale (one initiative is expected to generate 2.5 petabytes of data), downloading results for analysis on local systems becomes impractical. Recognizing this, the NCI has allocated $20 million for cloud-based genomics pilot projects.
Having spent last year gathering feedback on what users want from cloud-based cancer genomics, the NCI is now seeking organizations that can realize its ambitions. The plan is to sponsor the development of multiple pilot projects, which will then undergo a competitive evaluation. After consulting with cancer researchers, the NCI will select one or more of the projects, or an evolution of them, as the production version. Ultimately, the chosen system will feed into a Cancer Knowledge Commons that aggregates all the data.
The end goal is to democratize access to NCI-generated data by lowering the technical requirements for researchers. Under the current system, researchers download data and analyze it with tools they develop and run locally. This reliance on local systems means organizations without top-tier computing resources can do less with the data. Likewise, storing petabytes of data is unrealistic for some teams. And because data volumes are growing faster than hard drive prices are falling, storage costs will keep rising.
The NCI views cloud-based genomics systems as the answer. Storing data and running analysis tools in the cloud will reduce the computing power end users need to mine the results for insights. Even elite research institutions should benefit from a cloud-based system, as data download times are on the verge of becoming unworkable. Over a point-to-point 10-gigabit network, downloading the 2.5 petabytes generated by the Cancer Genome Atlas would take weeks: 2.5 petabytes is roughly 2 × 10^16 bits, which at a sustained 10 gigabits per second works out to about 2 million seconds, or more than three weeks.