The data-handling requirements of YouTube are well documented. Users upload 300 hours of video every minute. If the site keeps growing at its current rate, researchers think users could be uploading 1,700 hours of video a minute by 2025, a total that would amount to 2 exabytes of data per year. Yet if a new analysis is right, the computing crunch facing genomics could dwarf these challenges.
Researchers ran the numbers in a paper in PLOS Biology to compare the storage and analysis needs of four data-intensive operations: astronomy, genomics, Twitter ($TWTR) and YouTube. The forecasts generated some big figures. If by 2025 researchers have sequenced 100 million human genomes, the field may require 2 exabytes of storage. But if genomics really takes off and 2 billion people are sequenced, the figure could swell to as much as 40 exabytes. When paired with the burden of acquiring, distributing and analyzing data, the study suggests genomics will need serious computing power.
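As a rough sanity check on those forecasts, the per-unit storage they imply can be worked out from the figures quoted above. The sketch below assumes decimal units (1 exabyte = 10^18 bytes) and a 365-day year; the function and variable names are illustrative, and nothing here reflects how the paper itself derived its estimates.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# Assumptions (not from the paper): decimal units and a 365-day year.
EXABYTE = 10**18   # bytes
GIGABYTE = 10**9   # bytes


def bytes_per_genome(total_exabytes, genomes):
    """Storage implied per sequenced genome for a given forecast."""
    return total_exabytes * EXABYTE / genomes


# Genomics: 2 EB for 100 million genomes, 40 EB for 2 billion genomes.
low_scenario = bytes_per_genome(2, 100_000_000)
high_scenario = bytes_per_genome(40, 2_000_000_000)

# YouTube: 1,700 hours uploaded per minute, totalling 2 EB per year.
hours_per_year = 1_700 * 60 * 24 * 365
youtube_gb_per_hour = 2 * EXABYTE / hours_per_year / GIGABYTE

print(f"Genomics, low scenario:  {low_scenario / GIGABYTE:.0f} GB per genome")
print(f"Genomics, high scenario: {high_scenario / GIGABYTE:.0f} GB per genome")
print(f"YouTube: {youtube_gb_per_hour:.2f} GB per hour of video")
```

Under these assumptions, both genomics scenarios work out to the same 20 GB per genome, so the 20-fold jump in storage comes entirely from the number of people sequenced.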
"This serves as a clarion call that genomics is going to pose some severe challenges. Some major change is going to need to happen to handle the volume of data and speed of analysis that will be required," Gene Robinson, co-author of the paper and a biologist at the University of Illinois at Urbana-Champaign, told Nature News. As Robinson sees it, genomics will be one of, if not the, most demanding fields in terms of data, but it is failing to prepare for this burden as effectively as the likes of Facebook ($FB) and YouTube.
Some observers have questioned the rigor of the research. Narayan Desai, a computer scientist at tech company Ericsson, said it "isn't a particularly credible analysis." He pointed to the way the paper downplays the processing and analysis demands of Twitter and YouTube's targeted advertising units as one area in which it fails to make a like-for-like comparison between the fields. Yet while such details are disputed, the overarching message of the paper is one with which many people can agree.
"The world has a limited capacity for data collection and analysis. Because of the accessibility of sequencing, the explosive growth of the community has occurred in a largely decentralized fashion, which can't easily address questions like this," Desai said. The situation in genomics, a field in which many researchers have the means to generate large amounts of data, contrasts with areas such as high-energy physics, whose requirements necessitate a more centralized approach. In this regard, the democratization of genomics, one of its strengths, may also be one of its weaknesses.