ENCODE generates 15 trillion bytes of data for researchers

The ENCODE project, whose initial results were released last week, was in some ways a computational biology phenomenon.

The project, which found that 80% of human DNA is active and necessary, will pay dividends for drug researchers for decades to come. According to The New York Times, the project to uncover the purpose of so-called "junk DNA" offered major advances in DNA sequencing and computational biology that made it conceivable to understand the dark matter of human DNA. In the process, researchers generated 15 terabytes (15 trillion bytes) of raw data and used the equivalent of more than 300 years of computer time.
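To give a sense of scale for those two figures, here is a rough back-of-the-envelope sketch. It assumes decimal (SI) terabytes, which matches the "15 trillion bytes" figure, and an average year of 365.25 days. The 1,000-machine cluster is purely hypothetical and is not a description of the hardware ENCODE actually used.

```python
# Back-of-the-envelope scale check for the figures quoted in the article.
TERABYTE = 10**12              # decimal (SI) terabyte: 1 TB = one trillion bytes
HOURS_PER_YEAR = 365.25 * 24   # average year length (an assumption)

raw_bytes = 15 * TERABYTE             # 15 terabytes of raw data
cpu_hours = 300 * HOURS_PER_YEAR      # "300 years of computer time" in hours


def wall_clock_days(total_cpu_hours: float, machines: int) -> float:
    """Days of elapsed time if the work split perfectly evenly across
    `machines` computers (an idealized assumption; real jobs rarely do)."""
    return total_cpu_hours / machines / 24


print(f"{raw_bytes:,} bytes of raw data")
print(f"{cpu_hours:,.0f} computer-hours")
# Hypothetical cluster size, chosen only for illustration.
print(f"{wall_clock_days(cpu_hours, 1000):.1f} days on 1,000 machines")
```

Under these assumptions, 300 years of computer time comes to roughly 2.6 million computer-hours, or about 110 days of elapsed time on the hypothetical 1,000-machine cluster.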

By treating cells with a type of nuclease called DNase I and analyzing the patterns of snipped DNA sequences using massively parallel sequencing technology and superfast computers, the researchers were able to create comprehensive maps of all the regulatory DNA in hundreds of different cell and tissue types, according to the University of Washington. One scientist called the data the "Google map" of its predecessor project, the Human Genome Project, reports The New York Times. Where the Human Genome Project was like viewing the Earth from outer space, ENCODE puts the roads on the map.

Those roads may lead to big discoveries for drug researchers, Reuters reports. For example, the project found that just 20 gene switches in the DNA may be related to 17 seemingly unrelated cancers. This provides a manageable array of targets.

NIH's National Advisory Council for Human Genome Research last year approved $14 million to be spent over four years on a data analysis and coordination center, which will include a centralized database for all ENCODE projects. Also, the advisory group consented to $9 million in funding over the next three years for computational analyses of genomic data, aiming to glean new biological insights from the genomic information.

And to put the icing on the IT cake, the journal Nature is releasing an iPad app that will make all of the ENCODE research available at no charge. Go to the journal's ENCODE site for more.

- read The New York Times story 
- get more from Reuters
- here's another take from the University of Washington
