Google adds Tute database to burgeoning Genomics platform

By Nick Paul Taylor Mar 15, 2015 5:05am

Google ($GOOG) has added another publicly available database to its growing Genomics platform. The latest deal sees Google Genomics host Tute Genomics' repository of 8.5 billion annotations of genetic variants, giving users another resource to probe with the tech giant's expanding arsenal of data integration tools.

Adding the Tute database to a platform that already hosts and publicly shares data from the 1000 Genomes Project, Illumina Platinum Genomes, the MSSNG Database for Autism Researchers and other sources is part of how Google wants to differentiate its Genomics service. Data storage is at the heart of the offering. But by sharing databases and tapping into its search and computing know-how, Google is trying to become more than just a cheap place to dump data. Google has adapted its BigQuery analytics tool to genomics to further this ambition.

"It turns out that if you feed into [BigQuery] genetic variant calls from a cohort of patients, you can do queries against that. And in a matter of seconds, you can ask questions about allelic frequency, genome-wide association, linkage to phenotypic traits or drug treatments in a way that's just kind of mind-blowingly fast," Google Genomics Product Manager Jonathan Bingham told Bio-IT World. Running 88 GB of genetic variants against Tute's resource takes 30 seconds and the cost clocks in at under $1.

The ability to perform such tasks quickly and affordably is important to Tute. "We're working on the whole annotation layer. What's behind your variants? How we can intersect that against everything that's known?" Tute CSO David Mittelman said. As Tute began to work through these tasks and tried to get the most from its database of gene model annotations and conservation scores, it had a revelation. "It screams 'search engine!' It's a search engine problem," Mittelman said. Now, Tute is working with the company that solved the web's search problem to improve how it interacts with the data.
