Google applies large-scale machine learning to drug discovery

Google's headquarters--courtesy of Google

Google ($GOOG) has given the world a peek at one of the ways in which it thinks algorithms and huge datasets could reshape drug discovery. The work involves trying to make virtual drug screening more efficient using the same ethos Google applies to most problems: More data, more computing power.

A team at Google Research worked with a group from Stanford University on the project, which aimed to go beyond typical virtual drug screening models by pulling in data on multiple diseases. The desire to increase the breadth of data fed into the model was driven, in part, by the idea that machine learning is more effective when multiple problems are tackled at once. Such multi-task learning has shown promise in multiple fields but requires considerable computing power. Fortunately for Google, this is one area in which it is well equipped.
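To make the multi-task idea concrete, here is a minimal sketch in Python of a network in which every task shares one hidden layer and each task gets its own output. It is an illustration only, not the architecture or code from the Google and Stanford project. The function names, layer sizes and the use of `None` for a compound that was never measured on a given task are all assumptions made for this example.

```python
import math
import random

def make_model(n_features, n_hidden, n_tasks, seed=0):
    """Random weights for one shared hidden layer plus one output unit per task."""
    rng = random.Random(seed)
    shared = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
              for _ in range(n_hidden)]
    heads = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
             for _ in range(n_tasks)]
    return shared, heads

def predict(model, features):
    """Return one activity probability per task for a single compound."""
    shared, heads = model
    # Shared representation: every task reads the same hidden activations (ReLU).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
              for row in shared]
    # Task-specific heads: one sigmoid output per task.
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(head, hidden))))
            for head in heads]

def multitask_loss(probs, labels):
    """Mean cross-entropy over the tasks that have a label (None = not measured)."""
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for p, y in zip(probs, labels) if y is not None]
    return sum(terms) / len(terms) if terms else 0.0

# Hypothetical compound with 4 binary features, scored against 5 tasks.
model = make_model(n_features=4, n_hidden=3, n_tasks=5)
probs = predict(model, [1.0, 0.0, 1.0, 1.0])
# Only tasks 0 and 2 have measurements for this compound.
loss = multitask_loss(probs, [1, None, 0, None, None])
```

Because the hidden layer is shared, a training step that lowers the loss on one task also changes the representation used by all the others, which is the mechanism by which data on one disease could help predictions for another. The sketch covers only the forward pass and the loss; training is left out.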

The search giant applied its large-scale neural network training system to the work. Google built the system to train neural networks across tens of thousands of CPU cores. In the drug discovery project, the training entailed equipping the network to comb through 37.8 million data points covering more than 200 different biological processes. After running the system for more than 50 million CPU hours, Google concluded that including data from multiple sources allows it to make more accurate predictions of the efficacy of a drug across different diseases.

Even greater scales are in Google's sights. At the time of writing a paper on the project, Google had scaled the system up to 239 tasks and the upward efficiency trend was yet to plateau. Adding more data was found to increase efficiency, too. The researchers have cast lustful looks at the "vast private stores of experimental measurements" locked away at Big Pharma companies as they try to figure out the next steps for the model. More data and more tasks are the near-term goals for the project.

Whether the efficiency gains touted by Google will have an effect on drug discovery remains to be seen. Even the paper's authors accept that the complexity of drug discovery could limit the impact of the approach, but overall they are as optimistic as one would expect Google staffers to be about the potential for data and algorithms to deliver improvements.

- read the paper (PDF)
- check out Google's blog post
- and VentureBeat's take
