Big Data has arrived in biotech. Now what?

AstraZeneca R&D Information Vice President John Reynders, right, speaks to FierceBiotech Executive Editor Ryan McBride, left, and NEA General Partner Dave Mott.

The world's biggest biotechs are piling up data by the petabyte, amassing numbers for gene expression, biomarker reliability, patient outcomes and pretty much every other measurable vector in drug development. Of course, to date, no disease has ever been treated by a spreadsheet, so how do we take Big Data from the server closet to the doctor's office?

FierceBiotech Executive Editor Ryan McBride asked a panel of drug-development luminaries just that during the J.P. Morgan Healthcare Conference on Wednesday, bringing together GlaxoSmithKline ($GSK) Senior Vice President Lon Cardon, Warp Drive Bio CEO Alexis Borisy, AstraZeneca ($AZN) R&D Information Vice President John Reynders and New Enterprise Associates General Partner Dave Mott.

Big Data has quickly gone from a buzz phrase to a punch line in some circles, and while it's not a panacea for pharma's woebegone batting average over the past few years, Reynders said, it can still be an invaluable tool--provided you know how to use it.

"Big Data is only going to be as good as the questions that are being asked of it," he said. "It's the human element in the loop that's able to interrogate that data." And biotech is going to have to develop its own model, Reynders said. Companies such as Google ($GOOG) and Amazon ($AMZN) have already capitalized on the deluge of information now available, but hypothesis-generation and prediction are far easier when you're just looking at what books and websites people like, he said. Gene expression is a mite more complicated.

Hypotheses can quickly spiral out of control in the face of so much information, Borisy said, and the challenge is to have questions well-defined enough to make the available knowledge useful.

Warp Drive Bio CEO Alexis Borisy, left, and GlaxoSmithKline Senior Vice President Lon Cardon discuss how Big Data can be used to generate hypotheses in drug discovery.

Take Phase II, for instance--that ever-vexing stretch where the rubber meets the road for promising compounds that just might be worthless. Identifying the right patients for the right drug can make or break a Phase II trial, Reynders said, and Big Data can come in handy as investigators distill mountains of imaging results, disease-progression readings and genotypic traits to find their target participants.

The prospect of widespread genetic mapping coupled with the power of Big Data could fundamentally change how biotech does R&D, Borisy said. "Imagine having 1 million cancer patients profiled with data sets available and accessible," he said. "Think how that very large data set might work--imagine its impact on what development looks like. You just look at the database and immediately enroll a trial of ideal patients."
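What Borisy is describing is, at bottom, a cohort query. As a purely illustrative sketch--with made-up patient profiles and hypothetical eligibility criteria, not anything the panelists have built--here is what "just look at the database" could mean in Python:

```python
# Illustrative sketch only: a hypothetical cohort query against a profiled
# patient registry, in the spirit of Borisy's million profiled patients.
import pandas as pd

# Hypothetical, made-up patient profiles; a real registry would hold millions.
patients = pd.DataFrame([
    {"patient_id": 1, "mutation": "KRAS_G12C", "ecog_status": 0, "prior_lines": 1},
    {"patient_id": 2, "mutation": "EGFR_L858R", "ecog_status": 1, "prior_lines": 0},
    {"patient_id": 3, "mutation": "KRAS_G12C", "ecog_status": 2, "prior_lines": 3},
    {"patient_id": 4, "mutation": "KRAS_G12C", "ecog_status": 1, "prior_lines": 1},
])

# "Just look at the database": select patients matching invented trial criteria.
eligible = patients[
    (patients["mutation"] == "KRAS_G12C")
    & (patients["ecog_status"] <= 1)   # fit enough for the trial
    & (patients["prior_lines"] <= 2)   # not too heavily pretreated
]

print(eligible["patient_id"].tolist())  # candidate enrollees: [1, 4]
```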

And the value of high-volume data goes far beyond the development phase, Mott said. Once a therapy is commercialized, crunching the numbers on patient outcomes and follow-up studies can help a drugmaker expand its reach and nab more indications.

"In this day and age, I'm not sure we're ever going to have another Lipitor without Big Data, because the requirements for signal finding on the safety side are so high right now," Mott said. "You're not going to have a drug being used by that many people without Big Data."

The full panel, from left: Borisy, Cardon, McBride, Mott and Reynders.

But not all data is created equal. Big minds and big money go into the genomic data used to predict the future, but the phenomic side--data often scribbled in handwritten notes by physicians around the globe--is unstructured and almost impossible to index, Borisy said. It's a paradox, according to Cardon: "The predictors are exquisitely refined, but the things we're predicting are dirty and muddy and error-prone."

Or, as Borisy put it, "Our clinical, phenomic data sucks."

Thus, the challenge for biotechs looking to tap Big Data: The industry needs to take on the burden of extracting those muddled, unstructured data sets and merging them with the sophisticated parameters already available, according to Borisy. The resulting wall-to-wall databases could solve a whole host of problems in drug development, and, once we've made the leap, "we're going to be cooking," Mott said.
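As a toy illustration of that extraction burden--a sketch under assumed inputs, not a production pipeline, which would lean on full-blown natural-language processing rather than a single pattern match--here is one structured field being pulled out of free-text clinic notes and joined to clean genomic data:

```python
# Toy illustration of the extraction burden Borisy describes: pulling one
# structured field out of free-text notes, then joining it to the
# "exquisitely refined" structured data. All records are invented.
import re
import pandas as pd

notes = pd.DataFrame([
    {"patient_id": 1, "note": "Pt doing well. ECOG 1, mild fatigue."},
    {"patient_id": 2, "note": "follow-up: ecog status 0, no complaints"},
    {"patient_id": 3, "note": "declining performance; wheelchair-bound"},
])

def extract_ecog(note: str) -> float:
    """Pull an ECOG score from free text; NaN when the note never states one."""
    m = re.search(r"ecog(?:\s+status)?\s*(\d)", note, flags=re.IGNORECASE)
    return float(m.group(1)) if m else float("nan")

notes["ecog"] = notes["note"].apply(extract_ecog)

genomics = pd.DataFrame([
    {"patient_id": 1, "mutation": "KRAS_G12C"},
    {"patient_id": 2, "mutation": "EGFR_L858R"},
    {"patient_id": 3, "mutation": "KRAS_G12C"},
])

# Merge the clean genomic predictors with the (still patchy) phenomic outcomes.
merged = genomics.merge(notes[["patient_id", "ecog"]], on="patient_id")
print(merged)
```

Note that patient 3's record comes through with a missing score: the phenomic signal was in the note, but not in a form a regular expression can recover, which is precisely the gap Borisy is complaining about.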

So when can we expect this? Mott says 20 years; Borisy says it's more like 5; and Cardon says we're most of the way there and just need the right approach to noisy data sets. Everyone, however, agreed that all the data in the world is for naught if you don't know what to do with it, and the path to getting the most out of Big Data involves stitching together the disparate sets into an integrated whole--and burning through the time and money required to make that happen.

In short, we have the numbers; we just need to make them work. -- Damian Garde