Harvard plays money ball with biotech--and baseball--data

The Billy Beane of biotech might come in the form of a slate of statistical tools from Harvard University and the Broad Institute. Researchers from the venerable institutions have created a set of tools, dubbed MINE, that can tease out hidden patterns in large data sets from the worlds of biology, finance, social media and, yes, even baseball.

"There are massive data sets that we want to explore, and within them, there may be many relationships that we want to understand," stated Broad Institute associate member Pardis Sabeti, a lead contributor to the MINE effort. "The human eye is the best way to find these relationships, but these data sets are so vast that we can't do that. This toolkit gives us a way of mining the data to look for relationships."

With MINE (Maximal Information-based Nonparametric Exploration), researchers can plug in parameters of interest, mine for multiple patterns and rank them, according to Harvard's press release. Applied to a Harvard scientist's data on the trillions of microorganisms in the gut, MINE found hundreds of previously unobserved patterns. The researchers also hunted down patterns associated with salary in 2008 Major League Baseball data. That part might be of interest to Beane, whose "Moneyball" statistics helped the Oakland Athletics evaluate baseball talent.
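
For readers who want a feel for what "mine for multiple patterns and rank them" can mean in practice, here is a minimal Python sketch. It is not the MINE software or its scoring method. As a much simpler stand-in, it scores every pair of variables with a mutual information estimate over equal-width bins and sorts the pairs by that score. The function names, the bin count and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def binned_mutual_information(xs, ys, bins=4):
    """Estimate mutual information (in bits) using equal-width bins.

    Illustrative stand-in only; this is NOT the statistic used by MINE.
    """
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant column: avoid dividing by zero
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))       # counts for each (x-bin, y-bin) cell
    px, py = Counter(bx), Counter(by)  # marginal counts per bin
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j), written in terms of raw counts
        mi += p_ij * math.log2(p_ij * n * n / (px[i] * py[j]))
    return mi


def rank_pairs(table, bins=4):
    """Score every pair of columns and return them strongest first."""
    scores = [
        (binned_mutual_information(table[a], table[b], bins), a, b)
        for a, b in combinations(table, 2)
    ]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: a curved relationship and an uninformative column.
    x = list(range(20))
    table = {
        "x": x,
        "parabola": [(v - 9.5) ** 2 for v in x],
        "flat": [1.0] * 20,
    }
    for score, a, b in rank_pairs(table):
        print(f"{a} vs {b}: {score:.3f} bits")
```

A score like this picks up the curved relationship between x and the parabola, which a plain linear correlation would rate near zero. Fixed bins are a crude choice, though, and the score depends heavily on how many bins are used.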

A major theme in biotech research nowadays is handling data overload. The MINE tools are among a plethora of new approaches that groups have developed to let machines help scientists evaluate huge data sets from experiments, with the intent of speeding the pace of research and paving the way to breakthroughs for human health. Groups that are taking on the "Big Data" challenge in healthcare include Cambridge, MA-based GNS Healthcare, IBM ($IBM) and scores of others.
