Digitization work turns 125 years of disease reports into Big Data goldmine

Since 1888, the U.S. has published weekly reports on cases of notifiable diseases, building a huge trove of data in the process. The usefulness of this data was limited by its format, which stopped researchers from effectively mining it for insights, but a massive digitization project has now opened up the information.

A team from the University of Pittsburgh School of Public Health is behind the project, which has turned 88 million disease cases published in the Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report into an open-access database. Now, information that was previously siloed in individual weekly reports is searchable. "We've taken this entire corpus of the infectious disease history and made it so anybody on earth can look at it easily, conveniently and make sense out of it," University of Pittsburgh's Dr. Donald Burke said.

The team behind the digitization initiative, called Project Tycho, has used the database to show the huge effect the introduction of vaccines had on polio, measles, rubella and other now-preventable diseases. Because the data are now available to everyone, though, the potential uses for the information go far beyond the ambitions of the University of Pittsburgh team. Epidemiologists could learn more about how epidemics arise, spread and interact, leading to more effective use of medical countermeasures.

"Rather than have individual groups work on these data depending on restricted data access, that access can now be expanded to the whole world, really using more minds together to improve global health," Dr. Willem van Panhuis said. Having made the U.S. disease data more widely available, van Panhuis and his colleagues write in The New England Journal of Medicine that they hope open access to such high-resolution spatiotemporal information will become the norm around the world.

- here's the NEJM paper (PDF)
- read MedCity News' take