Merck vaccine plant taps R&D's Big Data platform to solve yield conundrum

Having invested in data analytics platforms in recent years, pharma R&D labs are working to develop the drugs that will justify their spending. But the experience of Merck ($MRK) suggests the costs can be spread across the business, with manufacturing teams finding the data analytics tools as useful as their peers in R&D do.

In an interview with InformationWeek, George Llado, Merck's vice president of information technology, explains how the company stumbled upon the overlap in requirements. Faced with unusually high discard rates, the vaccine production team began digging into data its systems had gathered on site calibration settings, air pressure, temperature and other variables, but it had only a spreadsheet-based model to work with. Aligning the data and spotting abnormalities took months, and the team hit its storage and memory limits after adding just two batches to the system.

Recognizing that the spreadsheet approach wasn't up to the task, the team began experimenting with a massively scalable distributed relational database. At that point, someone suggested using the Hadoop system already in operation at Merck R&D. That Hortonworks Hadoop-based platform runs on Amazon Web Services and lets the R&D team easily combine data from 16 disparate sources, so it promised to eliminate the vaccine unit's data transformation bottleneck. Within three months, the team had conclusive answers to its discard rate problem.
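Combining disparate sources comes down to joining records on a shared key. The sketch below is a minimal, stdlib-only illustration of that idea, not Merck's pipeline (which, per the article, ran on a Hortonworks Hadoop platform). It assumes every source tags its records with a batch ID; the source names, field names and values are all hypothetical.

```python
from collections import defaultdict


def combine_sources(sources):
    """Merge per-batch records from several sources into one table keyed by batch ID.

    `sources` maps a source name to a list of dicts, each carrying a "batch_id" key.
    Field names are prefixed with the source name so that columns coming from
    different systems cannot overwrite each other.
    """
    combined = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            batch_id = record["batch_id"]
            for field, value in record.items():
                if field == "batch_id":
                    continue  # the join key itself is not stored as a column
                combined[batch_id][f"{source_name}.{field}"] = value
    return dict(combined)


# Hypothetical inputs: lab results, process-historian readings, environmental data.
sources = {
    "lab": [{"batch_id": "B1", "potency": 0.97}, {"batch_id": "B2", "potency": 0.81}],
    "historian": [{"batch_id": "B1", "temp_c": 36.9}, {"batch_id": "B2", "temp_c": 38.2}],
    "environment": [{"batch_id": "B1", "pressure_kpa": 101.2}],
}
table = combine_sources(sources)
print(table["B1"])
```

Batch B2 simply ends up without an environmental field rather than raising an error, which mirrors the practical situation of sources that don't cover every batch.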

"We took all of our data on one vaccine, whether from the labs or the process historians or the environmental systems, and just dropped it into a data lake," Llado said. After aggregating and aligning the data with MapReduce and Hive, the team used R-based analytics to create a heat map of every vaccine batch. This analysis led to hypotheses and models, which the team then tested against historical data. In all, it took three months, 15 billion calculations and 5.5 million batch-to-batch comparisons to resolve Merck's vaccine conundrum.
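The heat-map and batch-to-batch steps can be sketched in a few lines. The article says the team worked in R; the Python below is a swapped-in illustration under stated assumptions, not Merck's method. Each batch is assumed to be reduced to a few numeric variables (names and values here are hypothetical), each variable is standardized into z-scores across batches (the kind of matrix a heat map colors), and every unordered pair of batches is compared by Euclidean distance.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def zscore_matrix(batches, variables):
    """Standardize each variable across all batches so different units become comparable.

    Returns {batch_id: [z-score per variable]}; these rows are what a heat map would color.
    """
    stats = {}
    for var in variables:
        values = [row[var] for row in batches.values()]
        sd = pstdev(values)
        stats[var] = (mean(values), sd if sd > 0 else 1.0)  # guard against constant variables
    return {
        batch_id: [(row[var] - stats[var][0]) / stats[var][1] for var in variables]
        for batch_id, row in batches.items()
    }


def pairwise_distances(matrix):
    """Euclidean distance for every unordered pair of batches: n * (n - 1) / 2 comparisons."""
    return {
        (a, b): sqrt(sum((x - y) ** 2 for x, y in zip(matrix[a], matrix[b])))
        for a, b in combinations(sorted(matrix), 2)
    }


# Hypothetical batch summaries (illustrative values only).
batches = {
    "B1": {"temp_c": 37.0, "pressure_kpa": 101.0},
    "B2": {"temp_c": 37.0, "pressure_kpa": 101.0},
    "B3": {"temp_c": 39.0, "pressure_kpa": 99.0},
}
matrix = zscore_matrix(batches, ["temp_c", "pressure_kpa"])
distances = pairwise_distances(matrix)
most_different = max(distances, key=distances.get)
print(most_different, round(distances[most_different], 3))
```

All-pairs comparison grows quadratically with the number of batches, which is one reason this kind of workload quickly outgrows a spreadsheet.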

- read the InformationWeek article
- here's FiercePharmaManufacturing's take
