AstraZeneca sets out to build 5-petabyte, 2M-genome database

AstraZeneca ($AZN) is redefining what big looks like in genome research. While a range of initiatives have been branded population-scale sequencing programs, none wears the label quite as comfortably as AstraZeneca’s scheme. At 2 million whole genomes, the database will house more people than the combined populations of the 50 smallest countries.

If the database were a country, it would be the 144th most populous on earth, wedged just between Macedonia and Latvia. And, with AstraZeneca performing whole genome sequencing on everyone in the database, it will have a sizeable digital footprint. AstraZeneca expects to gather 5 petabytes of data in total.

To put that figure in context, it is 25% of the capacity of all the hard drives produced in 1995. Or, as AstraZeneca EVP Mene Pangalos put it: “If you put 5 petabytes on DVDs, it would be four times the height of [310-metre London skyscraper] the Shard.” To generate all that data, AstraZeneca is putting “hundreds of millions of dollars” into genome research, Pangalos said at a press conference attended by Nature News.
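For readers who want to check the DVD comparison, the arithmetic can be sketched in a few lines of Python. The disc capacity (4.7 GB, single-layer) and thickness (1.2 mm, stacked without cases) are assumptions, not figures supplied by AstraZeneca, and a petabyte is taken as 10^15 bytes.

```python
# Back-of-envelope check of the "four times the Shard" comparison.
# Assumptions (not from AstraZeneca): single-layer DVDs holding 4.7 GB,
# each 1.2 mm thick, stacked bare; 1 petabyte = 10**15 bytes.

PETABYTE = 10**15
DVD_CAPACITY_BYTES = 4.7e9
DVD_THICKNESS_M = 1.2e-3
SHARD_HEIGHT_M = 310          # height quoted in the article

TOTAL_BYTES = 5 * PETABYTE    # expected size of the database
GENOME_COUNT = 2_000_000      # whole genomes planned


def dvd_stack_height_m(total_bytes: float) -> float:
    """Height in metres of the DVD stack needed to hold total_bytes."""
    discs = total_bytes / DVD_CAPACITY_BYTES
    return discs * DVD_THICKNESS_M


stack_m = dvd_stack_height_m(TOTAL_BYTES)
shard_multiples = stack_m / SHARD_HEIGHT_M
bytes_per_genome = TOTAL_BYTES / GENOME_COUNT

print(f"DVD stack: {stack_m:.0f} m, or {shard_multiples:.1f} Shards")
print(f"Average data per genome: {bytes_per_genome / 1e9:.1f} GB")
```

Under those assumptions the stack comes to a little under 1.3 kilometres, just over four Shards, which matches the quote. The same numbers imply an average of about 2.5 GB of data per genome.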

Some of the cash will land in the bank account of Human Longevity, Inc., the J. Craig Venter-founded sequencing shop that is one of a limited number of organizations with the capacity to carry out such a project. AstraZeneca plans to send 500,000 samples to Human Longevity for sequencing over the course of the collaboration, giving Venter’s suite of Illumina ($ILMN) HiSeq X sequencers plenty of material to process. The Big Pharma will also gain access to Human Longevity’s fast-growing database.

AstraZeneca is betting that data generated through these initiatives will help it identify rare genetic variants, something that should become easier to do as the size of the genome repository increases. The expectation is that the identification of the variants and unearthing of other insights will improve multiple aspects of AstraZeneca’s R&D operation, from the discovery of novel targets to the selection of participants for clinical trials.

- read the Nature News article
- and FierceBiotech’s take
