FDA tweaks openFDA to allow downloading of databases

FDA Chief Health Informatics Officer Dr. Taha Kass-Hout (courtesy of the FDA)

The FDA has responded to user requests by making the data available through openFDA downloadable. The U.S. regulator revised the system late last year so that users can download the data for offline use.

Taha Kass-Hout, the FDA's chief health informatics officer, and his team originally designed openFDA to let users query various data sets through a search API. The FDA still expects most users to go through the API when they want to analyze its repositories of information on drug adverse events, labeling and medical devices. In some cases, however, users may extract additional insights by downloading the data for offline analysis. Having fielded requests for such a capability since openFDA launched in 2014, the regulator has now added an option to download the data.

The FDA sees downloads as useful in some specific situations. In particular, the cap built into the API that limits searches to 5,000 records per query, which the FDA put in place to manage the load on its system, can constrain some users. These users, and others who need access to the full data sets, can now download them. The data will be familiar to existing users. "The data available for download are the same data sets previously available at openFDA. The JSON schema used for the downloads is exactly the same as the output you currently get from the API," the FDA wrote.
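To make the effect of the cap concrete, here is a minimal sketch that builds the paged request URLs a user would issue against the drug adverse event endpoint. It assumes openFDA's `search`, `limit` and `skip` query parameters and assumes the 5,000-record cap bounds the combined `skip` offset plus page; the page size of 100 and the `page_urls` helper are hypothetical choices for illustration. No network calls are made.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/drug/event.json"  # drug adverse event endpoint
RECORD_CAP = 5000  # per-query cap from the article (assumed to bound skip + limit)
PAGE_SIZE = 100    # hypothetical page size chosen for illustration


def page_urls(search: str, page_size: int = PAGE_SIZE, cap: int = RECORD_CAP) -> list[str]:
    """Build one request URL per page, stopping once the record cap is reached."""
    urls = []
    for skip in range(0, cap, page_size):
        # The final page is shortened so skip + limit never exceeds the cap.
        limit = min(page_size, cap - skip)
        query = urlencode({"search": search, "limit": limit, "skip": skip})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = page_urls("receivedate:2014")
print(len(urls))  # 50 pages of 100 records, then the cap stops further paging
```

However many pages are requested, a single search cannot reach past the cap, which is why users who need every record in an endpoint are better served by the bulk download.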

The entire compressed data set clocks in at 23 GB. Uncompressed, the data take up around 100 GB of space. The size of the data set is one reason online access through the API is likely to remain the preferred route for most users. The FDA updates the databases frequently, so offline users will need to download the data again if their analyses are to remain current. And because an update could affect any record, the FDA has warned that users cannot download only the new files; they must download the entire endpoint of interest each time.
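Because the downloads use the same JSON schema as the API output, code written for API responses can be reused offline. The sketch below streams records out of a downloaded zip archive one JSON file at a time rather than unpacking everything to disk first. It assumes each archive holds JSON files whose top-level object has a `results` list, as API responses do; the `iter_results` helper and the file name used in the usage note are hypothetical.

```python
import json
import zipfile
from typing import BinaryIO, Iterator, Union


def iter_results(source: Union[str, BinaryIO]) -> Iterator[dict]:
    """Yield each record from the "results" list of every JSON file in a zip archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.endswith(".json"):
                continue  # skip anything that is not a JSON partition
            # Each member is decompressed and parsed on its own, then released.
            with archive.open(name) as handle:
                payload = json.load(handle)
            yield from payload.get("results", [])
```

For example, `sum(1 for _ in iter_results("drug-event-0001-of-0001.json.zip"))` would count the records in one hypothetical partition. Since an update can touch any record, re-running this over a freshly downloaded endpoint is simpler than trying to merge changes into an older copy.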
