FDA challenges industry to improve reproducibility and accuracy of informatics pipelines

FDA Chief Health Informatics Officer Dr. Taha Kass-Hout (photo courtesy of the FDA)

The FDA has launched the first challenge on precisionFDA, its cloud-based platform for next-generation sequencing (NGS) data. FDA officials have tasked participants with running their read-mapping and variant-calling pipelines on well-characterized whole-genome sequencing read data sets so that each pipeline's accuracy and reproducibility can be assessed.

Specifically, FDA is providing participants with FASTQ files generated from the same sample on the same type of sequencer, an Illumina ($ILMN) HiSeq X Ten system, at two different laboratories: J. Craig Venter's Human Longevity Inc. (HLI) and the Garvan Institute of Medical Research. Participants can run the FASTQ files through their pipelines either by downloading the data or by working on the precisionFDA platform. Each run yields a VCF file, and the accuracy and reproducibility of these files will form the backbone of FDA's assessment of the pipelines.

By having participants run the Garvan FASTQ files through their pipelines twice and comparing the two resulting VCF files with each other, the FDA hopes to gain insight into the reproducibility of each informatics method used in the competition. Then, by comparing one of the Garvan VCFs with the file generated from the HLI FASTQ, the FDA wants participants to assess the consistency of results from two facilities that both used HiSeq X Tens, but potentially under different conditions. Finally, each file will be compared with the Genome in a Bottle benchmark VCF to assess its accuracy.
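Both the run-to-run and the cross-site comparisons come down to asking which variant calls two VCF files share. The sketch below is a simplified illustration, not the challenge's scoring code: it assumes variants can be matched exactly on chromosome, position, reference allele and alternate allele, and the function names and toy VCF lines are hypothetical. Purpose-built comparison tools generally also reconcile different representations of the same variant and compare genotypes, which this version does not.

```python
from typing import Iterable, Set, Tuple

# Assumption: a variant is keyed by (chromosome, position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_variants(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF text lines, skipping header lines."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list several ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt != ".":  # "." means no alternate allele was called
                variants.add((chrom, pos, ref, alt))
    return variants


def compare_call_sets(a: Set[Variant], b: Set[Variant]) -> dict:
    """Count variants shared by both call sets and those unique to each."""
    return {
        "shared": len(a & b),
        "only_in_a": len(a - b),
        "only_in_b": len(b - a),
    }


# Hypothetical toy VCF bodies standing in for two runs on the same FASTQ.
run1 = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2000\t.\tC\tT,G\t40\tPASS\t.",
]
run2 = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t48\tPASS\t.",
    "chr1\t2000\t.\tC\tT\t39\tPASS\t.",
    "chr2\t500\t.\tG\tA\t30\tPASS\t.",
]

print(compare_call_sets(parse_vcf_variants(run1), parse_vcf_variants(run2)))
# {'shared': 2, 'only_in_a': 1, 'only_in_b': 1}
```

The same comparison applies unchanged to a Garvan VCF versus an HLI VCF; only the interpretation shifts from reproducibility to cross-site consistency.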

When the competition closes on April 25, the FDA will judge entries in eight categories, assessing each pipeline's performance against criteria including the number of nonreproducible variants, precision and recall. The focus on reproducibility and accuracy reflects the thinking behind precisionFDA, which was set up in part to help the regulator evaluate NGS-based diagnostics, a task that requires knowing whether the output of an informatics pipeline is reliable.
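Precision and recall have standard definitions: precision is the share of a pipeline's calls that appear in the benchmark (TP / (TP + FP)), and recall is the share of benchmark variants the pipeline recovers (TP / (TP + FN)). The sketch below computes both for a toy call set, assuming exact matching on (chromosome, position, REF, ALT). It also counts "nonreproducible variants" under a hypothetical reading, namely calls present in only one of two runs on the same input. The challenge's own definitions and scoring may differ, and the variant sets are made up.

```python
from typing import Set, Tuple

# Assumption: a variant is keyed by (chromosome, position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def precision_recall(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), using exact matches."""
    true_positives = len(calls & truth)
    false_positives = len(calls - truth)   # called, but absent from the benchmark
    false_negatives = len(truth - calls)   # in the benchmark, but not called
    precision = true_positives / (true_positives + false_positives) if calls else 0.0
    recall = true_positives / (true_positives + false_negatives) if truth else 0.0
    return precision, recall


def nonreproducible_count(run1: Set[Variant], run2: Set[Variant]) -> int:
    """Hypothetical metric: variants called in only one of two runs."""
    return len(run1 ^ run2)  # symmetric difference


# Hypothetical toy data: a benchmark truth set and two runs of one pipeline.
truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
    ("chr3", 42, "T", "C"),
}
run1 = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
    ("chr5", 7, "A", "T"),
}
run2 = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
}

p, r = precision_recall(run1, truth)
print(f"precision={p:.2f} recall={r:.2f}")        # precision=0.75 recall=0.75
print("nonreproducible:", nonreproducible_count(run1, run2))  # nonreproducible: 1
```

Here run1 makes three correct calls, one false call (chr5) and misses one benchmark variant (chr3), so precision and recall are both 3/4.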
