Repositive adds data sets, search capabilities to online genomic research platform

By Nick Paul Taylor Aug 31, 2016 7:21pm

Repositive has introduced the full version of its online genomics data platform. The latest version gives users access to 992,000 data sets, 23 times as many as were available previously, plus search functions designed to make it easier for researchers to identify the specific data they need.

The introduction of the full version of the platform continues a busy two years for Repositive, which introduced an early iteration of its system for beta testing in 2015 before going on to partner with AstraZeneca ($AZN) in August of this year. In response to feedback gathered during beta testing, Repositive has made a raft of changes to its platform for its full launch. The core concept is still to give researchers a way to access human genomic data from a single location.

“Before the development of the Repositive platform it was extremely difficult to get broad visibility over what human genomic data exists internationally. On average, researchers tend to know and be familiar with around four to five sources of human genomic data,” Repositive CEO Fiona Nielsen told FierceBiotechIT. “This may work well when they are looking for genomic data in a particular area in which they are very familiar, but can prove a challenge when looking for new data.”

Repositive is seeking to address this issue by collating data on its platform. Users can now access 992,000 data sets from 23 sources. The beta version featured 42,000 data sets from 10 sources. In expanding the platform, Repositive has added data from the Exome Aggregation Consortium, the Sequence Read Archive and the database of Genotypes and Phenotypes. Repositive plans to add data from another 50 sources in the coming months.

That period will mark the escalation of Repositive’s plans to expand the breadth of data covered by its platform to include more esoteric and specialized data sets. This expansion into specialized data is set to get underway in September with the addition of Chinese Control Data, a repository of data from Asian projects such as GenomeAsia and the Singapore Genome Variation Project.

“The first collection originated from interaction with users at a workshop in Hong Kong where we identified a need for research data from ethnic Chinese individuals,” Nielsen said. “Since then, we have been listening to our users talk about what data has been difficult to come by, and we were able to identify a number of growing areas of importance where data is difficult to find or access.” Repositive plans to add data sets covering DNA methylation, RNA expression, the microbiome and other fields.

Users were calling for better search capabilities even before the number of data sets ballooned, so Repositive has prioritized making it easier to sift through the data. The updated platform features Boolean search, meaning the ability to combine terms with operators such as AND, OR and NOT, and improved curation of search filters.
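For readers unfamiliar with Boolean search, the idea can be illustrated with a short Python sketch. This is not Repositive's implementation, which the company has not described. The records, source labels, keywords and function names below are all hypothetical, chosen only to show how AND, OR and NOT reduce to set operations over a keyword index.

```python
# Hypothetical sketch of Boolean search over data set metadata.
# Nothing here reflects Repositive's actual code or data; all names are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    title: str
    source: str
    keywords: frozenset


def build_index(datasets):
    """Map each lowercase keyword to the set of catalog positions that carry it."""
    index = {}
    for position, dataset in enumerate(datasets):
        for keyword in dataset.keywords:
            index.setdefault(keyword.lower(), set()).add(position)
    return index


def search(index, total, must=(), any_of=(), must_not=()):
    """Apply AND (must), OR (any_of) and NOT (must_not) as set operations."""
    hits = set(range(total))
    for term in must:
        # AND: keep only records that carry every required term
        hits &= index.get(term.lower(), set())
    if any_of:
        # OR: keep records that carry at least one of the optional terms
        either = set()
        for term in any_of:
            either |= index.get(term.lower(), set())
        hits &= either
    for term in must_not:
        # NOT: drop records that carry an excluded term
        hits -= index.get(term.lower(), set())
    return sorted(hits)


# Made-up catalog entries for illustration only
CATALOG = [
    Dataset("Exome set A", "source-1", frozenset({"exome", "human", "cancer"})),
    Dataset("Methylation set B", "source-2", frozenset({"methylation", "human"})),
    Dataset("Exome set C", "source-3", frozenset({"exome", "human", "control"})),
]
INDEX = build_index(CATALOG)

# Query: exome AND human NOT cancer
RESULT = search(INDEX, len(CATALOG), must=["exome", "human"], must_not=["cancer"])
print([CATALOG[i].title for i in RESULT])
```

In this toy catalog, the query "exome AND human NOT cancer" keeps only the third record. A production system would more likely rely on a dedicated search engine than on in-memory sets, but the logic of the operators is the same.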

“By using predicated and Boolean search, alongside the option to do keyword searches, users can search Repositive to find the specific information they are after regardless of where the data is hiding,” Nielsen said. “The keyword search allows users to search with a high degree of accuracy for very specific genomic data, both quickly and easily in all the data sources that Repositive is indexing.”

With these features now implemented, Repositive is pushing ahead with other improvements. Areas of near-term focus include the development of features aimed at commercial research organizations, which Repositive plans to offer on a subscription basis.

“These features will streamline data access workflows for research organizations of any size,” Nielsen said. “At the same time, we are working with a number of organizations to enable enterprise solutions and customized data access solutions based on the Repositive platform, such as the PDX data consortium that we launched recently in collaboration with AstraZeneca.”
