Simultaneous analyses of text and image data are helping researchers probe the intersections between biology and chemistry, according to the Fraunhofer Institute in Germany. The institute is working with the Jülich Supercomputing Centre on automated annotation software for grid-connected supercomputers, which are being used to query some 50,000 pharmaceutical chemistry patents.
The partners have processed the patents on the large-scale computing grid infrastructures at the two institutions. Automated "named entity recognition" services have identified and annotated four kinds of content: biological entities in text (e.g., protein names, gene names, cell types); medical entities in text (e.g., disease names); chemical information in text (e.g., drug names); and images, including chemical structure depictions.
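For readers unfamiliar with named entity recognition, the idea can be illustrated with a minimal dictionary-based sketch in Python. This is not the Fraunhofer software, whose internals the announcement does not describe; the `LEXICON` terms, the label names, and the `annotate` function are all hypothetical, and production systems typically rely on far larger vocabularies and statistical models rather than simple lookups.

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # the matched surface string
    label: str   # entity category assigned to the match


# Hypothetical toy lexicon, for illustration only.
LEXICON = {
    "GENE_OR_PROTEIN": ["BRCA1", "insulin"],
    "DISEASE": ["diabetes", "breast cancer"],
    "DRUG": ["aspirin", "metformin"],
}


def annotate(text: str) -> List[Annotation]:
    """Tag every lexicon term found in the text, ordered by position."""
    found = []
    for label, terms in LEXICON.items():
        for term in terms:
            # Whole-word, case-insensitive match of the lexicon term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append(Annotation(m.start(), m.end(), m.group(0), label))
    return sorted(found)


if __name__ == "__main__":
    for ann in annotate("Metformin lowers glucose in diabetes."):
        print(ann.label, ann.text, ann.start, ann.end)
```

Each annotation records character offsets along with a label, so the tagged spans can be stored separately from the patent text and queried later.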
UNICORE middleware helps manage the annotation services in the grid infrastructure; control the streams of input and output data from the patents database to the annotation services; and monitor the overall progress.
Text-mining applications have so far been run only on bibliographic databases of life sciences and biomedical information, according to an announcement. Simultaneous text/image analyses in full-text documents on grid infrastructures represent a next step in computing. "This goes way beyond the usual simulation applications" used in scientific computing, says Martin Hofmann-Apitius, department head for bioinformatics at Fraunhofer.