Robert Hoffmann and Alfonso Valencia of the Spanish National Centre of Biotechnology (CNB/CSIC) in Madrid developed the web-based tool, called iHOP (Information Hyperlinked over Proteins). It converts the 14 million abstracts in the PubMed bibliographic database into a network of interlinked references to genes, proteins, mutations, diseases and (bio)chemical compounds.
"By using genes and proteins as hyperlinks between sentences and articles, iHOP makes the information stored in PubMed accessible as one navigable resource," they said.
The key features of iHOP are the organisation of textual and genomic information in a relational database and the use of text-mining technology to detect biomedical entities in natural text. Production of state data is based entirely on XML coding, and because the system avoids complex front-end database queries, response times are extremely fast, according to the researchers.
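To illustrate the general idea of detecting gene names in text and using them as links between sentences, here is a minimal sketch in Python. It is not iHOP's actual implementation: the synonym dictionary, the example sentences and the function names `find_genes` and `build_index` are all hypothetical, and simple dictionary matching stands in for whatever text-mining methods the real system uses.

```python
import re
from collections import defaultdict

# Hypothetical synonym dictionary: surface form -> canonical gene symbol.
# A real system would draw on curated gene and protein databases.
GENE_SYNONYMS = {
    "p53": "TP53",
    "TP53": "TP53",
    "MDM2": "MDM2",
    "BRCA1": "BRCA1",
}


def find_genes(sentence):
    """Return the set of canonical gene symbols mentioned in a sentence."""
    found = set()
    for surface, symbol in GENE_SYNONYMS.items():
        # Word boundaries stop a name matching inside a longer token.
        if re.search(r"\b" + re.escape(surface) + r"\b", sentence):
            found.add(symbol)
    return found


def build_index(sentences):
    """Map each gene symbol to the indices of the sentences mentioning it."""
    index = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for symbol in sorted(find_genes(sentence)):
            index[symbol].append(i)
    return dict(index)


# Invented example sentences, used only to exercise the sketch.
SENTENCES = [
    "MDM2 binds p53 and promotes its degradation.",
    "BRCA1 interacts with TP53 in the DNA damage response.",
    "Unrelated sentence with no gene names.",
]

INDEX = build_index(SENTENCES)
```

In this toy index, a gene that appears in more than one sentence (here TP53) acts as the hyperlink that lets a reader hop from one sentence to the next.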
"While conventional keyword searches result in long and not always informative lists of abstracts, navigation along this gene-guided network allows for a stepwise and controlled exploration of the information space," said Hoffmann and Valencia.
Moreover, the iHOP system shows that distant medical and biological concepts can be related by surprisingly few intermediate genes: the shortest path between any two genes involves, on average, only four steps, they note.
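The notion of a shortest path between two genes can be made concrete with a breadth-first search over a graph in which genes are nodes and an edge means two genes are mentioned together. The sketch below is an illustration only: the small graph and the function name `shortest_path_length` are hypothetical, and the figure of about four steps comes from the researchers, not from this example.

```python
from collections import deque


def shortest_path_length(graph, start, goal):
    """Return the number of steps on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical co-mention graph: each gene lists the genes it is linked to.
GRAPH = {
    "TP53": ["MDM2", "BRCA1"],
    "MDM2": ["TP53"],
    "BRCA1": ["TP53", "RAD51"],
    "RAD51": ["BRCA1"],
}

steps = shortest_path_length(GRAPH, "MDM2", "RAD51")
```

Breadth-first search visits genes in order of increasing distance from the start, so the first time it reaches the target it has found a shortest path.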
Hoffmann and Valencia expect this highly connected network to trigger a revolution in new text-mining tools that will bring biomedicine within closer reach of both the scientific community and the wider public.