Detection of Biological Interactions from Biomedical Literature (SNF 100014-118396/1)

mOG is the abbreviation that we use internally to refer to the project "Detection of Biological Interactions from Biomedical Literature" (grant 100014-118396/1), funded by the Swiss National Science Foundation (SNF). The project started on 01.04.2008 and will last 18 months. Additional financial and practical support is provided by Novartis Pharma AG, NITAS, Text Mining Services, CH-4002, Basel, Switzerland.



Lay Summary

We are living in an age where unprecedented amounts of information are available to almost everyone. However, the task of finding and absorbing information relevant to any particular question has become increasingly difficult. While a general solution to this problem is probably still remote, in specific domains it is becoming possible to deal with it, using novel techniques from the scientific fields of computational linguistics and text mining.

In the domain of biomedicine, for instance, which we have chosen for our project, research scientists and companies are increasingly faced with the problem of efficiently locating, in the vast amount of published scientific results, the critical pieces of information that are needed in order to assess current and future research investment. The largest repository of biomedical literature (Medline) grows by approximately 40'000 articles per month.

Our choice of this specific domain is also motivated by the existence of resources such as terminological databases and other knowledge repositories, which can support the process of literature-based discovery. Literature-based discovery is currently attracting a significant amount of research and public funding. While many research projects concentrate on detecting domain entities (proteins, genes, diseases, etc.), possibly correlating them using statistical techniques, our focus is on detecting domain relations, such as protein-protein interactions or disease-gene correlations, using linguistic techniques.

Systems currently used by biologists to search the literature are based on traditional information retrieval techniques and therefore typically deliver ranked lists of documents. Our approach instead aims at pinpointing just the words that convey the relevant information. Our method is based on a deep linguistic analysis of the literature using terminological resources (provided by Novartis) and a full syntactic analyzer (developed at the University of Zurich).

The project will produce a system capable of detecting domain interactions with high reliability. The results will be validated by participation in an international competitive evaluation of text mining tools for the biomedical literature. Additionally, we hope that the resulting tools will be useful in supporting the progress of medical research, by simplifying access to the vast amount of knowledge stored in the scientific literature. 

Notes 

mOG is part of OntoGene, an internally funded project which aims at advancing the state of the art in text mining for biomedical literature.