OntoGene: Text Mining for Biomedical Literature
Our work focuses on the extraction of semantic relations between specific biological entities (such as genes and proteins) from the biomedical scientific literature (e.g. PubMed). Our approach is based on high-precision, robust syntactic parsing of the target documents. Our first application is documented in paper [1] below.
We consider community-run evaluation challenges the best way to obtain an independent and unbiased evaluation of text mining tools. We have participated in the BioCreative challenges (since 2006), the BioNLP shared task (2009) and CALBC (2010). In BioCreative II (2006) we obtained very good results in the tasks of detecting protein interactions, and the best results in detecting experimental methods [2]. In BioCreative II.5 (2009) we obtained the best results in the PPI task (extraction of protein-protein interactions) [4]. In BioCreative III (2010) we were among the three best-ranked teams in the PPI-ACT task (binary decision whether a paper contains a curatable protein-protein interaction) and the PPI-IMT task (detection of experimental methods). In the latter task, we were the only group that returned results with full recall (while maintaining a near-best AUC score). Our ODIN tool received favourable comments from curators in the IAT (interactive) task.
October 2011: The special issue of BMC Bioinformatics on BioCreative III has been published: we are involved in four papers!
August 2011: The BioTermEvo project has been approved! Gintarė Grigonytė will join the OntoGene group next year, supported by a Sciex fellowship (under the supervision of Prof. Martin Volk). Congratulations!
March 2011: Six journal papers accepted for publication (see papers 42-47 in our list of publications). Not bad, considering that the SASEBio project started only 6 months ago!
September 2010: During the summer we participated in the BioCreative III competitive evaluation of text mining systems. We were one of only two groups which participated in all of the tasks, obtaining satisfactory results in all of them.
August 2010: SASEBio has officially started.
April 2010: The Swiss National Science Foundation (SNF) has approved the project SASEBio (Semi-Automated Semantic Enrichment of the Biomedical Literature). Duration: 3 years. Funding: 3 post-docs. Additional financial and practical support provided by Novartis Pharma AG, NITAS, Text Mining Services, CH-4002, Basel, Switzerland.
January 2010: We have successfully completed the SNF-funded project Detection of Biological Interactions from Biomedical Literature (grant 100014-118396/1).
October 2009: Our participation in the BioCreative II.5 competitive evaluation of biomedical text mining systems resulted in the best run for the detection of protein-protein interactions (according to the 'raw' AUC iP/R metric). Our system was considered one of the three best overall (look for system T37 in the graph above).
February 2009: One of our papers at CICLING has been selected for a Best Paper Award! See paper [3] in the list of selected publications below.
Selected publications
- Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Martin Romacker. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics 2006, 7(Suppl 3):S3. doi:10.1186/1471-2105-7-S3-S3
- Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology, 2008, 9:S13.
- Gerold Schneider, Kaarel Kaljurand, Thomas Kappeler, Fabio Rinaldi. Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. CICLING 2009.
- Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Simon Clematide, Thérèse Vachon, Martin Romacker. OntoGene in BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3), pp. 472-480, 2010. <http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.50>
Notes
OntoGene is a non-commercial research project. We have nothing to do with a commercial product of the same name. We neither endorse nor recommend such a product.
