OntoGene: Text Mining for the Biomedical Literature 

OntoGene is a research project based at the Institute of Computational Linguistics (Department of Computer Science) of the University of Zurich. Please check our publications.

Our work focuses on the extraction of semantic relations between specific biological entities (such as genes, proteins, drugs, diseases) from the biomedical scientific literature (e.g. PubMed). Our approach is based on high-precision, robust syntactic parsing of the target documents, combined with advanced machine learning techniques. The selected publications listed further down this page provide a good overview of the techniques used in the OntoGene system.

We consider community-run evaluation challenges the best way to provide an independent and unbiased evaluation of text mining tools. We have participated in the BioCreative challenges (since 2006), the BioNLP shared task (2009) and CALBC (2010). In BioCreative II (2006) we obtained very good results in the tasks of detecting protein interactions, and the best results in detecting experimental methods [2]. In BioCreative II.5 (2009) we obtained the best results in the PPI task (extraction of protein-protein interactions) [4]. In BioCreative III (2010) we were among the three best-ranked teams in the PPI-ACT task (binary decision whether a paper contains a curatable protein-protein interaction) and the PPI-IMT task (detection of experimental methods). In the latter task, we were the only group that returned results with full recall (while maintaining a near-best AUC score). Our ODIN tool received favourable comments from curators in the IAT (interactive) task. In the "triage task" of BioCreative 2012 we once again obtained the best overall results.


  • Feb 2014: Dr. Fabio Rinaldi gives a set of invited talks at:
  • Jan 2014: OntoGene enters a new strategic partnership with the Data Science group at Hoffmann-La Roche.
  • Oct 2013: OntoGene's participation in BioCreative 2013 was a resounding success. We were involved in 3 tasks (in task 5 with two slightly different versions of ODIN), obtaining top ranked results in the CTD tasks and very good comments on the BioC and IAT tasks (which did not have a quantitative evaluation). We had a total of 4 papers, 3 presentations and one demo (we are probably the most active participant in BioCreative)! More information and papers here.
  • Sep 2013: As part of our 2013 BioCreative participation we provide:
    • PyBioC: a native Python implementation of the BioC core
    • ODIN-CTD: a version of ODIN (our curation interface) customized for CTD
  • Aug 2013: OntoGene again achieves top-ranked results in a BioCreative task (task 3 of BioCreative 2013)! The CTD web services are available at:
  • Jul 2013: Dr. Fabio Rinaldi invited to co-chair LBM 2013, December 12-13, University of Tokyo.
  • Feb 2013: The paper describing our participation in the 'Triage' task of BioCreative 2012 has been published (we obtained the best overall results).
  • Sep 2012: The 5th International Symposium on Semantic Mining in Biomedicine (SMBM), which we organized at our premises, was a very successful event. 
  • Jul 2012: Two additional journal papers accepted for publication (see [7] and [8] below). OntoGene continues a very successful year!
  • Apr 2012: The long-awaited paper describing the text mining technologies used for our pharmacogenomics experiments has finally been accepted for publication [6]: doi:10.1016/j.jbi.2012.04.014.
  • Mar 2012: Our work on assisted curation, in collaboration with the PharmGKB group at Stanford University, has now reached the stage of journal publication [5]: doi:10.1093/database/bas021.


  • Our participation in the "triage task" of BioCreative 2012 was rewarded with the best overall results, in particular due to very accurate entity recognition. 
  • We are currently funded by the Swiss National Science Foundation through the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature). Duration: 3 years. Funding: 3 post-docs. Additional financial and practical support provided by Novartis Pharma AG, NIBR-IT, Text Mining Services, CH-4002, Basel, Switzerland. 
  • Additionally, a post-doc in our group (Gintarė Grigonytė) is funded by a Sciex fellowship (see BioTermEvo project).
  • In the BioCreative III shared task (2010) we were one of only two groups which participated in all of the tasks, obtaining satisfactory results in all of them. We were involved as co-authors in four journal papers in the special issue of BMC Bioinformatics on BioCreative III.
  • In the CALBC competition (2009/2010), which targets large-scale entity detection across the biomedical literature, our system achieved the best results in the 'species' and 'diseases' categories.
  • In 2008/2009 we were funded by the SNF project Detection of Biological Interactions from Biomedical Literature (grant 100014-118396/1).
  • Our participation in the BioCreative II.5 competitive evaluation (2009) of biomedical text mining systems resulted in the best run for the detection of protein-protein interactions (according to the 'raw' AUC iP/R metric). Overall, our system was considered one of the three best.

Selected publications

Important note: this is just a list of a few selected publications chosen to provide a descriptive overview of the activities of the OntoGene research group. The full list of publications contains more than 60 peer-reviewed publications (including 17 journal papers). If needed, contact us for free preprints.

  1. Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Martin Romacker. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics 2006, 7(Suppl 3):S3. doi:10.1186/1471-2105-7-S3-S3
  2. Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology, 2008, 9:S13.
  3. Gerold Schneider, Kaarel Kaljurand, Thomas Kappeler, Fabio Rinaldi. Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. CICLING 2009. (Best paper award!)
  4. Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Simon Clematide, Thérèse Vachon, Martin Romacker. OntoGene in BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3), pp. 472-480, 2010. <http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.50>
  5. Fabio Rinaldi, Simon Clematide, Yael Garten, Michelle Whirl-Carrillo, Li Gong, Joan M. Hebert, Katrin Sangkuhl, Caroline F. Thorn, Teri E. Klein, Russ B. Altman. Using ODIN for a PharmGKB revalidation experiment. Database: The Journal of Biological Databases and Curation, Oxford Journals, 2012, bas021. doi:10.1093/database/bas021
  6. Fabio Rinaldi, Gerold Schneider, Simon Clematide. Relation Mining Experiments in the Pharmacogenomics Domain. Journal of Biomedical Informatics (Elsevier), Volume 45, Issue 5, October 2012, pages 851-861. doi:10.1016/j.jbi.2012.04.014
  7. Simon Clematide, Fabio Rinaldi. Ranking relations between diseases, drugs, and genes for a curation task. Journal of Biomedical Semantics (BMC), 3(Suppl 3):S5, 2012. doi:10.1186/2041-1480-3-S3-S5
  8. Fabio Rinaldi, Simon Clematide, Simon Hafner, Gerold Schneider, Gintare Grigonyte, Martin Romacker, Therese Vachon. Using the OntoGene pipeline for the triage task of BioCreative 2012. Database: The Journal of Biological Databases and Curation, Oxford Journals, 2013, bas053. doi:10.1093/database/bas053


OntoGene is a non-commercial research project. We have nothing to do with a commercial product of the same name. We are also not related to the so-called "Ontogene network".