Text Mining, Information Extraction, and Assisted Curation in the Biomedical Domain

OntoGene/BioMeXT is a research initiative that aims to push the boundaries of text mining for the biomedical literature. Our core competencies lie in Information Extraction, i.e. the extraction of domain-specific entities (such as genes, proteins, drugs, and diseases) and their semantic relations from the biomedical scientific literature. Our approach is based on high-recall entity recognition, followed by relation extraction that combines shallow and deep (dependency parsing) methods with advanced machine learning techniques.

We consider community-run evaluation challenges the best way to provide an independent and unbiased evaluation of text mining tools. We have participated in the BioCreative challenges (since 2006), the BioNLP shared task (2009), and CALBC (2010). See Past Highlights below for details of our results.

Additionally, we provide an environment for Assisted Curation (ODIN) as an example of a real-world application of biomedical text mining. ODIN was initially developed as part of the SNF-funded project SASEBio, with additional support from a pharmaceutical company. It was tested in the non-competitive interactive track of the BioCreative competitions, where it received favourable feedback from users. It was also tested in collaboration with the PharmGKB database at Stanford University [REF]. ODIN is currently used in the curation pipeline of the RegulonDB database, in a project funded by the US National Institutes of Health (NIH).

Our selected publications provide a good overview of the techniques used in the OntoGene system. We are based at the Institute of Computational Linguistics (Department of Computer Science) of the University of Zurich. Since 2016 our group has been a member of the Swiss Institute of Bioinformatics. Our new group name is BioMeXT.

NEWS (2017)

  • April: Over the past two months we participated in the TIPS shared task of the BioCreative/BECALM challenge, which evaluated web services for biomedical text mining. Our system achieved the best evaluation score for annotation efficiency. Once again, OntoGene/BioMeXT confirms that it can deliver state-of-the-art and best-of-breed results.
  • March: Two new journal papers on assisted curation (Rinaldi et al., 2017; Balderas-Martinez et al., 2017) have been accepted for publication. The collaboration with RegulonDB is bearing fruit!
  • February: Dr. Fabio Rinaldi is visiting the RegulonDB group within the scope of an NIH-funded collaboration.
  • January: The project "Automated detection of adverse drug events from older inpatients' electronic medical records using structured data mining and natural language processing", submitted within the "Smarter Health Care" National Research Programme (NRP74), has been approved. OntoGene/BioMeXT will participate in this project, contributing natural language processing technologies for the automated analysis of medical records.


  • 2016
    • Funding: official start of the MelanoBase project on March 1st. Collaboration with Roche on text mining for competitive intelligence. COST action GREEKC [CA15205].
    • Conferences: Dr. Fabio Rinaldi was scientific chair of SMBM 2016.
    • Other: Dr. Fabio Rinaldi nominated as Swiss representative in the Management Committee of the COST action GREEKC [CA15205].
    • Publications: paper about the BEL Track at BioCreative 2016.
    • Collaborations: part of the MelanoBase project will be carried out at the HLT-NLP group of the Fondazione Bruno Kessler (FBK), Trento, Italy.
  • 2015
  • 2014
    • Funding: NIH project in collaboration with the RegulonDB group (UNAM, Mexico) approved! Pilot project on large-scale detection of protein-protein interactions from the literature funded by Roche.
    • Conferences: co-organizers of SMBM 2014.
    • Collaborations: joint paper on BioC implementations with the Wilbur group at the National Center for Biotechnology Information / National Library of Medicine (NCBI/NLM).
  • 2013
    • Evaluation: OntoGene's participation in BioCreative 2013 was a resounding success. More information and papers here.
    • Conference: Dr. Fabio Rinaldi co-chair of LBM 2013, December 12-13, University of Tokyo, Japan.
  • 2012
    • Conferences: we organized the 5th International Symposium on Semantic Mining in Biomedicine (SMBM).
    • Evaluation: Best overall results in the 'triage' task of BioCreative 2012 (in particular due to very accurate entity recognition), as described in this paper
  • 2011
  • 2010
    • Evaluation: good results in all tasks of the BioCreative III competition; best results in the 'species' and 'diseases' categories of the CALBC competition.
    • Funding: SASEBio project approved (SNF).
  • 2009
    • Evaluation: Our system had the best results for the detection of protein-protein interactions in the BioCreative II.5 competitive evaluation (2009) [REF].
  • 2006
    • Evaluation: in BioCreative II, best results in the detection of mentions of experimental methods and third-best results in the detection of protein-protein interactions from the literature [REF].

    For more details, see Past Highlights.


    • From 2016 to 2018 our main source of funding will be the MelanoBase project, recently approved by the Swiss National Science Foundation. Additional funding will be provided by the PsyMine project and by industrial collaborations.
    • From Aug 2010 to Jul 2014 we were funded by the Swiss National Science Foundation (SNF) through the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature). Additional funding was provided by Novartis and Roche.
    • Additionally, in 2012-2013 a postdoc in our group (Gintarė Grigonytė) was funded by a Sciex fellowship (see the BioTermEvo project).
    • In 2008/2009 we were funded by the SNF project Detection of Biological Interactions from Biomedical Literature (grant 100014-118396/1).