Text Mining, Information Extraction,
and Assisted Curation for the Biomedical Literature

OntoGene is a research initiative that aims to push the boundaries of text mining for the biomedical literature. Our core competencies lie in Information Extraction, i.e. the extraction of domain-specific entities (such as genes, proteins, drugs, diseases) and their semantic relations from the biomedical scientific literature. Our approach is based on high-recall entity recognition, followed by relation extraction that combines shallow and deep (dependency parsing) methods with advanced machine learning techniques.

We consider community-run evaluation challenges to be the best way to provide an independent and unbiased evaluation of text mining tools. We have participated in the BioCreative challenges (since 2006), the BioNLP shared task (2009) and CALBC (2010). See below (Past Highlights) for details of our results.

Additionally, we provide an environment for Assisted Curation (ODIN), as an example of a real-world application of biomedical text mining. ODIN was initially developed as part of the SNF-funded project SASEBio, with additional support from a pharmaceutical company. It was tested in the non-competitive interactive track of the BioCreative competitions, where it received favourable comments from users. It was also tested in collaboration with the PharmGKB database at Stanford University [REF]. ODIN is currently being used in the curation pipeline of the RegulonDB database, in a project funded by the US National Institutes of Health.

Our selected publications provide a good overview of the techniques used in the OntoGene system. We are based at the Institute of Computational Linguistics (Department of Computer Science) of the University of Zurich.

NEWS (2016)

  • April 2016: Dr. Fabio Rinaldi nominated as Management Committee member for the COST action GREEKC [CA15205].
  • March 2016: Official start of the MelanoBase project. Dr. Rinaldi is scientific chair of SMBM 2016.
  • February 2016: Three journal papers preliminarily accepted for publication.
  • January 2016: Dr. Fabio Rinaldi is visiting the RegulonDB group within the scope of an NIH-funded collaboration.

PAST HIGHLIGHTS

  • 2015
  • 2014
    • Funding: NIH project in collaboration with the RegulonDB group (UNAM, Mexico) approved! Pilot project on large-scale detection of protein-protein interaction from the literature funded by Roche.
    • Conferences: co-organizers of SMBM 2014.
    • Collaborations: Joint paper on BioC implementations in collaboration with the Wilbur group at the National Center for Biotechnology Information / National Library of Medicine (NCBI/NLM).
  • 2013
    • Evaluation: OntoGene's participation in BioCreative 2013 was a resounding success. More information and papers here.
    • Conference: Dr. Fabio Rinaldi co-chair of LBM 2013, December 12-13, University of Tokyo, Japan.
  • 2012
    • Conferences: We organized the 5th International Symposium on Semantic Mining in Biomedicine (SMBM).
    • Evaluation: Best overall results in the 'triage' task of BioCreative 2012 (in particular due to very accurate entity recognition), as described in this paper.
  • 2011
  • 2010
    • Evaluation: Good results in all of the tasks of the BioCreative III competition; best results in the 'species' and 'diseases' categories of the CALBC competition.
    • Funding: SASEBio project approved (SNF).
  • 2009
    • Evaluation: Our system had the best results for the detection of protein-protein interactions in the BioCreative II.5 competitive evaluation (2009) [REF].
  • 2006
    • Evaluation: In BioCreative II, best results in the detection of mentions of experimental methods and third-best results in the detection of protein-protein interactions from the literature [REF].

For more details, see past highlights.

FUNDING

  • From 2016 until 2018 our main source of funding will be the project MelanoBase, recently approved by the Swiss National Science Foundation. Additional funding will be provided by the PsyMine project and by industrial collaborations.
  • From Aug 2010 to Jul 2014 we were funded by the Swiss National Science Foundation (SNF) through the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature). Additional funding was provided by Novartis and Roche.
  • Additionally, in 2012-2013 a post-doc in our group (Gintarė Grigonytė) was funded by a Sciex fellowship (see BioTermEvo project).
  • In 2008/2009 we were funded by the SNF project Detection of Biological Interactions from Biomedical Literature (grant 100014-118396/1).