Text Mining, Information Extraction,
and Assisted Curation for the Biomedical Literature

PAST HIGHLIGHTS
  • 2015 NIH, PsyMine, MelanoBase, BEL task
    • October 2015: Official start of the PsyMine project, funded by the COGITO foundation. In the PsyMine project we will develop text mining technologies capable of extracting relationships between mental diseases and their potential causes from the scientific literature. The application’s main output will be a database of cause-disorder relationships for mental diseases, detected in the scientific literature across disciplinary boundaries.
    • September 2015: Another successful project proposal: the MelanoBase project has been approved by the Swiss National Science Foundation. The MelanoBase project aims at large-scale automatic extraction of actionable information from the biomedical literature and its integration with existing structured knowledge (life science databases). Clinical research on melanoma has been selected as the specific use case to validate the results of the project.
    • August 2015: We successfully organized a novel BioCreative task (extraction of causal network information in Biological Expression Language, BEL) in collaboration with PMI R&D and Fraunhofer SCAI. The goal of the challenge was to verify the capability of text mining systems to reconstruct fragments of biological networks (pathways).
    • July 2015: We are moving into PubMed-scale processing. We recently parsed all of PubMed in search of protein-protein interactions.
    • May 2015: We have revised our OntoGene web services, as described in the following paper:
    • January 2015: We have started our collaboration with the RegulonDB group (UNAM, Mexico) within the scope of a new NIH-funded project.
      • The aim of this project is to implement a novel way of processing and accessing the vast detailed knowledge contained within collections of scientific publications on the regulation of transcription initiation in bacterial models.
      • Funding ($1.6 million) by the National Institutes of Health (US).
      • Our role will be to deliver text mining services and assisted curation tools.
  • 2014 Roche, RegulonDB, BioC, SMBM 2014
    • OntoGene enters a new strategic partnership with the Data Science group at Hoffmann-La Roche.
    • Dr. Fabio Rinaldi gives a set of invited talks at:
    • Dr. Fabio Rinaldi invited to co-chair SMBM 2014, October 6-7, University of Aveiro, Portugal.
    • Dr. Fabio Rinaldi gives a highlight presentation at ECCB 2014.
    • Collaboration with the RegulonDB group, UNAM, Mexico:
      • presentation about assisted curation with ODIN at BioCuration 2014
      • joint paper published in DATABASE: The Journal of Biological Databases and Curation
      • joint project proposal accepted for funding by the National Institutes of Health (NIH), US
    • Collaboration with the Wilbur group at the National Center for Biotechnology Information / National Library of Medicine (NCBI/NLM) on BioC.
      • A joint paper on BioC implementations has been published.
      • BioC is a new standard for text representation in biomedical text mining
  • 2013 BioCreative 2013, LBM 2013
    • OntoGene's participation in BioCreative 2013 was a resounding success. We were involved in 3 tasks (in task 5 with two slightly different versions of ODIN), obtaining top-ranked results in the CTD tasks and very good comments on the BioC and IAT tasks (which did not have a quantitative evaluation). We had a total of 4 papers, 3 presentations and 1 demo (we are probably the most active participant in BioCreative)! More information and papers here. As part of our 2013 BioCreative participation we provide:
    • Dr. Fabio Rinaldi invited to co-chair LBM 2013, December 12-13, University of Tokyo, Japan.
  • 2012 BioCreative 2012, SMBM 2012
    • We organized the 5th International Symposium on Semantic Mining in Biomedicine (SMBM)
    • We obtained the best overall results in the 'Triage' task of BioCreative 2012 (in particular due to very accurate entity recognition), as described in this paper
  • 2011 Text Mining for Pharmacogenomics
  • 2010 CALBC, BioCreative 2010
    • In the BioCreative III shared task we were one of only two groups that participated in all of the tasks, obtaining satisfactory results in all of them. We were among the three best-ranked teams in the PPI-ACT task (binary decision whether a paper contains a relevant protein-protein interaction) and the PPI-IMT task (detection of experimental methods). In the latter task, we were the only group that managed to return results with full recall (while maintaining a near-best AUC score). We were involved as co-authors in four journal papers in the special issue of BMC Bioinformatics on BioCreative III.
    • In the CALBC competition (2009/2010), which targets large-scale entity detection across the biomedical literature, our system achieved the best results in the 'species' and 'diseases' categories.
  • 2009 BioCreative 2009
    • Our participation in the BioCreative II.5 competitive evaluation (2009) of biomedical text mining systems resulted in the best run for the detection of protein-protein interactions (according to the 'raw' AUC iP/R metric). Overall, our system was considered one of the three best.