The OntoGene group at the University of Zurich has developed efficient techniques for text mining in the molecular biology domain. One of their core interests in recent years has been the detection of mentions of protein-protein interactions. Using the IntAct database as a gold standard, they have developed techniques for identifying information relevant to the curation process, such as the experimental methods used by the authors [1], the organisms that host the experiment and that contribute the interacting proteins [2], the proteins themselves [3], and their interactions [4]. The effectiveness of their approach has been validated by participation in numerous shared evaluations, such as BioCreative II [5], the BioNLP event extraction task [6], and BioCreative II.5 [7].

Recently, in collaboration with the NITAS/TMS group at Novartis, they have developed a prototype of an environment supporting semi-automated semantic enrichment of the literature. The environment allows an expert user to efficiently revise annotations suggested by the system, or to add new annotations where the system missed an entity or an interaction. The system can also reuse the annotations added by the expert in subsequent applications, through a process of incremental learning.

The SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature) aims to consolidate the existing text mining activities of the OntoGene group by further improving their relation extraction techniques and applying them to new areas within the context of the literature curation process. New types of interactions, such as drug/disease interactions (of particular interest to their industrial partner), will be considered, along with incremental improvements to their existing techniques for protein-protein interaction detection (of potential interest to the IntAct group at EBI).
As in the past, their techniques will be subject to community-based evaluation through participation in shared text mining challenges. Additionally, the project offers an opportunity to turn the existing semi-automated annotation prototype into a fully fledged system that can then be employed by the target user groups. Intensive collaboration with both NITAS and EBI will be sought at all stages of development, in particular to guarantee continuous feedback on the effective usability of the proposed tools.

Duration: 3 years (August 2010 - July 2013)
Funding: Swiss National Science Foundation (grant 105315_130558/1) and NITAS/TMS, Novartis Pharma AG
Principal investigator: Dr. Fabio Rinaldi

References

[1] Thomas Kappeler, Simon Clematide, Kaarel Kaljurand, Gerold Schneider, Fabio Rinaldi. Towards Automatic Detection of Experimental Methods from Biomedical Literature. Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008).
[2] Thomas Kappeler, Kaarel Kaljurand, Fabio Rinaldi. TX Task: Automatic Detection of Focus Organisms in Biomedical Publications. BioNLP workshop, NAACL/HLT, Boulder, Colorado, 2009.
[3] Kaarel Kaljurand, Fabio Rinaldi, Thomas Kappeler, Gerold Schneider. Using existing biomedical resources to detect and ground terms in biomedical literature. Artificial Intelligence in Medicine, Verona, July 2009.
[4] Gerold Schneider, Kaarel Kaljurand, Thomas Kappeler, Fabio Rinaldi. Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. CICLING 2009.
[5] Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology, 2008, 9:S13.
[6] Kaarel Kaljurand, Gerold Schneider, Fabio Rinaldi. A dependency based approach to the BioNLP 2009 Shared Task. BioNLP workshop, NAACL/HLT, Boulder, Colorado, 2009.
[7] Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Simon Clematide, Thérèse Vachon, Martin Romacker. OntoGene in BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17 May 2010. IEEE Computer Society Digital Library. <http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.50>