The OntoGene group at the University of Zurich has developed efficient techniques for text mining in the molecular biology domain. One of their core interests in recent years has been the detection of mentions of protein-protein interactions. Using the IntAct database as a gold standard, they have developed techniques for identifying information relevant to the curation process, such as the experimental methods used by the authors [1], the organisms that host the experiment and that contribute the interacting proteins [2], the proteins themselves [3], and their interactions [4].
The effectiveness of their approach has been validated by participation in numerous shared evaluations, such as BioCreative II [5], the BioNLP event extraction task [6], and BioCreative II.5 [7]. Recently, in collaboration with the NIBR/IT group at Novartis, they have developed a prototype of an environment supporting semi-automated semantic enrichment of the literature. The environment allows an expert user to efficiently revise annotations suggested by the system, or to add new annotations where the system missed an entity or an interaction. The system can also reuse the annotations added by the expert in subsequent applications, through a process of incremental learning.
The SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature) aims to consolidate the existing text mining activities of the OntoGene group by further improving their relation extraction techniques and applying them to new areas within the context of the literature curation process. New types of interactions, such as drug/disease interactions (of particular interest to their industrial partner), will be considered, along with incremental improvements to their existing techniques for protein-protein interaction detection. As in the past, their techniques will be subject to community-based evaluation through participation in shared text mining challenges. Additionally, the project offers an opportunity to turn the existing semi-automated annotation prototype into a fully fledged system which can then be employed by the target user groups. Intensive collaboration with other groups will be sought at all stages of development, in particular to guarantee continuous feedback on the effective usability of the proposed tools.
Duration: 4 years (August 2010 - July 2014)
Funding: Swiss National Science Foundation (grant 105315_130558/1) and NIBR/IT, Novartis Pharma AG
Principal investigator: Dr. Fabio Rinaldi
References
[1] Thomas Kappeler, Simon Clematide, Kaarel Kaljurand, Gerold Schneider, Fabio Rinaldi. Towards Automatic Detection of Experimental Methods from Biomedical Literature. Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008).
[2] Thomas Kappeler, Kaarel Kaljurand, Fabio Rinaldi. TX Task: Automatic Detection of Focus Organisms in Biomedical Publications. BioNLP workshop, NAACL/HLT, Boulder, Colorado, 2009.
[3] Kaarel Kaljurand, Fabio Rinaldi, Thomas Kappeler, Gerold Schneider. Using existing biomedical resources to detect and ground terms in biomedical literature. Artificial Intelligence in Medicine, Verona, July 2009.
[4] Gerold Schneider, Kaarel Kaljurand, Thomas Kappeler, Fabio Rinaldi. Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. CICLING 2009.
[5] Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology, 2008, 9:S13.
[6] Kaarel Kaljurand, Gerold Schneider and Fabio Rinaldi. A dependency based approach to the BioNLP 2009 Shared Task. BioNLP workshop, NAACL/HLT, Boulder, Colorado, 2009.
[7] Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Simon Clematide, Thérèse Vachon, Martin Romacker. OntoGene in BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17 May 2010. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.50>
Project publications (journal papers highlighted)
- Fabio Rinaldi, Gerold Schneider, Simon Clematide, Silvan Jegen, Pierre Parisot, Martin Romacker and Therese Vachon. OntoGene (Team 65): preliminary analysis of participation in BioCreative III. BioCreative III workshop, Bethesda, Maryland, September 13-15, 2010.
- Fabio Rinaldi, Simon Clematide, Gerold Schneider, Martin Romacker, Thérèse Vachon. ODIN: An Advanced Interface for the Curation of Biomedical Literature. Biocuration 2010, the Conference of the International Society for Biocuration and the 4th International Biocuration Conference. Tokyo, Japan, 11-14 October 2010. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2010.5169.1>
- Dietrich Rebholz-Schuhmann, Antonio Jimeno, Chen Li, Senay Kafkas, Ian Lewin, Ning Kang, Peter Corbett, David Milward, Ekaterina Buyko, Elena Beisswanger, Kerstin Hornbostel, Alexandre Kouznetsov, Rene Witte, Jonas B. Laurila, Christopher JO Baker, Chen-Ju Kuo, Simon Clematide, Fabio Rinaldi, Richárd Farkas, György Móra, Kazuo Hara, Laura Furlong, Michael Rautschka, Mariana Lara Neves, Alberto Pascual-Montano, Qi Wei, Nigel Collier, Faisal Mahbub Chowdhury, Alberto Lavelli, Rafael Berlanga, Roser Morante, Vincent Van Asch, Walter Daelemans, José Luís Marina, Erik van Mulligen, Jan Kors and Udo Hahn. Assessment of NER solutions against the first and second CALBC Silver Standard Corpus. Semantic Mining in Medicine, EBI, Cambridge, UK, 25-26 October 2010.
- Fabio Rinaldi, Gerold Schneider, Simon Clematide. Mining complex Drug/Gene/Disease relations in PubMed. Proceedings of the workshop "Mining the Pharmacogenomics Literature", Pacific Symposium on Biocomputing, Hawaii, January 2011. http://www.zora.uzh.ch/53013/
- Simon Clematide, Fabio Rinaldi, Gerold Schneider. OntoGene at CALBC II and Some Thoughts on the Need of Document-Wide Harmonization. Proceedings of the CALBC II workshop. EBI, Cambridge, UK, 16-18 March 2011.
- Gerold Schneider and Fabio Rinaldi. A data-driven approach to alternations based on protein-protein interactions. Proceedings of the 3rd Congreso Internacional de Lingüística de Corpus (CILC), Valencia, Spain, 7-9 April, 2011.
- Don Tuggener, Manfred Klenner, Gerold Schneider, Simon Clematide, Fabio Rinaldi. An Incremental Model for the Coreference Resolution Task of BioNLP 2011. Proceedings of the BioNLP11 shared task. Portland, Oregon, 24 June, 2011.
- Fabio Rinaldi, Kaarel Kaljurand, Rune Saetre. Terminological resources for Text Mining over Biomedical Scientific Literature. Journal of Artificial Intelligence in Medicine. Volume 52, Issue 2, June 2011, Pages 107-114. doi:10.1016/j.artmed.2011.04.011, PMID: 21652190
- Zhiyong Lu, Hung-Yu Kao, Chih-Hsuan Wei, Minlie Huang, Jingchen Liu, Cheng-Ju Kuo, Chun-Nan Hsu, Richard Tzong-Han Tsai, Hong-Jie Dai, Naoaki Okazaki, Han-Cheol Cho, Martin Gerner, Illes Solt, Shashank Agarwal, Feifan Liu, Dina Vishnyakova, Patrick Ruch, Martin Romacker, Fabio Rinaldi, Sanmitra Bhattacharya, Padmini Srinivasan, Hongfang Liu, Manabu Torii, Sergio Matos, David Campos, Karin Verspoor, Kevin M. Livingston, and W. John Wilbur. The Gene Normalization Task in BioCreative III. BMC Bioinformatics, special issue on BioCreative III, Volume 12 Suppl 8, October 2011.
- Cecilia Arighi, Phoebe Roberts, Shashank Agarwal, Sanmitra Bhattacharya, Gianni Cesareni, Andrew Chatr-aryamontri, Simon Clematide, Pascale Gaudet, Michele Gwinn Giglio, Ian Harrow, Eva Huala, Martin Krallinger, Ulf Leser, Donghui Li, Feifan Liu, Zhiyong Lu, Lois Maltais, Naoaki Okazaki, Livia Perfetto, Fabio Rinaldi, Rune Sætre, David Salgado, Padmini Srinivasan, Philippe E. Thomas, Luca Toldo, Lynette Hirschman and Cathy H Wu. BioCreative III Interactive Task: an Overview. BMC Bioinformatics, special issue on BioCreative III, Volume 12 Suppl 8, October 2011.
- Martin Krallinger, Miguel Vazquez, Florian Leitner, David Salgado, Andrew Chatr-aryamontri, Andrew Winter, Livia Perfetto, Leonardo Briganti, Luana Licata, Marta Iannuccelli, Luisa Castagnoli, Gianni Cesareni, Mike Tyers, Gerold Schneider, Fabio Rinaldi, Robert Leaman, Graciela Gonzalez, Sergio Matos, Sun Kim, W Wilbur, Luis Rocha, Hagit Shatkay, Ashish V Tendulkar, Shashank Agarwal, Feifan Liu, Xinglong Wang, Rafal Rak, Keith Noto, Charles Elkan, Zhiyong Lu. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinformatics, special issue on BioCreative III, Volume 12 Suppl 8, October 2011.
- Gerold Schneider, Simon Clematide, Fabio Rinaldi. Detection of interaction articles and experimental methods in biomedical literature. BMC Bioinformatics, special issue on BioCreative III, Volume 12 Suppl 8, October 2011.
- Dietrich Rebholz-Schuhmann, Antonio Jimeno, Chen Li, Senay Kafkas, Ian Lewin, Ning Kang, Peter Corbett, David Milward, Ekaterina Buyko, Elena Beisswanger, Kerstin Hornbostel, Alexandre Kouznetsov, Rene Witte, Jonas B. Laurila, Christopher JO Baker, Chen-Ju Kuo, Simon Clematide, Fabio Rinaldi, Richárd Farkas, György Móra, Kazuo Hara, Laura Furlong, Michael Rautschka, Mariana Lara Neves, Alberto Pascual-Montano, Qi Wei, Nigel Collier, Faisal Mahbub Chowdhury, Alberto Lavelli, Rafael Berlanga, Roser Morante, Vincent Van Asch, Walter Daelemans, Erik van Mulligen, Jan Kors and Udo Hahn. Assessment of NER solutions against the first and second CALBC Silver Standard Corpus. Journal of Biomedical Semantics, Volume 2, Suppl 5, S11, October 2011.
- Proceedings of the Fourth International Symposium for Semantic Mining in Biomedicine (SMBM 2010), Cambridge, United Kingdom, October, 2010. Edited by Nigel Collier, Udo Hahn, Dietrich Rebholz-Schuhmann, Fabio Rinaldi, Sampo Pyysalo.
- Dietrich Rebholz-Schuhmann, Fabio Rinaldi, Sampo Pyysalo, Nigel Collier, Udo Hahn. Towards mature use of semantic resources for biomedical analyses. Journal of Biomedical Semantics, Volume 2, Suppl 5, S11, October 2011.
- Fabio Rinaldi, Simon Clematide, Gerold Schneider. SASEBio: Semi-Automated Semantic Enrichment of the Biomedical Literature. 1st International SystemsX.ch Conference on Systems Biology (poster presentation). Oct 24th-26th, Basel, Switzerland.
- Fabio Rinaldi, Simon Clematide, Gerold Schneider. ODIN: Advanced Text Mining in Support of the Curation Process. The Pacific Symposium on Biocomputing (PSB) 2012, Fairmont Orchid, Big Island of Hawaii, Jan 3rd-7th, 2012 (poster presentation).
- Simon Clematide, Fabio Rinaldi. Ranking interactions for a curation task. The Tenth International Conference on Machine Learning and Applications (special session on Machine Learning for Biomedical Literature Analysis and Text Retrieval). Honolulu, December 2011.
- Fabio Rinaldi, Simon Clematide, Yael Garten, Michelle Whirl-Carrillo, Li Gong, Joan M. Hebert, Katrin Sangkuhl, Caroline F. Thorn, Teri E. Klein, Russ B. Altman. Using ODIN for a PharmGKB revalidation experiment. The Journal of Biological Databases and Curation, Oxford Journals, 2012, bas021; doi:10.1093/database/bas021 [NOTE: presented at the Conference of the International Society for Biocuration, Washington D.C., April 2012], PMC3332569, PMID: 22529178
- Fabio Rinaldi, Gerold Schneider, Simon Clematide. Relation Mining Experiments in the Pharmacogenomics Domain. Journal of Biomedical Informatics, 2012. doi:10.1016/j.jbi.2012.04.014, PMID: 22580177
- Fabio Rinaldi, Simon Clematide and Simon Hafner. Ranking of CTD articles and interactions using the OntoGene pipeline. Proceedings of the BioCreative-2012 Workshop. Washington D.C., April 2012.
- Gerold Schneider, Fabio Rinaldi, Simon Clematide. Dependency parsing for interaction detection in pharmacogenomics. Proceedings of LREC 2012: The eighth international conference on Language Resources and Evaluation. Istanbul, 21-27 May 2012.
- Gerold Schneider, Simon Clematide, Gintarė Grigonytė, Fabio Rinaldi. Using syntax features and document discourse for relation extraction on PharmGKB and CTD. SMBM 2012, Zurich, September 3-4, 2012.
- Ananiadou, Sophia; Pyysalo, Sampo; Rebholz-Schuhmann, Dietrich; Rinaldi, Fabio; Salakoski, Tapio (eds.). Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (SMBM 2012), Zurich, 2012. doi:10.5167/uzh-64476
- Fabio Rinaldi, Using biomedical databases as knowledge sources for large-scale text mining. E-LKR workshop, SEPLN 2012, Castellon de la Plana, Spain, September 7, 2012.
- Grigonytė, G., Rinaldi F., Volk, M. Change of Biomedical Domain Terminology Over Time. In Proc. of 5th Baltic Conf. On Human Language Technologies, Tartu, Estonia, October 4–5, 2012.
- Fabio Rinaldi, Gerold Schneider, Simon Clematide and Gintare Grigonyte, Notes about the OntoGene pipeline. AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, November 2-4, 2012, Arlington, Virginia, USA.
- Simon Clematide, Fabio Rinaldi. Ranking relations between diseases, drugs, and genes for a curation task. Journal of Biomedical Semantics (BMC), 2012. doi:10.1186/2041-1480-3-S3-S5
- Fabio Rinaldi, Simon Clematide, Simon Hafner, Gerold Schneider, Gintare Grigonyte, Martin Romacker, Therese Vachon. Using the OntoGene pipeline for the triage task of BioCreative 2012, The Journal of Biological Databases and Curation, Oxford Journals, 2012. doi:10.1093/database/bas053, PMC3568389, PMID: 23396322
- Fabio Rinaldi. The OntoGene system: an advanced information extraction application for biological literature. NETTAB workshop on Integrated Bio-Search. November 14-16, 2012, Como, Italy. Also available in EMBnet.journal, Volume 18, Supplement B, pp 47-49.
- Fabio Rinaldi, Socorro Gama-Castro, Alejandra López-Fuentes, Yalbi Balderas-Martínez, Julio Collado-Vides. Digital Curation Experiments for RegulonDB, BioCuration 2013, April 10th, 2013, Cambridge, UK.
- Donald C. Comeau, Rezarta Islamaj Doğan, Paolo Ciccarese, Kevin Bretonnel Cohen, Martin Krallinger, Florian Leitner, Zhiyong Lu, Yifan Peng, Fabio Rinaldi, Manabu Torii, Alfonso Valencia, Karin Verspoor, Thomas Wiegers, Cathy Wu, and W. John Wilbur. BioC: A Minimalist Approach to Interoperability for Biomedical Text Processing. The Journal of Biological Databases and Curation (accepted for publication).