ODIN (the OntoGene Document INspector) is a tool that supports the curation of biomedical literature by integrating advanced text mining technologies.
The ODIN system is being developed within the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature), a collaboration between the OntoGene group at the University of Zurich and the NITAS/TMS group (Text Mining Services) of Novartis Pharma AG. The purpose of the system is to allow a human annotator/curator to leverage the results of an advanced text mining system in order to increase the speed and effectiveness of the annotation process.
The OntoGene text mining system takes as input a document in plain text or in one of several supported XML-based formats (including PubMed Central) and processes it with a custom NLP pipeline, which includes named entity recognition and relation extraction. Currently supported entity types include proteins, genes, experimental methods, cell lines, species, diseases and pharmacological substances. Entities detected in the input document are disambiguated against reference databases (UniProt, EntrezGene, NCBI taxonomy, PSI-MI ontology, PharmGKB).
The annotated documents are handed back to the ODIN interface (as pure XML documents), which offers several display modes as well as various selection and modification options. The curator/annotator can view the whole document with in-line annotations highlighted, or browse the extracted entities and be pointed back to their mentions in the original document. All entity mentions are fully editable: with a simple click of the mouse, the curator can add or delete any mention, or change its extent (i.e. add or remove words on its left or right).
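The mention-editing operations described above can be thought of as operations on a token span. The following is a minimal sketch, assuming a hypothetical `Mention` record that stores half-open token offsets into a tokenized sentence; the names `Mention`, `extend` and `shrink` are illustrative only and do not reflect ODIN's actual data model or XML format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mention:
    """Hypothetical entity mention covering tokens[start:end] (half-open)."""
    start: int
    end: int
    entity_type: str

    def text(self, tokens):
        return " ".join(tokens[self.start:self.end])


def extend(m, tokens, side):
    """Add one word on the given side ('left' or 'right'), if one exists."""
    if side == "left" and m.start > 0:
        return replace(m, start=m.start - 1)
    if side == "right" and m.end < len(tokens):
        return replace(m, end=m.end + 1)
    return m  # already at the sentence boundary: leave unchanged


def shrink(m, side):
    """Remove one word from the given side, always keeping at least one word."""
    if m.end - m.start <= 1:
        return m
    if side == "left":
        return replace(m, start=m.start + 1)
    if side == "right":
        return replace(m, end=m.end - 1)
    return m
```

For example, with `tokens = "the human p53 tumour suppressor protein".split()`, the mention `Mention(2, 3, "protein")` covers "p53", and `extend(..., tokens, "left")` widens it to "human p53". Returning new immutable records rather than mutating in place makes it straightforward to log each edit or undo it.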
Different entity views are supported, with sorting according to various criteria (entity type, entity mention, confidence score, etc.). Text units (e.g. sentences) containing selected entities (terms or gene identifiers) can be selectively highlighted. Ambiguous mentions can be rapidly disambiguated by manually selecting the organism. Extensive logging functionality is also provided.
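The sorting and selective highlighting features described above can be illustrated with a small sketch. It assumes a hypothetical flat `Entity` record with illustrative field names (`entity_type`, `mention`, `identifier`, `confidence`, `sentence`); it is not ODIN's internal representation, only a way to show how such views might be computed.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class Entity:
    """Hypothetical extracted entity; field names are illustrative only."""
    entity_type: str
    mention: str
    identifier: str   # e.g. a database accession (assumed format)
    confidence: float
    sentence: int     # index of the sentence containing the mention


def sort_entities(entities, key="confidence", descending=True):
    """Sort entities by one criterion, e.g. 'entity_type', 'mention' or 'confidence'."""
    return sorted(entities, key=attrgetter(key), reverse=descending)


def sentences_to_highlight(entities, wanted):
    """Return sorted indices of sentences containing any wanted term or identifier."""
    return sorted({e.sentence for e in entities
                   if e.mention in wanted or e.identifier in wanted})
```

Because Python's `sorted` is stable, sorting first by confidence and then by `entity_type` yields entities grouped by type with the most confident ones first within each group.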
If you find this work interesting or inspiring, please cite the following papers:
Fabio Rinaldi, Simon Clematide, Gerold Schneider, Martin Romacker, Thérèse Vachon. ODIN: An Advanced Interface for the Curation of Biomedical Literature. Fourth International Biocuration Conference, 2010. Also available from Nature Precedings, 2010: http://dx.doi.org/10.1038/npre.2010.5169.1
Fabio Rinaldi, Gerold Schneider, Simon Clematide, Silvan Jegen, Pierre Parisot, Martin Romacker, Thérèse Vachon. OntoGene (Team 65): preliminary analysis of participation in BioCreative III. Proceedings of BioCreative III, September 13-15, 2010, Bethesda, Maryland, USA.
Fabio Rinaldi, Simon Clematide, Yael Garten, Michelle Whirl-Carrillo, Li Gong, Joan M. Hebert, Katrin Sangkuhl, Caroline F. Thorn, Teri E. Klein, Russ B. Altman. Using ODIN for a PharmGKB revalidation experiment. Database: The Journal of Biological Databases and Curation (Oxford Journals), 2012, bas021. doi:10.1093/database/bas021
Technical notes: ODIN is a browser-based application. It works on any operating system, provided a recent version of one of the popular browsers is available (Firefox is recommended). However, a specific application may require some server-side customization, for example adding specific resources (such as domain entities and terminology), in order to deliver a satisfactory end-user experience.
A public version is in preparation. At present, access is granted on request. For further information, please contact Fabio Rinaldi (firstname.lastname@example.org).