The MiPhaGe (pronounced “my phage”) project (“Mining the Pharmacogenomics Literature”) focuses on developing advanced text mining techniques for extracting complex types of interactions among biological entities from the biomedical literature. In particular, it targets relationships among genes, diseases and drugs, as well as protein and genetic interactions. The main aim of MiPhaGe is to push the boundaries of what text mining can achieve in extracting interactions among domain entities, taking the BioGRID and PharmGKB databases as specific case studies.
BioGRID is a large-scale, comprehensive collection of protein and genetic interactions from major model organism species. It is an international initiative co-hosted by the University of Edinburgh, Princeton University, Stanford University, the University of British Columbia, and the Samuel Lunenfeld Research Institute at the University of Toronto, and it is funded by BBSRC, NIH, CIHR and FP7 grants. PharmGKB is a manually curated database of interactions among genes, diseases and drugs, developed at Stanford University with funding from the NIH. It aims to describe the impact of human genetic variation on drug response.
The importance of relation extraction in biomedical text mining is reflected, for example, in the numerous competitive evaluations organized by the community, such as the BioCreative challenge and the BioNLP event extraction shared task. The project proposer and his team have already demonstrated that they can develop state-of-the-art techniques for relation mining from the literature [1, 2, 3, 4], and their results in the BioCreative competitions are among the best reported. The MiPhaGe project will allow them to consolidate these results and to investigate more advanced techniques, in order to retain a leading edge in this highly competitive domain.
The pharmaceutical industry also has a strong interest in techniques that can improve the drug discovery process by extracting and organizing the information hidden in the biological literature. The results of the project are therefore not only of academic importance but could also have an impact on knowledge management processes in commercial settings, as confirmed by the support for the project offered by the NITAS/TMS group at Novartis.

NOTE: MiPhaGe is currently a subproject of SASEBio without independent funding. Separate funding is being sought.


[1] Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, and Therese Vachon. OntoGene in BioCreative II. Genome Biology, 9(Suppl 2):S13, 2008.

[2] Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Simon Clematide, Therese Vachon, and Martin Romacker. OntoGene in BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3):472–480, 2010.

[3] Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, and Martin Romacker. An Environment for Relation Mining over Richly Annotated Corpora: the case of GENIA. BMC Bioinformatics, 7(Suppl 3):S3, 2006.

[4] Gerold Schneider, Simon Clematide, Fabio Rinaldi, Martin Romacker, and Therese Vachon. Detection of interaction articles and experimental methods in biomedical literature. BMC Bioinformatics, special issue on BioCreative III, accepted for publication, 2011.