
OGER: OntoGene Entity Recognition


Summary

OntoGene's Biomedical Entity Recogniser (OGER) is a RESTful web service implemented on top of the Bio Term Hub. OGER allows a remote user to batch-annotate a collection of documents. We recently participated in a community-organized evaluation of Bio Text Mining services (BioCreative/TIPS), in which our system obtained the best results on four of the six evaluation metrics.

NEW (Jan 2019): OGER++ paper just published!

See also our screencast introduction to the Bio Term Hub (BTH) and OGER (180 MB; it may take a while to load).



Test OGER using the web interface

OGER can be accessed either through a web interface for single text annotation, or as a web service (typically for batch annotations).

The base URL, referred to below as <BASE_URL>, leads to the web interface:

https://pub.cl.uzh.ch/projects/ontogene/oger/


Use OGER as a web service

Different requests can be composed starting from the base URL above and adding specifiers as described below.

A more detailed explanation can be found here.

An incorrect or missing specifier leads to an error page that provides some basic information.

Currently we support two types of requests: fetch and upload.
  • A fetch request will retrieve an existing document from a known source:   
          GET <BASE_URL>/fetch/<SOURCE>/<OUTPUT_FORMAT>/<DOC_ID>
          POST <BASE_URL>/fetch/<SOURCE>/<OUTPUT_FORMAT>/<DOC_ID>

  • An upload request is used to submit a text to be annotated.
          POST <BASE_URL>/upload/<INPUT_FORMAT>/<OUTPUT_FORMAT> [/<DOC_ID>]
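
The two URL patterns above can be assembled programmatically. The following is a minimal sketch, not part of OGER itself: the function names fetch_url and upload_url are hypothetical, and percent-encoding each path segment is an assumption made for safety rather than a documented requirement of the service. The specifier values used in the calls are the ones described in the rest of this section.

```python
from urllib.parse import quote

# Base URL of the public OGER instance, as given on this page.
BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def _join(*segments):
    # Percent-encode each path segment (an assumption, see lead-in)
    # and append the segments to the base URL.
    return "/".join([BASE_URL] + [quote(str(s), safe="") for s in segments])


def fetch_url(source, output_format, doc_id):
    """Build <BASE_URL>/fetch/<SOURCE>/<OUTPUT_FORMAT>/<DOC_ID>."""
    return _join("fetch", source, output_format, doc_id)


def upload_url(input_format, output_format, doc_id=None):
    """Build <BASE_URL>/upload/<INPUT_FORMAT>/<OUTPUT_FORMAT>[/<DOC_ID>]."""
    segments = ["upload", input_format, output_format]
    if doc_id is not None:
        segments.append(doc_id)  # DOC_ID is optional for uploads
    return _join(*segments)


print(fetch_url("pubmed", "tsv", 21436587))
print(upload_url("txt", "odin"))
```

Keeping URL construction separate from the HTTP call makes it easy to check a request by eye before sending it.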


The accepted values of SOURCE are currently:
  • pubmed (PubMed abstract identifier)
  • pmc (PubMed Central identifier)

In a fetch request, DOC_ID is a numerical PubMed or PubMed Central identifier, depending on which SOURCE has been given. In an upload request, DOC_ID is optional: if supplied, it is added as an attribute value in those output formats that contain a document ID (odin, odin_custom).

The accepted values of INPUT_FORMAT are:
  • txt (plain text)
  • bioc (BioC, XML format)
  • pxml (PubMed XML)
  • pxml.gz (PubMed XML, compressed)
  • nxml (PubMed Central XML)

The accepted values of OUTPUT_FORMAT are:
  • tsv (tab-separated values; only the detected entities are output)
  • text_tsv (tab-separated values; the full text is tokenized, with detected entities replacing the tokens they are composed of)
  • xml (our own simple, self-documenting XML format)
  • bioc (BioC, XML format)
  • bioc_json (BioC, JSON format)
  • pubanno_json (PubAnnotation JSON format)
  • odin, odin_custom (formats used by our own curation interface, ODIN)

Check the info page below for additional options not yet documented here:

https://github.com/OntoGene/OGER/wiki/REST-API
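
As a rough aid when composing requests, the values listed in this section can be kept in a small lookup table. This is only a sketch: the helper name undocumented is hypothetical, the table covers just the values documented here (the service may accept others, as noted above), and odin_custom is spelled as in the DOC_ID description. For that reason the helper reports unlisted values instead of rejecting them.

```python
# Specifier values listed on this page only; the REST API wiki may list more.
DOCUMENTED = {
    "source": {"pubmed", "pmc"},
    "input_format": {"txt", "bioc", "pxml", "pxml.gz", "nxml"},
    "output_format": {
        "tsv", "text_tsv", "xml", "bioc", "bioc_json",
        "pubanno_json", "odin", "odin_custom",
    },
}


def undocumented(**specifiers):
    """Return {kind: value} for every value that is not listed on this page."""
    unknown = {}
    for kind, value in specifiers.items():
        if kind not in DOCUMENTED:
            raise KeyError(f"unknown specifier kind: {kind}")
        if value not in DOCUMENTED[kind]:
            unknown[kind] = value
    return unknown


print(undocumented(source="pubmed", output_format="tsv"))      # {}
print(undocumented(input_format="pdf", output_format="bioc"))  # {'input_format': 'pdf'}
```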



Examples

An example of a complete request is the following:

curl -X GET https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/tsv/21436587

It retrieves the annotated version of PubMed abstract 21436587 in tsv format. You can get a PMC document by replacing "pubmed" with "pmc" (and using a PMC identifier).
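
The same fetch request can be issued from a script. The sketch below uses only the Python standard library; the function name fetch_annotated, the 60-second timeout, and the UTF-8 decoding of the response are assumptions, not documented properties of the service. The opener parameter exists only so that the HTTP call can be replaced in tests.

```python
import urllib.request

# Base URL of the public OGER instance, as given on this page.
BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def fetch_annotated(source, output_format, doc_id,
                    opener=urllib.request.urlopen, timeout=60):
    """GET <BASE_URL>/fetch/<SOURCE>/<OUTPUT_FORMAT>/<DOC_ID> and return the body."""
    url = f"{BASE_URL}/fetch/{source}/{output_format}/{doc_id}"
    with opener(url, timeout=timeout) as response:
        # Assumption: the service answers in UTF-8.
        return response.read().decode("utf-8")
```

Calling fetch_annotated("pubmed", "tsv", 21436587) corresponds to the curl command above; passing "pmc" and a PMC identifier requests a PubMed Central document instead.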

A similar example, this time requesting PubAnnotation JSON output:

curl -X GET https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/pubanno_json/6234315

As another example, you can upload an article for annotation:

echo 'tumor cells reduced HSC numbers' | POST https://pub.cl.uzh.ch/projects/ontogene/oger/upload/txt/odin

which returns an XML document with our ODIN XML markup. 
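
An upload can likewise be scripted. The following sketch sends the text as the raw body of a POST request using the Python standard library. The function names build_upload_request and upload_text are hypothetical; encoding the text as UTF-8, leaving the request headers at urllib's defaults, and decoding the response as UTF-8 are assumptions rather than documented requirements.

```python
import urllib.request

# Base URL of the public OGER instance, as given on this page.
BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def build_upload_request(text, input_format="txt", output_format="odin",
                         doc_id=None):
    """Prepare POST <BASE_URL>/upload/<INPUT_FORMAT>/<OUTPUT_FORMAT>[/<DOC_ID>]."""
    url = f"{BASE_URL}/upload/{input_format}/{output_format}"
    if doc_id is not None:
        url += f"/{doc_id}"  # optional document ID
    # Assumption: the text is sent as UTF-8 bytes in the request body.
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def upload_text(text, input_format="txt", output_format="odin", doc_id=None,
                opener=urllib.request.urlopen, timeout=60):
    """Send the text for annotation and return the annotated document."""
    request = build_upload_request(text, input_format, output_format, doc_id)
    with opener(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling upload_text("tumor cells reduced HSC numbers") mirrors the command-line example above, apart from the trailing newline that echo appends.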

Fabio Rinaldi,
Jan 10, 2018, 4:56 PM