
OGER: OntoGene Entity Recognition


Summary

OntoGene's Biomedical Entity Recogniser (OGER) is a RESTful web service implemented on top of the Bio Term Hub. OGER allows a remote user to batch-annotate a collection of documents. We recently participated in a community-organized evaluation of biomedical text mining services (BioCreative/TIPS), in which our system obtained the best results according to four of the six evaluation metrics.

Check our screencast introduction to BTH and OGER (180MB: it might take a while to load).

NEWS: In 2019 we participated in the CRAFT shared task on large-scale NER, with excellent results.
Our paper is available here.



Use OGER through the web interface


OGER can be accessed either through a web interface for single text annotation, or as a web service (typically for batch annotations).

The base URL below leads to the web interface.

https://pub.cl.uzh.ch/projects/ontogene/oger/          <BASE_URL>


Use OGER as a web service

The complete documentation of the OGER REST API can be found here.


You can construct simple queries by appending specifiers to the <BASE_URL> mentioned above.

Currently we support two types of requests: fetch and upload. See also the examples at the bottom of this page.
  • A fetch request will retrieve an existing document from a known source:   
          GET <BASE_URL>/fetch/<SOURCE>/<OUTPUT_FORMAT>/<DOC_ID>
          POST <BASE_URL>/fetch/<SOURCE>/<OUTPUT_FORMAT>/<DOC_ID>

  • An upload request is used to submit a text to be annotated.
          POST <BASE_URL>/upload/<INPUT_FORMAT>/<OUTPUT_FORMAT> [/<DOC_ID>]

Any incorrect or unspecified specifier will lead to an error page providing some basic information.
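
The two request patterns above can be assembled mechanically. The following is a minimal sketch in shell; the helper names oger_fetch_url and oger_upload_url are hypothetical and not part of OGER, and the base URL is the one given on this page, written without its trailing slash.

```shell
# Base URL as given on this page (trailing slash dropped).
OGER_BASE_URL="https://pub.cl.uzh.ch/projects/ontogene/oger"

# oger_fetch_url SOURCE OUTPUT_FORMAT DOC_ID
# Prints the URL of a fetch request. (Hypothetical helper, not part of OGER.)
oger_fetch_url() {
    printf '%s/fetch/%s/%s/%s\n' "$OGER_BASE_URL" "$1" "$2" "$3"
}

# oger_upload_url INPUT_FORMAT OUTPUT_FORMAT [DOC_ID]
# Prints the URL of an upload request; the DOC_ID segment is optional.
oger_upload_url() {
    if [ -n "${3:-}" ]; then
        printf '%s/upload/%s/%s/%s\n' "$OGER_BASE_URL" "$1" "$2" "$3"
    else
        printf '%s/upload/%s/%s\n' "$OGER_BASE_URL" "$1" "$2"
    fi
}

oger_fetch_url pubmed tsv 21436587
oger_upload_url txt odin
```

The helpers only build strings; the resulting URL still has to be passed to an HTTP client with the appropriate method (GET or POST for fetch, POST for upload).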

The accepted values of SOURCE are currently:
  • pubmed (PubMed abstract identifier)
  • pmc (PubMed Central identifier)

In a fetch request, DOC_ID is a PubMed or PubMed Central numerical identifier (depending on which SOURCE option has been provided). In an upload request it is optional and specifies an ID that is added as an attribute value in those output formats that contain a document ID (odin, odin_custom).

The accepted values of INPUT_FORMAT are:
  • txt (plain text)
  • bioc (BioC, XML format)
  • pxml (PubMed XML)
  • pxml.gz (PubMed XML, compressed)
  • nxml (PubMed Central XML)

The accepted values of OUTPUT_FORMAT are:
  • tsv (tab-separated values, only detected entities produced as output)
  • text_tsv (tab-separated values, full text tokenized, with detected entities replacing the tokens they are composed of)
  • xml (own simple self-documented XML format)
  • bioc (BioC, XML format)
  • bioc_json (BioC, JSON format)
  • pubanno_json (PubAnnotation JSON format)
  • odin, odin_custom (these formats are used by our own curation interface ODIN)

Check the info page below for additional options not yet documented here:

https://github.com/OntoGene/OGER/wiki/REST-API
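
Since an incorrect specifier only yields an error page, it can be convenient to check values locally before sending a request. The sketch below does this in shell; the function is_accepted and the variable names are hypothetical, the lists are copied from this page (with odin_custom spelled as in the DOC_ID description), and the service may accept further values documented on the wiki page above.

```shell
# Accepted values, copied from the lists on this page.
OGER_SOURCES="pubmed pmc"
OGER_INPUT_FORMATS="txt bioc pxml pxml.gz nxml"
OGER_OUTPUT_FORMATS="tsv text_tsv xml bioc bioc_json pubanno_json odin odin_custom"

# is_accepted VALUE LIST
# Succeeds if VALUE is exactly one of the space-separated words in LIST.
# (Hypothetical helper, not part of OGER.)
is_accepted() {
    # Reject empty values and values containing a space.
    case "$1" in
        ''|*' '*) return 1 ;;
    esac
    # Pad with spaces so that only whole words match.
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

for fmt in tsv json; do
    if is_accepted "$fmt" "$OGER_OUTPUT_FORMATS"; then
        echo "$fmt: accepted"
    else
        echo "$fmt: not in the list on this page"
    fi
done
```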


Examples

An example of a complete request is the following:

curl -X GET https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/tsv/21436587


It retrieves the annotated version of PubMed abstract 21436587 in tsv format. You can get a PMC document by replacing "pubmed" with "pmc" (and using a PMC identifier).
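
For several abstracts, the same fetch request can be repeated in a loop. The following is a minimal sketch; the function names are hypothetical, the output file naming (<ID>.tsv) is an arbitrary choice, and the -f option only helps if the service reports errors with an HTTP error status, which is an assumption.

```shell
BASE_URL="https://pub.cl.uzh.ch/projects/ontogene/oger"

# pubmed_tsv_urls ID...
# Prints one fetch URL (tsv output) per PubMed ID given as argument.
pubmed_tsv_urls() {
    for id in "$@"; do
        printf '%s/fetch/pubmed/tsv/%s\n' "$BASE_URL" "$id"
    done
}

# fetch_pubmed_tsv ID...
# Saves each annotated abstract to <ID>.tsv in the current directory.
# Defined but not called here, since it needs network access.
fetch_pubmed_tsv() {
    for id in "$@"; do
        # -f: treat HTTP errors as failures; -sS: quiet, but show errors.
        curl -fsS -o "$id.tsv" "$BASE_URL/fetch/pubmed/tsv/$id" \
            || echo "failed: $id" >&2
        sleep 1   # pause between requests, as a courtesy to the server
    done
}

# Show the URLs that would be requested for the two IDs used on this page.
pubmed_tsv_urls 21436587 6234315
```

Calling fetch_pubmed_tsv 21436587 6234315 would then perform the two downloads.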

A similar example, requesting the output in the PubAnnotation JSON format:

curl -X GET https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/pubanno_json/6234315

As another example, you can upload a text for annotation (POST here is presumably the command-line tool installed with libwww-perl's lwp-request):

echo 'tumor cells reduced HSC numbers' | POST https://pub.cl.uzh.ch/projects/ontogene/oger/upload/txt/odin

which returns an XML document with our ODIN XML markup. 
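
The same upload can be sketched with curl, the tool used in the fetch examples. This assumes that the service reads the text from the request body, as the example above suggests; the function name oger_annotate is hypothetical, and no Content-Type header is set explicitly, so curl's default for POST data applies.

```shell
BASE_URL="https://pub.cl.uzh.ch/projects/ontogene/oger"

# oger_annotate INPUT_FORMAT OUTPUT_FORMAT
# Sends the text read from standard input as the body of an upload request
# and prints the response. (Hypothetical helper, not part of OGER.)
oger_annotate() {
    # --data-binary @- posts standard input unchanged
    # (plain -d would strip newlines when reading from a file or stdin).
    curl -sS -X POST --data-binary @- "$BASE_URL/upload/$1/$2"
}

# Usage (needs network access):
#   echo 'tumor cells reduced HSC numbers' | oger_annotate txt odin
```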

Fabio Rinaldi,
Jan 10, 2018, 4:56 PM