
Coreference annotation


The identification of coreferential expressions, that is, expressions in text that refer to the same thing, is important for many applications that rely on analyzing the meaning of statements in text. The GENIA Coreference corpus provides coreference annotations covering all 1,999 abstracts of the primary GENIA corpus.

The coreference annotation was produced by the MedCo Annotation Project. The conversion into the GENIA format and minor bug fixes were carried out by the GENIA Project.


Corpus format

The coreference corpus is distributed in the XML format described in the GENIA Corpus Manual. A selected subset of the revised corpus annotations is also available in a standoff format as part of the BioNLP Shared Task 2011 CO task corpus.
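
As a rough illustration of how inline XML coreference markup can be turned into coreference chains, the following is a minimal sketch in Python. The element name `coref` and the attributes `id` and `ref` are assumptions made for this example only; the actual element and attribute names should be taken from the GENIA Corpus Manual. The sample sentence is invented and is not taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy input. The element name "coref" and the attributes "id" / "ref"
# are ASSUMPTIONS for illustration, not the schema defined in the
# GENIA Corpus Manual. The sentence itself is invented.
SAMPLE = (
    '<sentence><coref id="C1">The IL-2 gene</coref> is induced in T cells, '
    'and <coref id="C2" ref="C1">it</coref> requires NF-kappa B. '
    '<coref id="C3" ref="C2">This gene</coref> is well studied.</sentence>'
)


def coreference_chains(xml_text):
    """Group mentions into chains by following each mention's antecedent link."""
    root = ET.fromstring(xml_text)
    text = {}    # mention id -> surface text
    parent = {}  # mention id -> antecedent id (None for a first mention)
    for node in root.iter("coref"):
        cid = node.get("id")
        text[cid] = "".join(node.itertext()).strip()
        parent[cid] = node.get("ref")

    def head(cid):
        # Walk antecedent links back to the first mention of the chain;
        # the "seen" set guards against accidental cycles in the markup.
        seen = set()
        while parent.get(cid) in text and cid not in seen:
            seen.add(cid)
            cid = parent[cid]
        return cid

    chains = defaultdict(list)
    for cid in text:  # document order, since iter() walks in document order
        chains[head(cid)].append(text[cid])
    return dict(chains)


chains = coreference_chains(SAMPLE)
for first_mention, mentions in chains.items():
    print(first_mention, mentions)
```

Under these assumptions, each chain is keyed by the id of its first mention and lists the mention texts in document order. Adapting the sketch to the distributed files would mainly mean replacing the assumed element and attribute names with those given in the manual.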

Major applications


Encoding scheme

  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report (TR-NLP-UT-2006-1). Tsujii Laboratory, University of Tokyo, 2006.



