Coreference annotation

Overview

Identifying coreferential expressions, that is, expressions in a text that refer to the same thing, is important for many applications that rely on analyzing the meaning of statements in text. The GENIA Coreference corpus provides coreference annotations covering all the 1999 abstracts of the primary GENIA corpus.

The coreference annotation was produced by the MedCo Annotation Project. The conversion into the GENIA format and minor bug fixes were made by the GENIA Project.

Example

Corpus format

The coreference corpus is distributed in the XML format described in the GENIA Corpus Manual. A selected subset of revised corpus annotations is also available in a standoff format as part of the BioNLP Shared Task 2011 CO task corpus.
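
As a rough illustration of how inline XML coreference annotations can be read, the following Python sketch groups mentions into coreference chains. The element name `coref`, the attributes `id` and `ref`, and the sample sentence are all assumptions made for this example; they are not taken from the GENIA Corpus Manual, which should be consulted for the actual encoding.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample text. The <coref> element and its id/ref attributes are
# assumptions for illustration only, not the documented GENIA encoding.
SAMPLE = (
    '<sentence><coref id="C1">The IL-2 gene</coref> is expressed in T cells. '
    '<coref id="C2" ref="C1">It</coref> is regulated by '
    '<coref id="C3">NF-kappa B</coref>, and '
    '<coref id="C4" ref="C2">its</coref> promoter is well studied.</sentence>'
)


def coreference_chains(xml_text):
    """Return lists of mention strings that refer to the same thing."""
    root = ET.fromstring(xml_text)
    mentions = {}    # mention id -> mention text
    antecedent = {}  # mention id -> id of the mention it points back to

    for element in root.iter("coref"):
        mention_id = element.get("id")
        mentions[mention_id] = "".join(element.itertext()).strip()
        antecedent[mention_id] = element.get("ref")

    def chain_head(mention_id):
        # Follow ref links back to the first mention of the chain.
        # The visited set guards against cyclic links so the loop terminates.
        visited = set()
        while (
            antecedent.get(mention_id) in mentions
            and mention_id not in visited
        ):
            visited.add(mention_id)
            mention_id = antecedent[mention_id]
        return mention_id

    chains = defaultdict(list)
    for mention_id, text in mentions.items():
        chains[chain_head(mention_id)].append(text)

    # Keep only chains with at least two mentions.
    return [chain for chain in chains.values() if len(chain) > 1]


if __name__ == "__main__":
    for chain in coreference_chains(SAMPLE):
        print(" <- ".join(chain))
```

With the invented sample, the three mentions "The IL-2 gene", "It", and "its" end up in one chain, while "NF-kappa B" is dropped because nothing else refers to it.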

Major applications

Documentation

Encoding scheme

  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report (TR-NLP-UT-2006-1). Tsujii Laboratory, University of Tokyo, 2006.

Publications

Download

Acknowledgments

The coreference corpus annotations were produced by the MedCo Annotation Project.