
Event annotation

Overview

The GENIA corpus event annotation marks expressions in text that state biomedical events, defined as changes in the state or properties of physical entities. Each event annotation is a text-bound association of an arbitrary number of entities, each in a specific role (e.g. Theme, Cause).
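
As a rough illustration of what "text-bound association of entities in roles" means, the sketch below models an event as a typed trigger expression plus a list of (role, filler) pairs. The class names, the helper span_of, and the example sentence are assumptions made for illustration only; they are not part of the GENIA distribution, and the authoritative encoding is the one described in the GENIA Corpus Manual.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class TextSpan:
    """A text-bound annotation: character offsets [start, end) and the covered text."""
    start: int
    end: int
    text: str


@dataclass
class Event:
    """A typed event anchored to a trigger expression, with role-labelled arguments."""
    event_type: str
    trigger: TextSpan
    # Each argument is a (role, filler) pair; a filler is an entity span or another event.
    arguments: List[Tuple[str, Union[TextSpan, "Event"]]] = field(default_factory=list)

    def fillers(self, role: str) -> list:
        """Return every argument filler annotated with the given role."""
        return [filler for r, filler in self.arguments if r == role]


def span_of(sentence: str, phrase: str) -> TextSpan:
    """Locate the first occurrence of phrase and bind it to its offsets."""
    start = sentence.index(phrase)
    return TextSpan(start, start + len(phrase), phrase)


# Invented example sentence (an assumption for illustration, not corpus data).
sentence = "IL-2 induces expression of CD25."

il2 = span_of(sentence, "IL-2")
cd25 = span_of(sentence, "CD25")

# The event type names here are illustrative labels, hedged as assumptions.
expression = Event("Gene_expression", span_of(sentence, "expression"), [("Theme", cd25)])
regulation = Event(
    "Positive_regulation",
    span_of(sentence, "induces"),
    [("Theme", expression), ("Cause", il2)],
)

print(regulation.fillers("Cause")[0].text)
```

Because a filler may itself be an event, the same structure covers nested events such as a regulation event whose Theme is another event.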

The current general release of the GENIA corpus event annotation covers 1,000 of the 1,999 abstracts of the primary GENIA corpus, marking 36,114 events in 9,372 sentences. (Additional annotations of the corpus have been used to create blind test sets for various shared tasks.)

Example


Corpus format

The GENIA Event corpus is available in an XML format described in the GENIA Corpus Manual. A subset of the corpus annotations is also available in a standoff format as part of the BioNLP Shared Task 2009 and 2011 corpora.
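
For readers working with the standoff files, the following is a minimal reading sketch. It assumes the tab-separated layout commonly seen in the BioNLP Shared Task files: text-bound lines of the form "ID, TAB, type start end, TAB, text" and event lines of the form "ID, TAB, type:trigger role:argument ...". The function name parse_standoff and the sample lines are hypothetical; other line kinds (for example modifications or equivalences) are simply skipped here, and the shared task documentation remains the authoritative format definition.

```python
def parse_standoff(lines):
    """Parse text-bound (T) and event (E) lines; ignore every other line kind."""
    spans, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ann_id, _, rest = line.partition("\t")
        if ann_id.startswith("T"):
            # Assumed layout: "Type start end<TAB>covered text"
            head, _, text = rest.partition("\t")
            ann_type, start, end = head.split(" ")
            spans[ann_id] = (ann_type, int(start), int(end), text)
        elif ann_id.startswith("E"):
            # Assumed layout: "Type:triggerId Role:argId Role:argId ..."
            parts = rest.split()
            event_type, trigger_id = parts[0].split(":")
            args = [tuple(part.split(":")) for part in parts[1:]]
            events[ann_id] = (event_type, trigger_id, args)
    return spans, events


# Invented sample lines (an assumption for illustration, not corpus data).
sample = [
    "T1\tProtein 0 4\tIL-2",
    "T2\tProtein 27 31\tCD25",
    "T3\tGene_expression 13 23\texpression",
    "T4\tPositive_regulation 5 12\tinduces",
    "E1\tGene_expression:T3 Theme:T2",
    "E2\tPositive_regulation:T4 Theme:E1 Cause:T1",
]

spans, events = parse_standoff(sample)
for event_id, (event_type, trigger_id, args) in events.items():
    print(event_id, event_type, spans[trigger_id][3], args)
```

Keeping identifiers as plain strings leaves reference resolution (for example, an event whose Theme is another event) to a later pass, which avoids ordering problems when an event refers to one defined further down the file.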

Major applications

  • The GENIA Event corpus annotations served as the initial source data for the BioNLP Shared Task 2009 and the BioNLP Shared Task 2011 GE task.
  • Through their use in the BioNLP Shared Task corpora, the GENIA Event corpus annotations form the basic training and development material for the majority of currently available structured event extraction systems for biomedical domain texts.

Documentation

Encoding scheme

  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report (TR-NLP-UT-2006-1). Tsujii Laboratory, University of Tokyo, 2006.

Annotation guidelines

  • Tomoko Ohta, Jin-Dong Kim and Jun’ichi Tsujii, Guidelines for event annotation, University of Tokyo Technical Report, 2007.
  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun’ichi Tsujii. GENIA Ontology. Technical Report (TR-NLP-UT-2006-2). Tsujii Laboratory, University of Tokyo, 2006.
  • Kim, Jin-Dong and Jun’ichi Tsujii. GENIA Corpus Curation Framework. Technical Report (TR-NLP-UT-2006-3). Tsujii Laboratory, University of Tokyo, 2006.

Publications

Download

Acknowledgments

Tomoko Ohta: annotation coordinator