The GENIA corpus event annotation marks expressions in text that state biomedical events, that is, changes in the state or properties of physical entities. Each event annotation is a text-bound association of an arbitrary number of entities, each in a specific role (e.g. Theme, Cause).
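As a rough illustration of this structure, the sketch below models an event as a typed record that ties one expression span to any number of role-labelled entity spans. The class names, field names, example sentence, and event type are all hypothetical and chosen only for illustration; they are not the corpus's own encoding, which is defined in the GENIA Corpus Manual.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class TextSpan:
    """A stretch of text identified by character offsets (end is exclusive)."""
    start: int
    end: int
    text: str


@dataclass
class Event:
    """A text-bound event: an expression plus entities in named roles."""
    event_type: str                      # illustrative label, not a corpus tag
    expression: TextSpan                 # the words that state the event
    arguments: List[Tuple[str, TextSpan]] = field(default_factory=list)

    def by_role(self, role: str) -> List[TextSpan]:
        """Return every argument span that fills the given role."""
        return [span for r, span in self.arguments if r == role]


def span_of(sentence: str, phrase: str) -> TextSpan:
    """Locate the first occurrence of phrase and wrap it as a TextSpan."""
    start = sentence.index(phrase)
    return TextSpan(start, start + len(phrase), phrase)


# Made-up example sentence, not taken from the corpus.
sentence = "IL-2 induces phosphorylation of STAT5."

event = Event(
    event_type="Phosphorylation",
    expression=span_of(sentence, "phosphorylation"),
    arguments=[
        ("Theme", span_of(sentence, "STAT5")),
        ("Cause", span_of(sentence, "IL-2")),
    ],
)

themes = [span.text for span in event.by_role("Theme")]
causes = [span.text for span in event.by_role("Cause")]
```

Storing the arguments as a list of (role, span) pairs rather than a dictionary lets one role be filled more than once, which matches the "arbitrary number of entities" wording above.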
The current general release of the GENIA corpus event annotation covers 1,000 of the 1,999 abstracts of the primary GENIA corpus, marking 36,114 events in 9,372 sentences. (Additional annotations of the corpus have been used to create blind test sets for various shared tasks.)
The GENIA Event corpus is available in an XML format described in the GENIA Corpus Manual. A subset of the corpus annotations is also available in a standoff format as part of the BioNLP Shared Task 2009 and 2011 corpora.
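For readers working with the standoff version, the following is a minimal parsing sketch. It assumes the tab-separated layout commonly seen in the BioNLP Shared Task distributions: text-bound lines of the form "T1<TAB>Type start end<TAB>text" and event lines of the form "E1<TAB>Type:Tn Role:Id ...". That layout is an assumption here and should be checked against the task documentation; the function name and the sample lines are hypothetical.

```python
def parse_standoff(lines):
    """Split standoff lines into text-bound annotations and events.

    Returns two dicts keyed by annotation ID:
      entities[id] = (type, start, end, text)
      events[id]   = (type, expression_id, [(role, argument_id), ...])
    Lines with any other ID prefix are ignored in this sketch.
    """
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Assumed shape: "Type start end", then the covered text.
            ann_type, start, end = fields[1].split(" ")
            text = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = (ann_type, int(start), int(end), text)
        elif ann_id.startswith("E"):
            # Assumed shape: "Type:ExpressionId Role:ArgId Role:ArgId ..."
            parts = fields[1].split()
            ann_type, expression_id = parts[0].split(":", 1)
            arguments = [tuple(p.split(":", 1)) for p in parts[1:]]
            events[ann_id] = (ann_type, expression_id, arguments)
    return entities, events


# Made-up sample lines, not taken from the corpus.
sample = [
    "T1\tProtein 0 4\tIL-2",
    "T2\tProtein 32 37\tSTAT5",
    "T3\tPhosphorylation 13 28\tphosphorylation",
    "E1\tPhosphorylation:T3 Theme:T2",
]

entities, events = parse_standoff(sample)
```

Because the annotations live apart from the text and refer to it only by offsets and IDs, an event's arguments can be resolved with a plain dictionary lookup, for example entities[events["E1"][2][0][1]].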
The GENIA Event corpus annotations served as the initial source data for the BioNLP Shared Task 2009 and the BioNLP Shared Task 2011 GE task.
Through their use in the BioNLP Shared Task corpora, the GENIA Event corpus annotations form the basic training and development material for the majority of currently available structured event extraction systems for biomedical domain texts.
Encoding scheme
Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report(TR-NLP-UT-2006-1). Tsujii Laboratory, University of Tokyo, 2006.
Annotation guidelines
Tomoko Ohta, Jin-Dong Kim and Jun’ichi Tsujii, Guidelines for event annotation, University of Tokyo Technical Report, 2007.
Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun’ichi Tsujii. GENIA Ontology. Technical Report(TR-NLP-UT-2006-2). Tsujii Laboratory, University of Tokyo, 2006.
Kim, Jin-Dong and Jun’ichi Tsujii. GENIA Corpus Curation Framework. Technical Report(TR-NLP-UT-2006-3). Tsujii Laboratory, University of Tokyo, 2006.
Publications
Jin-Dong Kim, Tomoko Ohta, and Jun'ichi Tsujii, Corpus annotation for mining biomedical events from literature, BMC Bioinformatics 2008, 9:10. (Open Access; Highly accessed)
GENIA Event corpus version 0.9: GENIA_event_annotation_0.9.tgz
Tomoko Ohta: annotation coordinator
See also GENIA Project acknowledgments page