The extraction of relations stated to hold between biomolecular entities is one of the most frequently addressed information extraction tasks in domain studies. Typical relation extraction targets include protein-protein interactions and gene regulatory relations. In the GENIA corpus, however, such associations, which involve change in the state or properties of biomolecules, are captured in the event annotation.
The GENIA corpus relation annotation aims to complement the event annotation of the corpus by capturing (primarily) static relations: relations such as part-of that hold between entities without (necessarily) involving change.
The most recent version of the GENIA Relation corpus was released as the REL task dataset of the BioNLP Shared Task 2011. This data is available in the standoff format introduced on the BioNLP ST'11 format page.
The GENIA Relation annotations formed the basis for the BioNLP Shared Task 2011 REL task.
Encoding scheme
The GENIA Relation annotations are encoded in the BioNLP Shared Task flavored standoff format, described in the BioNLP Shared Task format documentation.
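As a rough illustration of what reading such standoff annotations can look like, the following minimal Python sketch parses text-bound (T) and relation (R) lines. The line layout (tab-separated ID, then type with offsets or arguments), the `Arg1`/`Arg2` role names and the `Protein-Component` type name are assumptions based on the shared task format and should be checked against the format documentation. The sample lines are invented for illustration and are not taken from the corpus, and `parse_standoff` is a hypothetical helper, not part of any released tool.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    type: str
    args: dict  # role name -> ID of the referenced annotation


def parse_standoff(lines):
    """Parse text-bound (T) and relation (R) lines; skip anything else."""
    entities, relations = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T") and len(fields) >= 3:
            # Assumed layout: ID <TAB> TYPE START END <TAB> TEXT
            ann_type, start, end = fields[1].split(" ")
            entities[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), fields[2])
        elif ann_id.startswith("R") and len(fields) >= 2:
            # Assumed layout: ID <TAB> TYPE ROLE:ID ROLE:ID
            ann_type, *arg_specs = fields[1].split()
            args = dict(spec.split(":", 1) for spec in arg_specs)
            relations[ann_id] = Relation(ann_id, ann_type, args)
    return entities, relations


# Invented sample for the text "IL-2 promoter"; not taken from the corpus.
SAMPLE = [
    "T1\tProtein 0 4\tIL-2",
    "T2\tEntity 5 13\tpromoter",
    "R1\tProtein-Component Arg1:T1 Arg2:T2",
]

entities, relations = parse_standoff(SAMPLE)
for rel in relations.values():
    arg1 = entities[rel.args["Arg1"]]
    arg2 = entities[rel.args["Arg2"]]
    print(f"{rel.type}: {arg1.text} -> {arg2.text}")
```

Keeping annotations in dictionaries keyed by ID makes it straightforward to resolve relation arguments back to the entities they refer to.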
Guidelines
As part of the GENIA Relation annotation effort, we introduced a relation ontology that aims to provide a detailed and broadly applicable set of relation types, based on accepted domain standard concepts, for use in corpus annotation and domain information extraction approaches. To ensure that the meaning of the relations is explicit, they are specified in OWL (see download section). For maximal compatibility, we integrate categories and relations from several domain ontologies, including IAO, OBI, GO and the GENIA ontology. The basic relations between individuals are organized as displayed in the figure below, where R stands for reflexivity, S for symmetry, T for transitivity, Anti for anti-symmetry and AS for asymmetry.
The development of the GENIA relation ontology also resulted in two novel ontology design patterns that are particularly suited for applications in text mining, where the exact referent of a term cannot always be reliably determined. We refer to the publication "Applying ontology design patterns" for more information on these aspects of the annotation.
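For readers less familiar with the relation properties named above (reflexivity, symmetry, transitivity, anti-symmetry and asymmetry), the following Python sketch checks each property on a small finite relation given as a set of pairs. It illustrates the standard mathematical definitions only. The example relations and entity names are hypothetical, and the sketch makes no claim about which properties the ontology assigns to any particular relation; for that, consult the OWL specification.

```python
def is_reflexive(rel, domain):
    """Every element of the domain is related to itself."""
    return all((x, x) in rel for x in domain)


def is_symmetric(rel):
    """Whenever x is related to y, y is related to x."""
    return all((y, x) in rel for (x, y) in rel)


def is_transitive(rel):
    """Whenever x R y and y R w hold, x R w holds."""
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)


def is_antisymmetric(rel):
    """x R y and y R x together imply x == y."""
    return all(x == y for (x, y) in rel if (y, x) in rel)


def is_asymmetric(rel):
    """x R y rules out y R x (so no element is related to itself)."""
    return all((y, x) not in rel for (x, y) in rel)


# Hypothetical toy domain and relations, for illustration only.
domain = {"binding site", "protein", "complex"}
proper_part_of = {
    ("binding site", "protein"),
    ("protein", "complex"),
    ("binding site", "complex"),
}
# Reflexive variant: every entity is also counted as part of itself.
part_of = proper_part_of | {(x, x) for x in domain}

for name, rel in [("part_of", part_of), ("proper_part_of", proper_part_of)]:
    print(
        name,
        "R" if is_reflexive(rel, domain) else "-",
        "S" if is_symmetric(rel) else "-",
        "T" if is_transitive(rel) else "-",
        "Anti" if is_antisymmetric(rel) else "-",
        "AS" if is_asymmetric(rel) else "-",
    )
```

Note that an asymmetric relation is always anti-symmetric, while a reflexive relation on a non-empty domain can never be asymmetric; the two toy relations differ in exactly this respect.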
Publications
Pyysalo, Sampo, Tomoko Ohta, Jin-Dong Kim and Jun'ichi Tsujii. Static relations: a piece in the biomedical information extraction puzzle. Proceedings of BioNLP'09.
Ohta, Tomoko, Sampo Pyysalo, Jin-Dong Kim and Jun'ichi Tsujii. A Re-evaluation of Biomedical Named Entity - Term Relations. Journal of Bioinformatics and Computational Biology (JBCB) Vol. 8, No. 5 (2010) 917–92
Hoehndorf, Robert, Axel-Cyrille Ngonga Ngomo, Sampo Pyysalo, Tomoko Ohta, Anika Oellrich, and Dietrich Rebholz-Schuhmann. Applying ontology design patterns to the implementation of relations in GENIA. Proceedings of SMBM'10.
Download
The latest revision of the GENIA relation annotation is available as the BioNLP Shared Task 2011 REL task corpus. This data is split into visible training and development sets and a "blind" test set. (For evaluation on the test set, please see the task homepage.)
GENIA relation annotation, training set: GENIA_relation_annotation_training_data.tar.gz
GENIA relation annotation, development set: GENIA_relation_annotation_development_data.tar.gz
GENIA relation annotation, test set: GENIA_relation_annotation_test_data.tar.gz
Tomoko Ohta: GENIA corpus relation annotation coordinator
See also GENIA Project acknowledgments page