mTOR pathway corpus


The construction of pathways is a major focus of present-day biology. Typical pathways involve large numbers of entities of various types whose associations are represented as reactions involving arbitrary numbers of reactants, outputs and modifiers. Until recently, few information extraction approaches were capable of resolving the level of detail in text required to support the annotation of such pathway representations. We argue that event representations of the type popularized by the BioNLP Shared Task are potentially applicable for pathway annotation support. As a step toward realizing this possibility, we study the mapping from a formal pathway representation to the event representation in order to identify remaining challenges in event extraction for pathway annotation support. Following an initial analysis, we present a detailed study of protein association and dissociation reactions, propose a new event class (Dissociation) and a representation for the latter, and, as a step toward its automatic extraction, introduce a manually annotated resource that includes the new type among a total of nearly 1300 annotated event instances.


Corpus format

The corpus is distributed in the standoff format used by the BioNLP Shared Task.
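
As a rough illustration of how standoff annotations of this kind can be read, the following Python sketch parses text-bound annotations ("T" lines) and event annotations ("E" lines). It assumes the usual tab-separated BioNLP Shared Task layout; the sample sentence, the offsets, the "Theme"/"Theme2" argument roles and the names parse_standoff, TextBound and Event are invented for illustration and are not taken from the corpus, whose files should be checked for the exact conventions used.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound annotation: an entity mention or an event trigger."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event: a typed trigger plus (role, target id) arguments."""
    id: str
    type: str
    trigger: str
    args: list = field(default_factory=list)


def parse_standoff(lines):
    """Parse "T" and "E" standoff lines; other line types are skipped."""
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("T"):
            # Assumed layout: ID <TAB> TYPE START END <TAB> TEXT
            ann_id, type_span, text = line.split("\t", 2)
            ann_type, start, end = type_span.split(" ")
            entities[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), text)
        elif line.startswith("E"):
            # Assumed layout: ID <TAB> TYPE:TRIGGER ROLE:TARGET ROLE:TARGET ...
            ann_id, body = line.split("\t", 1)
            head, *rest = body.split()
            ev_type, trigger = head.split(":", 1)
            args = [tuple(arg.split(":", 1)) for arg in rest]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return entities, events


# Hypothetical annotations for the invented sentence
# "mTOR dissociates from raptor" (not an excerpt from the corpus).
SAMPLE = (
    "T1\tProtein 0 4\tmTOR\n"
    "T2\tProtein 22 28\traptor\n"
    "T3\tDissociation 5 16\tdissociates\n"
    "E1\tDissociation:T3 Theme:T1 Theme2:T2\n"
)

if __name__ == "__main__":
    entities, events = parse_standoff(SAMPLE.splitlines())
    for event in events.values():
        trigger = entities[event.trigger]
        roles = ", ".join(f"{role}={entities[target].text}" for role, target in event.args)
        print(f"{event.id}: {event.type} ('{trigger.text}') {roles}")
```

Keeping entities and events in dictionaries keyed by annotation ID makes it straightforward to resolve an event's trigger and arguments back to their text spans.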

Annotation guidelines

The corpus is annotated following the GENIA Event corpus annotation guidelines, adapted as described in "From Pathways to Biomolecular Events: Opportunities and Challenges". The GENIA guidelines are documented in:
  • Tomoko Ohta, Jin-Dong Kim and Jun’ichi Tsujii, Guidelines for event annotation, University of Tokyo Technical Report, 2007.