PTM event corpus

Overview

Post-translational modifications (PTMs), modifications of the amino acids of proteins after translation, are one of the later steps of biosynthesis for many proteins. They are critical for determining protein function, including activity state, localization, turnover and interactions with other biomolecules. While many information extraction studies have targeted individual PTM types, until recently there was little effort to extract multiple PTM types at once in a unified framework.

The results of the BioNLP Shared Task 2009 indicated that event extraction technology is well suited for PTM extraction. To take advantage of this opportunity, we created this corpus, the first targeting PTMs to be annotated using the GENIA event representation.

This corpus was produced in part as a preparatory study for the organization of the BioNLP Shared Task 2011 Epigenetics and Post-translational Modifications (EPI) task. The EPI corpus annotations include a larger and more comprehensive set of annotations for associated events.

Corpus format

The corpus is distributed in the BioNLP Shared Task-flavored standoff format.
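As a rough illustration of how annotations in this kind of standoff format can be read, the following Python sketch parses text-bound lines (IDs starting with "T") and event lines (IDs starting with "E"). It assumes the usual BioNLP Shared Task conventions of tab-separated fields and character offsets into the accompanying text file. The sample lines, the class names and the function name parse_standoff are hypothetical and not taken from the corpus; the file layout inside the distributed archive is not described here and should be checked against the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound annotation ('T' line): an entity or an event trigger."""
    id: str
    type: str
    start: int  # character offset into the document text (inclusive)
    end: int    # character offset into the document text (exclusive)
    text: str


@dataclass
class Event:
    """An event annotation ('E' line): a typed trigger plus role:target pairs."""
    id: str
    type: str
    trigger: str
    args: list = field(default_factory=list)  # list of (role, target_id) tuples


def parse_standoff(lines):
    """Parse 'T' and 'E' lines; any other line type is skipped in this sketch."""
    textbounds, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T") and len(fields) >= 3:
            ann_type, start, end = fields[1].split()
            textbounds[ann_id] = TextBound(
                ann_id, ann_type, int(start), int(end), fields[2]
            )
        elif ann_id.startswith("E") and len(fields) >= 2:
            parts = fields[1].split()
            ev_type, trigger = parts[0].split(":", 1)
            args = [tuple(p.split(":", 1)) for p in parts[1:]]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return textbounds, events


# Hypothetical annotations for the sentence "p53 is phosphorylated".
SAMPLE = [
    "T1\tProtein 0 3\tp53\n",
    "T2\tPhosphorylation 7 21\tphosphorylated\n",
    "E1\tPhosphorylation:T2 Theme:T1\n",
]

textbounds, events = parse_standoff(SAMPLE)
for ev in events.values():
    trigger_text = textbounds[ev.trigger].text
    themes = [textbounds[t].text for role, t in ev.args if role == "Theme"]
    print(ev.type, trigger_text, themes)
```

Keeping text-bound annotations and events in separate dictionaries keyed by ID makes it straightforward to resolve an event's trigger and arguments back to their text spans.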

Annotation guidelines

The corpus is annotated following the GENIA Event corpus annotation guidelines, adapted as described in "Event Extraction for Post-Translational Modifications".

  • Tomoko Ohta, Jin-Dong Kim and Jun’ichi Tsujii, Guidelines for event annotation, University of Tokyo Technical Report, 2007.

Publications

Download

post-translational_modifications_training_data.tar.gz (78k), uploaded by Tomoko OHTA, Aug 21, 2015, 1:45 AM