Post-translational modifications (PTMs), amino acid modifications of proteins after translation, are among the later steps of protein biosynthesis for many proteins. They are critical in determining protein function, including activity state, localization, turnover, and interactions with other biomolecules. Many information extraction studies have targeted individual PTM types. Until recently, however, there was little effort to extract multiple PTM types at once in a unified framework.
The results of the BioNLP Shared Task 2009 indicated that event extraction technology is well suited for PTM extraction. To pursue this opportunity, we created this corpus. It is the first corpus targeting PTMs to be annotated using the GENIA event representation.
This corpus was produced in part as a preparatory study for organizing the BioNLP Shared Task 2011 Epigenetics and Post-translational Modifications (EPI) task. The EPI corpus includes a larger and more comprehensive set of annotations for associated events.
The corpus is distributed in a BioNLP Shared Task-flavored standoff format.
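The following minimal Python sketch shows what reading this kind of standoff annotation involves. It parses text-bound (T) and event (E) lines and assumes the layout used in the BioNLP Shared Task 2009 data. In that layout, each document has a .txt file with the text, an .a1 file with protein annotations and an .a2 file with event annotations. The class and function names (`TextBound`, `Event`, `parse_standoff`) are hypothetical. The example annotations are illustrative and do not come from the corpus. Check the exact conventions of this release against the distributed files.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound annotation (assumed form: 'T1<TAB>Protein 0 3<TAB>p53')."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event (assumed form: 'E1<TAB>Phosphorylation:T2 Theme:T1 Site:T3')."""
    id: str
    type: str
    trigger: str  # id of the text-bound annotation marking the trigger
    args: list[tuple[str, str]] = field(default_factory=list)  # (role, target id)


def parse_standoff(lines):
    """Parse .a1/.a2-style lines into text-bound annotations and events.

    Hypothetical helper: only contiguous T spans and E lines are handled;
    other lines (e.g. M modifications, * equivalences) are skipped.
    """
    textbounds, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            ann_type, start, end = fields[1].split()
            textbounds[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), fields[2])
        elif ann_id.startswith("E"):
            parts = fields[1].split()
            ev_type, trigger = parts[0].split(":")
            args = []
            for part in parts[1:]:
                role, target = part.split(":")
                args.append((role, target))
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return textbounds, events


if __name__ == "__main__":
    # Illustrative annotations for this sentence (not taken from the corpus).
    text = "p53 was phosphorylated at Ser15"
    lines = [
        "T1\tProtein 0 3\tp53",
        "T2\tPhosphorylation 8 22\tphosphorylated",
        "T3\tEntity 26 31\tSer15",
        "E1\tPhosphorylation:T2 Theme:T1 Site:T3",
    ]
    textbounds, events = parse_standoff(lines)
    for ev in events.values():
        # prints: Phosphorylation phosphorylated [('Theme', 'T1'), ('Site', 'T3')]
        print(ev.type, textbounds[ev.trigger].text, ev.args)
```

Event arguments are kept as an ordered list of (role, id) pairs so that they appear in the same order as in the file. A fuller reader would also need to resolve modification (M) lines, such as negation or speculation, against the events they refer to.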
Tomoko Ohta, Jin-Dong Kim, and Jun'ichi Tsujii. (2007). Guidelines for Event Annotation. University of Tokyo Technical Report.
Tomoko Ohta, Sampo Pyysalo, Makoto Miwa, Jin-Dong Kim, and Jun'ichi Tsujii. (2010). Event Extraction for Post-Translational Modifications. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, ACL 2010, pages 19–27.
PTM event corpus training data, version 1.0: post-translational_modifications_training_data.tar.gz