Type IV secretion systems (T4SS) are mechanisms for transferring DNA and proteins across cellular boundaries. T4SS are found in a broad range of Bacteria and in some Archaea. By enabling gene transfer across cellular membranes, these systems contribute to the spread of antibiotic resistance and virulence genes, which makes them especially important in infectious disease research. To explore how structured event extraction from text can improve understanding of these systems, we annotated a corpus of T4SS-relevant documents using the GENIA event representation.
This corpus was produced in part as a preparatory study for the organization of the BioNLP Shared Task 2011 Infectious Diseases (ID) task. The ID corpus provides a larger and more comprehensive set of annotations for the associated events.
The corpus is distributed in the GENIA Event corpus XML format.
Jin-Dong Kim, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report (TR-NLP-UT-2006-1), Tsujii Laboratory, University of Tokyo, 2006.
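As a rough illustration of working with an XML-encoded event corpus, the sketch below counts events per type using Python's standard library. The element and attribute names (`event`, `type`, `class`) and the sample fragment are assumptions made for illustration, not taken from the distributed files; consult the GENIA Corpus Manual cited above for the actual encoding and adjust the names accordingly.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_event_types(xml_text, event_tag="event", type_tag="type", class_attr="class"):
    """Count events per type in a GENIA-style XML string.

    The default tag and attribute names are assumptions, not taken from
    the corpus documentation; adjust them to match the actual files.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() walks the whole tree, so events nested at any depth are found.
    for event in root.iter(event_tag):
        type_elem = event.find(type_tag)
        if type_elem is not None:
            counts[type_elem.get(class_attr, "UNKNOWN")] += 1
    return counts


# Hypothetical fragment made up for illustration; not copied from the corpus.
SAMPLE = """<sentence>
  <event id="E1"><type class="Gene_expression"/><theme idref="T1"/></event>
  <event id="E2"><type class="Localization"/><theme idref="T2"/></event>
  <event id="E3"><type class="Gene_expression"/><theme idref="T3"/></event>
</sentence>"""

if __name__ == "__main__":
    for event_type, n in count_event_types(SAMPLE).most_common():
        print(event_type, n)
```

To apply this to a real corpus file, read the file contents into a string and pass them to `count_event_types`, overriding the tag names if the files use different ones.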
The corpus is annotated following the GENIA Event corpus annotation guidelines, adapted as described in "Towards Event Extraction from Full Texts on Infectious Diseases" (Pyysalo et al., 2010).
Tomoko Ohta, Jin-Dong Kim and Jun'ichi Tsujii. Guidelines for event annotation. University of Tokyo Technical Report, 2007.
Sampo Pyysalo, Tomoko Ohta, Han-Cheol Cho, Dan Sullivan, Chunhong Mao, Bruno Sobral, Jun'ichi Tsujii, and Sophia Ananiadou. Towards Event Extraction from Full Texts on Infectious Diseases. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, ACL 2010, pages 132–140.
T4SS event corpus version 1.0: T4SS_annotation.tar.gz