Overview
Part-of-speech and syntactic (phrase structure) annotation has been created for all 1,999 abstracts of the primary GENIA corpus. The annotation scheme of the GENIA Treebank is based on the Penn Treebank II (PTB) bracketing guidelines (Bies et al., 1995).

Corpus format
The primary GENIA Treebank distribution is in XML format. Conversions into the Penn Treebank format have been created by a number of researchers not directly affiliated with the GENIA project. One recent version of the GENIA Treebank in PTB format was created by David McClosky.

Major applications
The GENIA Treebank is the most widely applied corpus for training and adapting parsers to biomedical domain texts.
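For readers working with one of the PTB-format conversions mentioned under Corpus format, the following is a minimal sketch of reading a PTB-style bracketed tree and listing its word/tag pairs. It assumes only the general bracketing convention, `(LABEL child ...)`, and does not depend on any particular conversion. The sample sentence is invented for illustration and is not taken from the corpus, and the function names `tokenize`, `parse`, and `tagged_words` are hypothetical helpers, not part of any GENIA tool.

```python
import re

# Invented sample in PTB-style bracketing (NOT taken from the GENIA Treebank).
SAMPLE = "(S (NP (NN IL-2) (NN expression)) (VP (VBZ requires) (NP (NN activation))) (. .))"


def tokenize(text):
    """Split a bracketed string into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Parse one bracketed tree into (label, children) tuples.

    Children are either nested (label, children) tuples or plain strings
    (the words). An unlabeled wrapper such as "( (S ...))", which appears
    in some PTB-style files, gets the empty string as its label.
    """
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        if tokens[pos] == "(":
            label = ""  # unlabeled wrapper bracket
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return node()


def tagged_words(tree):
    """Return (word, part-of-speech tag) pairs in sentence order."""
    label, children = tree
    # A preterminal has exactly one child, and that child is a word.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


tree = parse(tokenize(SAMPLE))
for word, tag in tagged_words(tree):
    print(word, tag)
```

The nested-tuple representation is chosen only to keep the sketch dependency-free; an existing treebank reader may be preferable for real work. The XML distribution uses its own encoding, described in the Encoding scheme document, which this sketch does not cover.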

Documentation
Encoding scheme
Annotation guidelines
Publications
Download

Acknowledgments
Yuka Tateisi: GENIA Treebank annotation coordinator
See also the GENIA Project acknowledgments page.