Extracting Phrases from PDT 2.0
Authors | |
---|---|
Year of publication | 2011 |
Type | Article in Proceedings |
Conference | Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2011 |
MU Faculty or unit | |
Citation | |
Web | https://nlp.fi.muni.cz/raslan/2011/paper11.pdf |
Field | Informatics |
Keywords | PDT; corpus; treebank; export; format; complex annotation; phrase; clause |
Description | The Prague Dependency Treebank (henceforth PDT) is a large collection of texts in Czech. It is renowned for its respectable size and rich multi-layer annotation covering a wide range of complex phenomena. On the other hand, it can be argued that the complexity of the dataset may be a notable hindrance to using certain aspects of the data in a straightforward way. To overcome this problem, we present an export filter converting PDT into a more transparent data format, containing information about the most common phrase types. We believe that the availability of the PDT data in this form will help encourage people unfamiliar with the underlying theory to use the corpus. |