European Union Language Resources in Sketch Engine
Authors | |
---|---|
Year of publication | 2016 |
Type | Article in proceedings |
Conference | Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) |
MU Faculty / Unit | |
Citation | |
www | http://www.lrec-conf.org/proceedings/lrec2016/pdf/572_Paper.pdf |
Field | Informatics |
Keywords | JRC-Acquis; DCEP; DGT-TM; Europarl; EUR-Lex; Sketch Engine; parallel corpus; word sketch; parallel concordance |
Description | Several parallel corpora built from European Union language resources are presented here. They were processed with state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely new resource is introduced: the EUR-Lex corpus, one of the largest parallel corpora available at the moment. It contains 840 million English tokens, and its largest language pair (English-French) has more than 25 million aligned segments (paragraphs). |
Related projects: |