Building Corpora of Technical Texts: Approaches and Tools
Title in Czech | Budování korpusů technických textů : přístupy a nástroje |
---|---|
Authors | |
Year of publication | 2011 |
Type | Article in proceedings |
Conference | Fifth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2011 |
MU Faculty or unit | |
Citation | |
www | |
Field | Informatics |
Keywords | language of mathematics;mathematics of language;math representation;m-term;similarity;DML-CZ;EuDML |
Description | Building corpora of technical texts in the Science, Technology, Engineering, and Mathematics (STEM) domain has its specific needs, especially the handling of mathematical formulae. In particular, there is no widely accepted format for representing and handling math. We present an approach based on multiple representations of mathematical formulae that has been used for math retrieval, similarity, and clustering of a mathematical corpus. We provide an overview of our toolset, summarize our experiments to date, and propose further research directions and approaches. |
Related projects: