Building Corpora of Technical Texts: Approaches and Tools
Item | Details |
---|---|
Year of publication | 2011 |
Type | Article in Proceedings |
Conference | Fifth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2011 |
Field | Informatics |
Keywords | language of mathematics; mathematics of language; math representation; m-term; similarity; DML-CZ; EuDML |
Description | Building corpora of technical texts in the Science, Technology, Engineering, and Mathematics (STEM) domain has specific needs, above all the handling of mathematical formulae: there is no widely accepted format for representing and processing math. We present an approach based on multiple representations of mathematical formulae that has been used for math retrieval, similarity computation, and clustering of a mathematical corpus. We provide an overview of our toolset, summarize our experiments to date, and propose further research directions and approaches. |
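The idea of comparing formulae through multiple representations can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the authors' toolset: formulae are encoded as nested Python tuples, the placeholders `id` (for variables) and `const` (for numerals) are hypothetical, every subtree counts as an indexable term (only loosely analogous to the m-terms in the keywords; the paper's definition may differ), and similarity is plain Jaccard overlap of the term sets.

```python
from typing import Union

# Assumed encoding: a leaf is a string (a variable name, or a non-negative
# integer numeral such as "2"); an inner node is (operator, operand, ...).
Expr = Union[str, tuple]

def unify(expr: Expr, variables: bool, constants: bool) -> Expr:
    """Copy expr, replacing variables and/or numerals with the hypothetical placeholders."""
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(unify(a, variables, constants) for a in args))
    if expr.isdigit():
        return "const" if constants else expr
    return "id" if variables else expr

def subterms(expr: Expr) -> set:
    """Collect every subtree of expr, serialized as a canonical string."""
    found = set()

    def walk(e: Expr) -> str:
        if isinstance(e, tuple):
            op, *args = e
            s = op + "(" + ",".join(walk(a) for a in args) + ")"
        else:
            s = e
        found.add(s)
        return s

    walk(expr)
    return found

def representations(expr: Expr) -> set:
    """Union of subterms over the original formula and its three unified variants."""
    terms = set()
    for v in (False, True):
        for c in (False, True):
            terms |= subterms(unify(expr, v, c))
    return terms

def similarity(a: Expr, b: Expr) -> float:
    """Jaccard similarity of the two formulae's multi-representation term sets."""
    ta, tb = representations(a), representations(b)
    return len(ta & tb) / len(ta | tb)

# Hypothetical example: two formulae that differ only in variable names.
p = ("=", ("+", ("^", "a", "2"), ("^", "b", "2")), ("^", "c", "2"))
q = ("=", ("+", ("^", "x", "2"), ("^", "y", "2")), ("^", "z", "2"))
print(round(similarity(p, q), 3))  # → 0.257
```

With the original representation alone, `p` and `q` would share only the numeral `2`; the unified variants let structurally identical formulae match on terms such as `^(id,2)`, while the original variant still rewards exact matches.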