The Art of Mathematics Retrieval
Authors | Petr Sojka, Martin Líška |
---|---|
Year of publication | 2011 |
Type | Article in Proceedings |
Conference | Proceedings of the 2011 ACM Symposium on Document Engineering |
Citation | SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. Online. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011, p. 57–60. ISBN 978-1-4503-0863-2. Available from: https://dx.doi.org/10.1145/2034691.2034703. |
Doi | http://dx.doi.org/10.1145/2034691.2034703 |
Field | Informatics |
Keywords | math indexing and retrieval; mathematical digital libraries; information systems; information retrieval; mathematical content search; document ranking of mathematical papers; math text mining; MIaS; WebMIaS |
Description | The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, are presented, and the design decisions are discussed. We argue for an approach based on Presentation MathML that uses the similarity of math subformulae. The system was implemented as a math-aware search engine built on the state-of-the-art system Apache Lucene. Scalability was tested on more than 400,000 arXiv documents containing 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene. |
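
The description mentions matching formulae through the similarity of their Presentation MathML subformulae. The following is a minimal sketch of that general idea, not the MIaS implementation: it enumerates every subtree of a Presentation MathML expression as a candidate subformula and compares two formulae by the subformulae they share. The function names (`canonical`, `subformulae`, `shared_subformulae`) and the plain set intersection used for comparison are assumptions made for illustration only; the record does not describe the system's actual tokenization or ranking.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}mi' down to 'mi'."""
    return tag.rsplit("}", 1)[-1]


def canonical(elem):
    """Serialize a subtree to a compact string key, ignoring namespaces,
    attributes, and surrounding whitespace (a hypothetical normalization)."""
    name = local_name(elem.tag)
    text = (elem.text or "").strip()
    children = "".join(canonical(child) for child in elem)
    return f"<{name}>{text}{children}</{name}>"


def walk(elem, depth=0):
    """Yield every element of the tree together with its nesting depth."""
    yield elem, depth
    for child in elem:
        yield from walk(child, depth + 1)


def subformulae(mathml):
    """Return (depth, key) pairs for every subtree of a MathML expression."""
    root = ET.fromstring(mathml)
    return [(depth, canonical(elem)) for elem, depth in walk(root)]


def shared_subformulae(mathml_a, mathml_b):
    """Return the set of subformula keys that occur in both expressions."""
    keys_a = {key for _, key in subformulae(mathml_a)}
    keys_b = {key for _, key in subformulae(mathml_b)}
    return keys_a & keys_b


# Two example formulae: a^2 + b and a^2 + c.
FORMULA_1 = (
    "<math><mrow><msup><mi>a</mi><mn>2</mn></msup>"
    "<mo>+</mo><mi>b</mi></mrow></math>"
)
FORMULA_2 = (
    "<math><mrow><msup><mi>a</mi><mn>2</mn></msup>"
    "<mo>+</mo><mi>c</mi></mrow></math>"
)

if __name__ == "__main__":
    for depth, key in subformulae(FORMULA_1):
        print(depth, key)
    print(sorted(shared_subformulae(FORMULA_1, FORMULA_2)))
```

In this sketch the two example formulae differ only in their last identifier, so they still share the `msup` subtree for a squared and its parts. An index built on such keys could retrieve partially matching formulae, which is the kind of behaviour the description argues for.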