ScaleText

Authors

ŘEHŮŘEK Radim POMIKÁLEK Jan

Description ScaleText version 1.0 is a production-grade software system for large-scale scalable semantic search. The core of this result is a vector search engine, realized as a stand-alone software package that implements document indexing and search using vectors for text representation. The vectors are created automatically from plain text using several methods for semantic analysis: LSI, LDA, TF-IDF, Doc2vec a Stanford gloVe. The documents go through several stages, from preprocessing, segmentation, vectorization to vector encoding and storage. Each step is realized by a dedicated component, with its output backed by a backend database engine for persistence. Release 1.0 includes a full re-implementation of the entire pipeline at scale, in Python 3.5, including a set of top-level scripts for document indexing and a container architecture for deployment into production environments.
Related projects: