On Dimensionality of Latent Semantic Indexing for Text Segmentation
Title in Czech | K dimenzionalitě Latentního Sémantického Indexování pro segmentaci textu |
---|---|
Authors | |
Year of publication | 2007 |
Type | Article in a scholarly periodical |
Journal / Source | Proceedings of the International Multiconference on Computer Science and Information Technology |
MU faculty / department | |
Citation | |
www | http://www.papers2007.imcsit.org/ |
Field | Informatics |
Keywords | text segmentation; LSI; latent semantic indexing |
Description | In this paper we propose features that are desirable in linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high-similarity search over heterogeneous texts. We then describe a robust, purely statistical method, based on exploiting context overlap, that exhibits these features. Ways to automatically determine its internal parameter, the dimensionality of the latent space, are discussed and evaluated on a data set. |
Related projects:
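The record does not describe how the latent dimensionality is chosen or how boundaries are placed. The sketch below is only a generic illustration of the two ideas the abstract mentions, not the authors' method. It assumes the singular values of an LSI (truncated SVD) decomposition and the per-block latent vectors are already available. `choose_dimensionality` uses a common "retained spectral energy" heuristic, and `boundaries` uses plain thresholding on the similarity of adjacent blocks as a stand-in for the paper's context-overlap technique. Both function names and the `energy` and `threshold` parameters are hypothetical.

```python
import math


def choose_dimensionality(singular_values, energy=0.9):
    """Hypothetical helper: smallest k whose top-k singular values keep at
    least `energy` of the total squared spectrum (a generic heuristic,
    not the method from the paper)."""
    if not 0 < energy <= 1:
        raise ValueError("energy must be in (0, 1]")
    # Squared singular values, largest first.
    squares = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squares)
    if total == 0:
        return 0
    running = 0.0
    for k, sq in enumerate(squares, start=1):
        running += sq
        if running >= energy * total:
            return k
    return len(squares)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def boundaries(block_vectors, threshold=0.5):
    """Hypothetical stand-in segmenter: return every index i such that a
    segment boundary is placed between block i and block i + 1, i.e. where
    adjacent blocks (already projected to the k-dimensional latent space)
    are less similar than `threshold`."""
    return [
        i
        for i in range(len(block_vectors) - 1)
        if cosine(block_vectors[i], block_vectors[i + 1]) < threshold
    ]


if __name__ == "__main__":
    print(choose_dimensionality([10, 5, 2, 1], energy=0.9))
    print(boundaries([(1, 0), (0.9, 0.1), (0, 1), (0.1, 0.9)]))
```

With singular values 10, 5, 2, 1 the squared spectrum is 100, 25, 4, 1, so keeping 90% of the energy needs the top two dimensions. The energy threshold is only one of several possible criteria for picking k, and the fixed similarity cutoff is deliberately naive; the abstract indicates the paper itself discusses and evaluates ways of selecting the dimensionality automatically.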