On Dimensionality of Latent Semantic Indexing for Text Segmentation

Authors

ŘEHŮŘEK Radim

Year of publication 2007
Type Article in Periodical
Magazine / Source Proceedings of the International Multiconference on Computer Science and Information Technology
MU Faculty or unit

Faculty of Informatics

Citation
Web http://www.papers2007.imcsit.org/
Field Informatics
Keywords text segmentation; LSI; latent semantic indexing
Description In this paper we propose features that are desirable in linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high-similarity search of heterogeneous texts. We then describe a robust, purely statistical method, based on exploiting context overlap, that exhibits these features. Ways to automatically determine its internal parameter, the dimensionality of the latent space, are discussed and evaluated on a data set.