Speeding up Continuous kNN Join by Binary Sketches

Notice

This publication belongs to the Faculty of Informatics, not the Faculty of Arts. The official publication page is on the muni.cz website.
Authors

NÁLEPA Filip, BATKO Michal, ZEZULA Pavel

Year of publication 2018
Type Article in proceedings
Conference Advances in Data Mining
MU Faculty or unit

Faculty of Informatics

Citation
Doi http://dx.doi.org/10.1007/978-3-319-95786-9_14
Keywords continuous kNN similarity join; binary sketches
Description Real-time recommendation is a necessary component of current social applications. It is responsible for suggesting relevant newly published data to users based on their preferences. When the users and the published data are represented in a metric space, each user can be recommended their k nearest neighbors among the published data, i.e., the kNN join is computed. In this work, we address the frequent requirement that only recently published data are subject to recommendation: a sliding time window is defined, and only data published within the limits of the window can be recommended. Due to the large numbers of both users and published data, continuously updating the results of the kNN join as new data enter and leave the sliding window becomes a challenging task. We propose a binary sketch-based approximation technique suited especially to cases where the metric distance computation is an expensive operation (e.g., the Euclidean distance in high-dimensional vector spaces). It uses cheap Hamming distances to skip more than 90% of the expensive metric distance computations. As revealed by our experiments on 4,096-dimensional vectors, the proposed approach significantly outperforms the existing approaches it was compared with.
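
The filtering idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the description does not say how the sketches are built or how the join is updated incrementally, so the random-hyperplane sketching, the per-query recomputation, the `candidates` shortlist size, and all function names here are assumptions made for illustration only.

```python
import math
import random
from collections import deque


def make_hyperplanes(dim, bits, seed=0):
    """Assumed sketching scheme: one random hyperplane per sketch bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vector, hyperplanes):
    """Pack the side of each hyperplane the vector lies on into an int bitmask."""
    bits = 0
    for i, plane in enumerate(hyperplanes):
        if sum(p * v for p, v in zip(plane, vector)) > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Hamming distance between two sketches stored as ints."""
    return bin(a ^ b).count("1")


def expire(window, now, window_length):
    """Drop items published at or before now - window_length.

    window is a deque of (timestamp, item_id, vector, sketch) in arrival order.
    """
    while window and window[0][0] <= now - window_length:
        window.popleft()


def filtered_knn(query, window, hyperplanes, k, candidates):
    """Approximate kNN of one user (query) among the items in the window.

    Cheap phase: rank all items by Hamming distance between sketches.
    Expensive phase: compute Euclidean distance only for the shortlist.
    """
    q_sketch = sketch(query, hyperplanes)
    by_hamming = sorted(window, key=lambda item: hamming(q_sketch, item[3]))
    shortlist = by_hamming[:candidates]
    scored = sorted(
        (math.dist(query, vec), item_id) for _, item_id, vec, _ in shortlist
    )
    return scored[:k]
```

Under these assumptions, a shortlist of one tenth of the window would skip 90% of the Euclidean distance computations per user, at the cost of possibly missing a true neighbor whose sketch happens to be far from the query sketch. Setting `candidates` to the window size makes the result exact and removes the saving.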