Metric hull as similarity-aware operator for representing unstructured data

Authors

ANTOL Matej; JÁNOŠOVÁ Miriama; DOHNAL Vlastislav

Year of publication 2021
Type Article in Periodical
Magazine / Source Pattern Recognition Letters
MU Faculty or unit

Faculty of Informatics

Citation
Web https://www.sciencedirect.com/science/article/pii/S0167865521001914
Doi http://dx.doi.org/10.1016/j.patrec.2021.05.011
Keywords Similarity operators; Metric space; Data aggregation
Description Similarity searching has become widely used in many online services that process unstructured and complex data, e.g., Google Images. Metric spaces are often applied to model and organize such data by their mutual similarity. As top-k queries provide only a local view of the data, a data analyst must pose multiple requests to observe the entire dataset. Thus, group-by operators for metric data have been proposed. These operators identify groups by respecting a given similarity constraint and produce a set of objects per group. The analyst can then browse these sets directly, which is tedious, but representative members may provide better insight. In this paper, we focus on concise representations of metric datasets. We propose a novel concept of a metric hull, which encompasses a given set by selecting a few objects. Testing whether an object belongs to the set then becomes much faster. We verify this concept on synthetic Euclidean data and real-life image and text datasets and show its effectiveness and scalability. Metric hulls provide much faster and more compact representations than the commonly used ball representations.
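
For orientation, the ball representation that the description names as the common baseline can be sketched as follows. This is a minimal illustration only, not the paper's metric hull construction, which this record does not specify. The function names (`ball_representation`, `in_ball`), the choice of the center as the set member with the smallest covering radius, and the sample points are all assumptions made for the example.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, one example of a metric."""
    return math.dist(a, b)


def ball_representation(objects: Sequence[Point], dist: Distance) -> Tuple[Point, float]:
    """Represent a set by one of its members (the center) and a covering radius.

    Assumption: the center is the member whose largest distance to any
    other member is smallest, so the covering radius is minimal among
    centers drawn from the set itself.
    """
    center = min(objects, key=lambda c: max(dist(c, o) for o in objects))
    radius = max(dist(center, o) for o in objects)
    return center, radius


def in_ball(query: Point, center: Point, radius: float, dist: Distance) -> bool:
    """Membership test that needs a single distance computation."""
    return dist(query, center) <= radius


# Hypothetical group of 2-D objects, e.g. one group produced by a group-by operator.
group = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
center, radius = ball_representation(group, euclidean)

print(center, radius)
print(in_ball((1.5, 0.5), center, radius, euclidean))
print(in_ball((3.0, 3.0), center, radius, euclidean))
```

The test costs one distance computation regardless of the group size, but a ball can also accept objects that lie far from every member of the group. The description presents metric hulls, which select a few objects rather than a single center, as a faster and more compact alternative to this baseline.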