Rate-Distortion Theory for Clustering in the Perceptual Space
Extracting relevant information from large data sets has become a major challenge
in data visualization. Clustering techniques, which classify data into groups according to similarity
metrics, are a suitable strategy for tackling this problem. Generally, these techniques are applied in the
data space as an independent step prior to visualization. In this paper, we propose clustering
in the perceptual space by maximizing the mutual information between the original data and the
final visualization. To this end, we present a new information-theoretic framework based on
rate-distortion theory that allows us to achieve maximally compressed data with minimal
signal distortion. Using this framework, we propose a methodology to design a visualization process
that minimizes the information loss during the clustering process. We present three application examples of the
proposed methodology in different visualization techniques such as scatterplots, parallel coordinates,
and summary trees.
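
As a rough illustration of the quantity the abstract refers to, the sketch below computes the mutual information between a data variable and its visual encoding from a joint probability table. It is a toy example, not the framework proposed in the paper: the function name `mutual_information` and both tables are hypothetical, chosen only to show that merging data values into clusters lowers the information shared between data and visualization.

```python
import math

def mutual_information(joint):
    """Return I(X;Y) in bits for a joint table given as a list of rows.

    Rows index data values X, columns index visual outputs Y.
    The table is normalized, so unnormalized counts are accepted too.
    """
    total = sum(sum(row) for row in joint)
    px = [sum(row) / total for row in joint]          # marginal of X
    py = [sum(col) / total for col in zip(*joint)]    # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, value in enumerate(row):
            p = value / total
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Hypothetical case 1: four equally likely data values, each drawn distinctly.
one_to_one = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Hypothetical case 2: the same data grouped into two visual clusters.
clustered = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
]

mi_full = mutual_information(one_to_one)
mi_clustered = mutual_information(clustered)
print(f"one-to-one: {mi_full:.3f} bits, clustered: {mi_clustered:.3f} bits")
```

In this toy setting the clustered encoding keeps half of the information of the one-to-one encoding. In rate-distortion terms, that is the kind of trade-off between compression and distortion that a clustering must balance.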