Compositional Data Analysis in E-Tourism Research

Compositional Data (CoDa) contain information about the relative importance of the parts of a whole, which the researcher deems more interesting than overall size or volume. In web mining, for instance, the relative frequency of a term is normally given more importance than its absolute frequency, which mostly reflects the sheer volume of online content. Many research questions in e-tourism concern either the distribution of a whole or the relative importance of its parts: How do the most salient contents in hotel Facebook accounts relate to hotel characteristics? What are the dominant topics in TripAdvisor comments about fish freshness in seafood restaurants? How does the relative popularity of search terms in Google relate to destination market share? In CoDa, most of the basic statistical notions, such as center, variation, association, and distance, are flawed unless they are re-expressed by means of logarithms of ratios. The appeal of log-ratios is that, once they are computed, standard statistical methods can be used. At the same time, because one part can increase in relative terms only if at least one other part decreases, the analysis needs to be multivariate. This chapter uses an example based on TripAdvisor hotel reviews from Barcelona, one of the most visited cities worldwide, focusing on what users complain about, to illustrate the main multivariate exploratory and descriptive tools in CoDa: imputation of zeros prior to computing the log-ratios, multivariate outlier detection, principal component analysis, cluster analysis, and multivariate data visualization. The use of CoDaPack, a popular freeware package for CoDa, is described in a step-by-step fashion.
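As a rough illustration of why zeros must be handled before log-ratios can be computed, the following minimal Python sketch closes a small table of counts to proportions, applies a simple multiplicative zero replacement, and computes centered log-ratios. It is not the chapter's CoDaPack procedure: the counts are invented, the function names are hypothetical, and the replacement value `delta` is an arbitrary assumption chosen only for demonstration.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def replace_zeros(x, delta=1e-3):
    """Simple multiplicative zero replacement (delta is an assumed value).

    Zeros become delta; non-zero parts are shrunk so each row still sums to 1.
    """
    x = closure(x)
    zeros = x == 0
    n_zeros = zeros.sum(axis=1, keepdims=True)
    return np.where(zeros, delta, x * (1 - n_zeros * delta))

def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

# Invented counts of three complaint topics for three hypothetical hotels.
# The third hotel has ten times the review volume of the first but the same mix.
counts = np.array([
    [12, 0, 8],
    [3, 9, 6],
    [120, 0, 80],
])

comp = replace_zeros(counts)  # strictly positive compositions
z = clr(comp)                 # log-ratio coordinates for standard methods

print(np.round(z, 3))
```

In this sketch the first and third rows receive identical log-ratio values, because only the relative importance of the parts enters the computation, and each row of `z` sums to zero.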
All rights reserved