<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.w3.org/2005/Atom">
<title>Session 2: Zero replacement strategies</title>
<link href="http://hdl.handle.net/10256/623" rel="alternate"/>
<subtitle/>
<id>http://hdl.handle.net/10256/623</id>
<updated>2013-05-23T04:20:56Z</updated>
<dc:date>2013-05-23T04:20:56Z</dc:date>
<entry>
<title>Markov chain Monte Carlo method applied to rounding zeros of compositional data: first approach</title>
<link href="http://hdl.handle.net/10256/663" rel="alternate"/>
<author>
<name>Martín Fernández, Josep Antoni</name>
</author>
<author>
<name>Palarea Albaladejo, Javier</name>
</author>
<author>
<name>Gómez García, Juan</name>
</author>
<id>http://hdl.handle.net/10256/663</id>
<updated>2012-11-19T08:56:33Z</updated>
<published>2003-10-15T00:00:00Z</published>
<summary type="text">Markov chain Monte Carlo method applied to rounding zeros of compositional data: first approach
Martín Fernández, Josep Antoni; Palarea Albaladejo, Javier; Gómez García, Juan
Martín Fernández, Josep Antoni; Thió i Fernández de Henestrosa, Santiago
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below the detection limit –rounded zeros. Because the second kind of zero is usually understood as “a trace too small to measure”, it seems reasonable to replace such zeros by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values in compositional data sets is introduced.
</summary>
<dc:date>2003-10-15T00:00:00Z</dc:date>
</entry>
<entry>
<title>Modelling structural zeros in compositional data</title>
<link href="http://hdl.handle.net/10256/661" rel="alternate"/>
<author>
<name>Bacon Shone, John</name>
</author>
<id>http://hdl.handle.net/10256/661</id>
<updated>2012-06-28T12:30:36Z</updated>
<published>2003-10-15T00:00:00Z</published>
<summary type="text">Modelling structural zeros in compositional data
Bacon Shone, John
Thió i Fernández de Henestrosa, Santiago; Martín Fernández, Josep Antoni
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households?
In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that, with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be: for expenditure on durables, for example, it may be chance whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.
While this analysis is based around economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
Geologische Vereinigung; Universitat de Barcelona, Equip de Recerca Arqueomètrica; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Patronat de l’Escola Politècnica Superior de la Universitat de Girona; Fundació privada: Girona, Universitat i Futur
</summary>
<dc:date>2003-10-15T00:00:00Z</dc:date>
</entry>
<entry>
<title>Possible solution of some essential zero problems in compositional data analysis</title>
<link href="http://hdl.handle.net/10256/652" rel="alternate"/>
<author>
<name>Aitchison, John</name>
</author>
<author>
<name>Kay, Jim W.</name>
</author>
<id>http://hdl.handle.net/10256/652</id>
<updated>2012-11-28T09:19:12Z</updated>
<published>2003-10-15T00:00:00Z</published>
<summary type="text">Possible solution of some essential zero problems in compositional data analysis
Aitchison, John; Kay, Jim W.
Thió i Fernández de Henestrosa, Santiago; Martín Fernández, Josep Antoni
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By an essential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur in many compositional situations, such as household budget patterns, time budgets, palaeontological zonation studies, and ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful in such situations. From consideration of such examples it seems sensible to build up a model in two stages, the first determining where the zeros will occur and the second how the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data.
</summary>
<dc:date>2003-10-15T00:00:00Z</dc:date>
</entry>
</feed>
