Use of Survey Weights for the Analysis of Compositional Data: Some Simulation Results

Graf, Monique
The compositional space can be seen as a vector space in which vector addition corresponds to perturbation and multiplication by a scalar corresponds to powering (Aitchison, 1986; Pawlowsky-Glahn and Egozcue, 2001). Whereas perturbation is widely used in applications of compositional analysis, powering is somewhat neglected. Survey data analysis, on the other hand, is a domain of applied statistics where the use of weights is predominant. Weights are introduced in survey data analysis for three reasons: (1) the use of complex survey designs with unequal inclusion probabilities, (2) the correction of non-response, and (3) calibration procedures. We shall briefly introduce the rationale for weights in survey analysis and then discuss the connection between survey weights and the powering operation. Several examples will be given.

Surveys are essentially built to optimize the estimation of totals in population subgroups for a number of variables. In practice, a key variable is chosen and the design is optimized for this variable, the trade-off being between cost and precision. Totals are estimated by weighted sums of the sampled values. The weights are extrapolation factors that depend on the survey design. Informing the user about the measurement error of the published figures is an important aspect of data quality. Survey design and estimation are described e.g. in Särndal, Swensson and Wretman (1992). In a survey context, interest lies in totals or means across cases, but in a compositional context, totals have no meaning. So if we want to average cases arithmetically, we have to go back to the original measurement scale and then apply the closure operation. For the geometric mean composition, by contrast, the result is the same whether the geometric mean of the amounts is computed first and then closed to give an average composition, or the geometric mean of the compositions is computed directly and then closed.
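The link between weights and powering, and the contrast between arithmetic and geometric averaging, can be illustrated with a minimal sketch. It assumes NumPy, a made-up table of three cases with three parts, and made-up weights. The helper names (`closure`, `perturb`, `power`, `weighted_geometric_mean`) are hypothetical and not taken from the talk. A weighted geometric mean composition is obtained by powering each composition by its normalized weight and perturbing the results together.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Powering: component-wise power by the scalar a, followed by closure."""
    return closure(np.asarray(x, dtype=float) ** a)

def weighted_geometric_mean(rows, weights):
    """Closed weighted geometric mean of the rows (weights are normalized)."""
    rows = np.asarray(rows, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return closure(np.exp(w @ np.log(rows)))

# Hypothetical amounts on the original measurement scale (3 cases, 3 parts)
amounts = np.array([[10.0, 30.0, 60.0],
                    [ 5.0,  5.0, 10.0],
                    [ 8.0,  2.0, 10.0]])
# Hypothetical survey weights (extrapolation factors)
weights = np.array([120.0, 40.0, 40.0])

compositions = closure(amounts)

# Geometric mean: same result from amounts or from compositions
gm_from_amounts = weighted_geometric_mean(amounts, weights)
gm_from_compositions = weighted_geometric_mean(compositions, weights)

# Same quantity built from powering (by normalized weight) and perturbation
w = weights / weights.sum()
gm_by_powering = power(compositions[0], w[0])
for comp, wi in zip(compositions[1:], w[1:]):
    gm_by_powering = perturb(gm_by_powering, power(comp, wi))

# Arithmetic averaging: the order of averaging and closure matters
am_from_amounts = closure(weights @ amounts)            # weighted totals, then closure
am_from_compositions = closure(weights @ compositions)  # weighted mean of compositions

print(gm_from_amounts, gm_from_compositions, gm_by_powering)
print(am_from_amounts, am_from_compositions)
```

In this sketch the three geometric-mean results coincide, while the two arithmetic results differ, because closing each case first discards the case totals that the weighted sum of amounts relies on.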
The design-based approach does not make any assumptions on the distribution of compositions. This opens the way to parametrization by general partitions (Aitchison, 1986, section 2.7) without the drawback of ad hoc assumptions on multivariate normality (Aitchison, 1986, definition 6.7). In household expenditure surveys, for instance, commodities are organized in a hierarchy in which broad categories are subdivided into more detailed goods. A general partition can follow this organization and may be a more convenient way to convey the information on the surveyed units. The joint probability distribution of transforms of this general partition is derived from the distribution of the sample inclusion indicator. After a brief review of survey methodology, we apply the design-based principles to the estimation of compositions, of compositional transforms and of their covariance matrix on a small population. The properties of the estimators will be investigated by simulation. The talk will end with a discussion.
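As a rough illustration of the design-based idea, the following sketch simulates repeated sampling from a small artificial population and estimates the composition of the population totals with inverse-inclusion-probability weights. Everything in it is an assumption made for illustration and not the simulation set-up of the talk: the lognormal population, the Poisson sampling design with inclusion probabilities proportional to unit size, the number of replicates, and all variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

# Hypothetical small population: N units, D parts, amounts on the original scale
N, D = 200, 3
amounts = rng.lognormal(mean=[3.0, 2.0, 1.0], sigma=0.5, size=(N, D))

# Assumed design: Poisson sampling with inclusion probabilities
# proportional to unit size (total amount), expected sample size 40
size_measure = amounts.sum(axis=1)
pi = np.minimum(40 * size_measure / size_measure.sum(), 1.0)

# Target: composition of the population totals
truth = closure(amounts.sum(axis=0))

R = 2000  # number of simulated samples
weighted, unweighted = [], []
for _ in range(R):
    sampled = rng.random(N) < pi   # sample inclusion indicator
    if not sampled.any():
        continue                   # skip the (very unlikely) empty sample
    w = 1.0 / pi[sampled]          # design weights (extrapolation factors)
    weighted.append(closure(w @ amounts[sampled]))            # weighted totals, then closure
    unweighted.append(closure(amounts[sampled].sum(axis=0)))  # design ignored, for comparison

weighted = np.array(weighted)
unweighted = np.array(unweighted)

print("population composition :", truth)
print("mean weighted estimate :", weighted.mean(axis=0))
print("mean unweighted estimate:", unweighted.mean(axis=0))
```

The randomness here comes only from the inclusion indicator `sampled`. The population values are held fixed across replicates, which is what distinguishes the design-based view from a model-based one.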