Session 6: Bio and social compositional data
http://hdl.handle.net/10256/644
2016-05-02T08:43:36Z

Scoring Methods for Ordinal Multidimensional Forced-Choice Items
http://hdl.handle.net/10256/744
2012-06-28T12:30:36Z
2008-05-29T00:00:00Z
De Vries, Anton L.M.; Van der Ark, L. Andries
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
In most psychological tests and questionnaires, a test score is obtained by
taking the sum of the item scores. In virtually all cases where the test or
questionnaire contains multidimensional forced-choice items, this traditional
scoring method is also applied. We argue that the summation of scores obtained with multidimensional forced-choice items produces uninterpretable
test scores. Therefore, we propose three alternative scoring methods: a weak
and a strict rank preserving scoring method, which both allow an ordinal
interpretation of test scores; and a ratio preserving scoring method, which
allows a proportional interpretation of test scores. Each proposed scoring
method yields an index for each respondent indicating the degree to which
the response pattern is inconsistent. Analysis of real data showed that, with
respect to rank preservation, the weak and strict rank preserving methods
resulted in lower inconsistency indices than the traditional scoring method;
with respect to ratio preservation, the ratio preserving scoring method resulted in lower inconsistency indices than the traditional scoring method.

Coherent forecasting of multiple-decrement life tables: a test using Japanese cause of death data
http://hdl.handle.net/10256/742
2012-06-28T12:30:36Z
2008-05-29T00:00:00Z
Oeppen, Jim
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
Planners in public and private institutions would like coherent forecasts of the components of age-specific mortality, such as causes of death. This has been difficult to
achieve because the relative values of the forecast components often fail to behave in
a way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It has
been shown that cause-specific mortality forecasts are pessimistic when compared with
all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach
of using log mortality rates and forecasts the density of deaths in the life table. Since
these values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing
state), they are intrinsically relative rather than absolute values across decrements as
well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison
(1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the
unit sum constraint is honoured. The structure of the best-known, single-decrement
mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in
compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality
by cause of death for Japan.
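
The transform and back-transform step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which log-ratio transform is used, so the centred log-ratio (clr) below is an assumption, and the death densities are made-up numbers rather than Japanese data.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(composition):
    """Centred log-ratio: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def clr_inverse(coords):
    """Map clr coordinates back to a composition with unit sum."""
    return closure([math.exp(c) for c in coords])

# Hypothetical life-table death density over four age groups.
deaths = closure([5.0, 15.0, 30.0, 50.0])
coords = clr(deaths)            # unconstrained real values (they sum to zero)
restored = clr_inverse(coords)  # positive values that honour the unit sum
```

Any forecasting model would operate on `coords`; applying `clr_inverse` to its output guarantees that the forecast death densities are positive and sum to one.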

Compositional amalgamations and balances: a critical approach
http://hdl.handle.net/10256/738
2012-06-28T12:30:36Z
2008-05-29T00:00:00Z
Mateu i Figueras, Glòria; Daunis i Estadella, Josep
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
The amalgamation operation is frequently used to reduce the number of parts of compositional data but it is a non-linear operation in the simplex with the usual geometry,
the Aitchison geometry. The concept of balances between groups, a particular coordinate system designed over binary partitions of the parts, could be an alternative to the
amalgamation in some cases. In this work we discuss the proper application of both
concepts using a real data set corresponding to behavioral measures of pregnant sows.
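
To make the contrast concrete, the sketch below computes an amalgamation and a balance for the same grouping of parts. The balance uses the standard definition for two groups of r and s parts, sqrt(rs/(r+s)) times the log of the ratio of the group geometric means; the four-part composition is hypothetical and unrelated to the sow data.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def amalgamate(composition, groups):
    """Amalgamation: add up the parts inside each group."""
    return [sum(composition[i] for i in group) for group in groups]

def balance(composition, numerator, denominator):
    """Balance between two groups of parts, given as index lists."""
    r, s = len(numerator), len(denominator)
    gm_num = geometric_mean([composition[i] for i in numerator])
    gm_den = geometric_mean([composition[i] for i in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(gm_num / gm_den)

# Hypothetical four-part composition, grouped as {0, 1} versus {2, 3}.
x = [0.1, 0.2, 0.3, 0.4]
amalgamated = amalgamate(x, [[0, 1], [2, 3]])  # two-part composition of sums
b = balance(x, [0, 1], [2, 3])                 # one real-valued coordinate
```

The amalgamation returns a shorter composition built from sums, whereas the balance returns a single log-ratio coordinate built from geometric means, which is why the two behave differently under the Aitchison geometry.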

Hardy-Weinberg Equilibrium and the Ternary Plot
http://hdl.handle.net/10256/737
2012-06-28T12:30:36Z
2008-05-29T00:00:00Z
Graffelman, Jan
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
The Hardy-Weinberg law, formulated about 100 years ago, states that under certain
assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in
the proportions p², 2pq and q², respectively, where p is the allele frequency of A and q = 1 - p.
Many statistical tests are used to check whether empirical marker data obey the
Hardy-Weinberg principle. Among these are the classical chi-square test (with or without
continuity correction), the likelihood ratio test, Fisher's Exact test, and exact tests in combination
with Monte Carlo and Markov Chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE)
are numerical in nature, requiring the computation of a test statistic and a p-value.
There is, however, ample room for the use of graphics in HWE tests, in particular for the ternary
plot. Nowadays, many genetic studies use genetic markers known as Single
Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts
one typically computes genotype frequencies and allele frequencies. These frequencies satisfy
the unit-sum constraint, and their analysis therefore falls within the realm of compositional data
analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype
frequencies can be adequately represented in a ternary plot. Compositions that are in exact
HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in
a statistical test are typically “close” to the parabola, whereas compositions that differ
significantly from HWE are “far”. By rewriting the statistics used to test for HWE in terms of
heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in
the ternary plot. This way, compositions can be tested for HWE purely on the basis of their
position in the ternary plot (Graffelman & Morales, 2008). This leads to nice graphical
representations where large numbers of SNPs can be tested for HWE in a single graph. Several
examples of graphical tests for HWE (implemented in R software) will be shown, using SNP
data from different human populations.
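
As a numerical companion to the abstract, here is a minimal sketch of the classical chi-square test without continuity correction, plus a check of the parabola condition (heterozygote frequency squared equals four times the product of the homozygote frequencies). The function names and genotype counts are illustrative assumptions; this is not the R implementation mentioned above, and it assumes both alleles are present in the sample.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for HWE from genotype counts (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency of A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

def on_hwe_parabola(f_aa, f_ab, f_bb, tol=1e-12):
    """True if the genotype frequencies satisfy f_AB^2 = 4 * f_AA * f_BB."""
    return abs(f_ab ** 2 - 4.0 * f_aa * f_bb) <= tol

# Hypothetical sample of 100 individuals with a heterozygote deficit.
statistic, p_value = hwe_chi_square(30, 40, 30)
```

For these counts the expected values are 25, 50 and 25, so the statistic is 4.0 and the p-value is about 0.0455. The composition (0.30, 0.40, 0.30) therefore lies off the parabola, while (0.25, 0.50, 0.25) lies exactly on it.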

“Unmixing” Tissue Gene Expression Signatures from Tumor Biopsies
http://hdl.handle.net/10256/736
2012-06-28T12:30:36Z
2008-05-29T00:00:00Z
Billheimer, Dean
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
Emergent molecular measurement methods, such as DNA microarray, qRT-PCR, and
many others, offer tremendous promise for the personalized treatment of cancer. These
technologies measure the amount of specific proteins, RNA, DNA or other molecular
targets from tumor specimens with the goal of “fingerprinting” individual cancers. Tumor
specimens are heterogeneous; an individual specimen typically contains unknown
amounts of multiple tissue types. Thus, the measured molecular concentrations result
from an unknown mixture of tissue types, and must be normalized to account for the
composition of the mixture.
For example, a breast tumor biopsy may contain normal, dysplastic and cancerous
epithelial cells, as well as stromal components (fatty and connective tissue) and blood
and lymphatic vessels. Our diagnostic interest focuses solely on the dysplastic and
cancerous epithelial cells. The remaining tissue components serve to “contaminate”
the signal of interest. The proportion of each of the tissue components changes as
a function of patient characteristics (e.g., age), and varies spatially across the tumor
region. Because each of the tissue components produces a different molecular signature,
and the amount of each tissue type is specimen dependent, we must estimate the tissue
composition of the specimen, and adjust the molecular signal for this composition.
Using the idea of a chemical mass balance, we consider the total measured
concentrations to be a weighted sum of the individual tissue signatures, where the
weights are determined by the relative amounts of the different tissue types. We develop a
compositional source apportionment model to estimate the relative amounts of tissue
components in a tumor specimen. We then use these estimates to infer the tissue-specific
concentrations of key molecular targets for sub-typing individual tumors. We
anticipate these specific measurements will greatly improve our ability to discriminate
between different classes of tumors, and allow more precise matching of each patient to
the appropriate treatment.
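
The mass-balance idea can be sketched as follows: the measured profile is a weighted sum of tissue signatures, with weights that form a composition. The estimator below is plain least squares for two tissues with known signatures. It is a stand-in chosen for brevity, not the compositional source apportionment model of the abstract, and all signatures and weights are hypothetical.

```python
def mix(signatures, weights):
    """Forward model: weighted sum of tissue signatures, target by target."""
    n_targets = len(signatures[0])
    return [
        sum(w * signature[j] for w, signature in zip(weights, signatures))
        for j in range(n_targets)
    ]

def estimate_two_tissue_weight(measured, sig_a, sig_b):
    """Least-squares weight of tissue A when the two weights sum to one.

    Solves measured ~ w * sig_a + (1 - w) * sig_b and clips w to [0, 1].
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    numerator = sum((y - b) * d for y, b, d in zip(measured, sig_b, diff))
    denominator = sum(d * d for d in diff)
    w = numerator / denominator
    return min(1.0, max(0.0, w))

# Hypothetical signatures for three molecular targets.
tumor = [8.0, 1.0, 4.0]
stroma = [2.0, 5.0, 4.0]
measured = mix([tumor, stroma], [0.3, 0.7])  # specimen that is 30% tumor
w_tumor = estimate_two_tissue_weight(measured, tumor, stroma)
```

With noise-free input the estimate recovers the tumor fraction used to build the mixture. Real specimens would need more tissue types, measurement error, and a model that respects the compositional nature of the weights.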