CODAWORK’08
http://hdl.handle.net/10256/618
2017-01-19T01:09:54Z

Experimental design on the simplex
http://hdl.handle.net/10256/753
2013-07-17T09:56:48Z 2008-05-30T00:00:00Z
Atkinson, A.C.
Martín Fernández, Josep Antoni; Daunis i Estadella, Josep
Optimum experimental designs depend on the design criterion, the model and
the design region. The talk will consider the design of experiments for regression
models in which there is a single response with the explanatory variables lying in
a simplex. One example is experiments on various compositions of glass such as
those considered by Martin, Bursnall, and Stillman (2001).
Because of the highly symmetric nature of the simplex, the models that
are of interest, typically Scheffé polynomials (Scheffé 1958), are rather different
from those of standard regression analysis. The optimum designs are also rather
different, inheriting a high degree of symmetry from the models.
In the talk I hope to discuss a variety of models for such experiments. Then
I will discuss constrained mixture experiments, when not all the simplex is available
for experimentation. Other important aspects include mixture experiments
with extra non-mixture factors and the blocking of mixture experiments.
Much of the material is in Chapter 16 of Atkinson, Donev, and Tobias (2007).
If time and my research allow, I would hope to finish with a few comments on
design when the responses, rather than the explanatory variables, lie in a simplex.
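
As an illustration of the kind of model and design discussed above, the following minimal sketch builds the model matrix of a second-order Scheffé polynomial on a {3, 2} simplex-lattice design and evaluates the log of the D-criterion |F'F|. The function names and the choice of three components are assumptions made for illustration; they are not taken from the talk or from Atkinson, Donev, and Tobias (2007).

```python
from itertools import combinations
import numpy as np

def simplex_lattice(q, m):
    """All points of the {q, m} simplex lattice: proportions i/m that sum to 1."""
    points = []

    def fill(prefix, remaining):
        # The last component is fixed by the constraint that parts sum to m.
        if len(prefix) == q - 1:
            points.append(prefix + [remaining])
            return
        for i in range(remaining + 1):
            fill(prefix + [i], remaining - i)

    fill([], m)
    return np.array(points, dtype=float) / m

def scheffe_quadratic(X):
    """Model matrix of the second-order Scheffe polynomial (no intercept term)."""
    q = X.shape[1]
    cross = [X[:, i] * X[:, j] for i, j in combinations(range(q), 2)]
    return np.column_stack([X] + cross)

design = simplex_lattice(3, 2)            # 3 vertices and 3 edge midpoints
F = scheffe_quadratic(design)             # 6 runs by 6 parameters
log_det = np.linalg.slogdet(F.T @ F)[1]   # log of the D-criterion |F'F|
```

The model has no intercept because the components sum to one, which is the main structural difference from a standard quadratic regression model.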
References
Atkinson, A. C., A. N. Donev, and R. D. Tobias (2007). Optimum Experimental
Designs, with SAS. Oxford: Oxford University Press.
Martin, R. J., M. C. Bursnall, and E. C. Stillman (2001). Further results on
optimal and efficient designs for constrained mixture experiments. In A. C.
Atkinson, B. Bogacka, and A. Zhigljavsky (Eds.), Optimal Design 2000,
pp. 225–239. Dordrecht: Kluwer.
Scheffé, H. (1958). Experiments with mixtures. Journal of the Royal Statistical
Society, Ser. B 20, 344–360.

Robust Factor Analysis for Compositional Data
http://hdl.handle.net/10256/752
2012-06-28T12:30:36Z 2008-05-30T00:00:00Z
Filzmoser, Peter; Hron, Karel; Reimann, Clemens; Garrett, Robert G.
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
Factor analysis, a frequently used technique for multivariate data inspection, is also widely used in compositional data analysis. The usual approach is to apply a centered logratio (clr)
transformation to obtain the random vector y of dimension D. The factor model is
then
$$y = \Lambda f + e \qquad (1)$$
with the factors f of dimension k < D, the error term e, and the loadings matrix Λ.
Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis
model (1) can be written as
$$\mathrm{Cov}(y) = \Lambda \Lambda^{T} + \psi \qquad (2)$$
where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the
loadings matrix Λ are estimated from an estimate of Cov(y).
Let the observed clr-transformed data Y be realizations of the random vector
y. Outliers or deviations from the idealized model assumptions of factor analysis
can severely affect the parameter estimation. As a way out, robust estimation of
the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2); see
Pison et al. (2003). Well-known robust covariance estimators with good statistical
properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely
on a full-rank data matrix Y, which is not the case for clr-transformed data (see,
e.g., Aitchison, 1986).
The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this
singularity problem. The data matrix Y is transformed to a matrix Z by using
an orthonormal basis of lower dimension. Using the ilr transformed data, a robust
covariance matrix C(Z) can be estimated. The result can be back-transformed to
the clr space by
$$C(Y) = V \, C(Z) \, V^{T}$$
where the matrix V with orthonormal columns comes from the relation between
the clr and the ilr transformation. Now the parameters in the model (2) can be
estimated (Basilevsky, 1994) and the results have a direct interpretation since the
links to the original variables are still preserved.
The above procedure will be applied to data from geochemistry. We are particularly
interested in comparing the results with those of Reimann et al. (2002) for the Kola
project data.
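
The transformation chain described above (clr data Y, ilr coordinates Z, covariance estimate C(Z), back-transformation to C(Y)) can be sketched as follows. This is a minimal sketch under stated assumptions: the data are synthetic, the particular orthonormal basis V is one common choice among many, and the classical sample covariance stands in as a placeholder where a robust estimator such as the MCD would be used.

```python
import numpy as np

def clr(X):
    """Centered logratio transform of compositions stored in rows."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def ilr_basis(D):
    """D x (D-1) matrix V with orthonormal columns, each orthogonal to the ones vector."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

rng = np.random.default_rng(0)
X = rng.dirichlet([4.0, 3.0, 2.0, 1.0], size=200)   # synthetic 4-part compositions
Y = clr(X)                  # rows sum to zero, so Y is not of full rank
V = ilr_basis(X.shape[1])
Z = Y @ V                   # ilr coordinates of dimension D - 1

# Placeholder estimator: the classical covariance. A robust estimator
# (for example the MCD) would be substituted here.
C_Z = np.cov(Z, rowvar=False)
C_Y = V @ C_Z @ V.T         # back-transformation to the clr space
```

With the classical estimator the back-transformed matrix coincides with the sample covariance of Y, which is a convenient check that V links the two coordinate systems correctly.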

Vertebrates Limb Geometry in the Simplex space
http://hdl.handle.net/10256/751
2012-06-28T12:30:36Z 2008-05-30T00:00:00Z
Daunis i Estadella, Josep; Mateu i Figueras, Glòria; Thió i Fernández de Henestrosa, Santiago; Rodrigues, L.
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
A novel metric comparison of the appendicular skeleton (fore and hind limb) of
different vertebrates using the Compositional Data Analysis (CDA) methodological
approach is presented.
355 specimens belonging to various taxa of Dinosauria (Sauropodomorpha, Theropoda,
Ornithischia and Aves) and Mammalia (Prototheria, Metatheria and Eutheria) were
analyzed with CDA.
A special focus has been put on Sauropodomorpha dinosaurs, and the Aitchison
distance has been used as a measure of disparity in limb element proportions to infer
some aspects of functional morphology.
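
For reference, the Aitchison distance between two compositions is the Euclidean distance between their clr vectors. The sketch below computes it for two hypothetical three-part limb proportion vectors; the numbers are invented for illustration and are not data from the study.

```python
import numpy as np

def aitchison_distance(x, y):
    """Aitchison distance between two compositions with strictly positive parts."""
    lx, ly = np.log(x), np.log(y)
    diff = (lx - lx.mean()) - (ly - ly.mean())   # difference of the clr vectors
    return float(np.sqrt(np.sum(diff ** 2)))

# Hypothetical limb-segment proportions (not data from the study)
a = np.array([0.40, 0.35, 0.25])
b = np.array([0.45, 0.30, 0.25])
d_ab = aitchison_distance(a, b)
d_scaled = aitchison_distance(10 * a, b)   # rescaling one specimen leaves the distance unchanged
```

The scale invariance shown in the last line is what makes the distance suitable for comparing proportions across specimens of very different absolute size.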

Modelling of Mercury’s surface composition and remote detection from the orbit with the BepiColombo Mercury Planetary Orbiter
http://hdl.handle.net/10256/750
2012-11-19T08:56:48Z 2008-05-30T00:00:00Z
Lammer, Helmut; Wurz, Peter; Martín Fernández, Josep Antoni; Lichtenegger, Herbert I.M.; Khodachenko, Maxim L.
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
It can be assumed that the composition of Mercury’s thin gas envelope (exosphere) is related to the
composition of the planet’s crustal materials. If this relationship is true, then inferences regarding the bulk
chemistry of the planet might be made from a thorough exospheric study. The most vexing of all
unsolved problems is the uncertainty in the source of each component. Historically, it has been believed
that H and He come primarily from the solar wind, while Na and K originate from volatilized materials
partitioned between Mercury’s crust and meteoritic impactors. The processes that eject atoms and
molecules into the exosphere of Mercury are generally considered to be thermal vaporization, photon-stimulated
desorption (PSD), impact vaporization, and ion sputtering. Each of these processes has its own
temporal and spatial dependence. The exosphere is strongly influenced by Mercury’s highly elliptical
orbit and rapid orbital speed. As a consequence, the surface undergoes large fluctuations in temperature
and experiences differences in insolation with longitude. We will discuss these processes but focus more
on the expected surface composition and on solar wind particle sputtering, which releases material such as Ca
and other elements from the surface minerals, and we will discuss the relevance of composition modelling.

Revisiting the compositional data. Some fundamental questions and new prospects in Archaeometry and Archaeology
http://hdl.handle.net/10256/749
2012-06-28T12:30:36Z 2008-05-30T00:00:00Z
Buxeda i Garrigós, Jaume
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
In this paper we examine the problem of compositional data from a different starting
point. Chemical compositional data, as used in provenance studies on archaeological
materials, will be approached from the standpoint of measurement theory. The results will show, in a
very intuitive way, that chemical data can only be treated by using the approach
developed for compositional data. It will be shown that compositional data analysis is a
particular case in projective geometry, when the projective coordinates are in the
positive orthant, and they have the properties of logarithmic interval metrics. Moreover,
it will be shown that this approach can be extended to a very large number of
applications, including shape analysis. This will be exemplified with a case study in
the architecture of Early Christian churches dating back to the 5th-7th centuries AD.

Statistical treatment of grain-size curves and empirical distributions: densities as compositions?
http://hdl.handle.net/10256/748
2012-06-28T12:30:36Z 2008-05-30T00:00:00Z
Tolosana Delgado, Raimon; Boogaart, K. Gerald van den; Mikes, Tünde; Eynatten, Hilmar von
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
The preceding two editions of CoDaWork included talks on the possible consideration
of densities as infinite compositions: Egozcue and Díaz-Barrero (2003) extended the
Euclidean structure of the simplex to a Hilbert space structure of the set of densities
within a bounded interval, and van den Boogaart (2005) generalized this to the set
of densities bounded by an arbitrary reference density. From the many variations of
the Hilbert structures available, we work with three cases. For bounded variables, a
basis derived from Legendre polynomials is used. For variables with a lower bound, we
standardize them with respect to an exponential distribution and express their densities
as coordinates in a basis derived from Laguerre polynomials. Finally, for unbounded
variables, a normal distribution is used as reference, and coordinates are obtained with
respect to a Hermite-polynomials-based basis.
To get the coordinates, several approaches can be considered. A numerical accuracy
problem occurs if one estimates the coordinates directly by using discretized scalar
products. Thus we propose to use a weighted linear regression approach, where all
k-order polynomials are used as predictand variables and weights are proportional to the
reference density. Finally, for the case of 2-order Hermite polynomials (normal reference)
and 1-order Laguerre polynomials (exponential), one can also derive the coordinates
from their relationships to the classical mean and variance.
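
The weighted regression idea for the normal-reference case can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes that the log-ratio of the density to the reference is regressed on unnormalized probabilists' Hermite polynomials, that the weights are the reference density on a regular grid, and that the target density is a hypothetical N(mu, sigma^2). Any normalization of the basis, and the handling of the constant term, are left out.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

mu, sigma = 1.0, 2.0                 # hypothetical target density N(mu, sigma^2)
x = np.linspace(-6.0, 6.0, 241)      # evaluation grid

# Log densities of the standard normal reference and of the target.
log_ref = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_f = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
response = log_f - log_ref           # log-ratio of density to reference

order = 2
P = He.hermevander(x, order)         # columns He_0(x), He_1(x), He_2(x)
w = np.sqrt(np.exp(log_ref))         # square roots of weights proportional to the reference
coef, *_ = np.linalg.lstsq(P * w[:, None], response * w, rcond=None)
```

In this example the fitted coefficients on He_1 and He_2 equal mu / sigma^2 and (1 - 1 / sigma^2) / 2, which illustrates how the low-order coefficients relate to the classical mean and variance.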
Apart from these theoretical issues, this contribution focuses on the application of this
theory to two main problems in sedimentary geology: the comparison of several grain-size
distributions, and the comparison among different rocks of the empirical distribution of a property measured on a batch of individual grains from the same rock or
sediment, like their composition.

Dynamic graphics of parametrically linked multivariate methods used in compositional data analysis
http://hdl.handle.net/10256/747
2012-06-28T12:30:36Z 2008-05-30T00:00:00Z
Greenacre, Michael J.
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
Many multivariate methods that are apparently distinct can be linked by introducing one
or more parameters in their definition. Methods that can be linked in this way are
correspondence analysis, unweighted or weighted logratio analysis (the latter also
known as "spectral mapping"), nonsymmetric correspondence analysis, principal
component analysis (with and without logarithmic transformation of the data) and
multidimensional scaling. In this presentation I will show how several of these
methods, which are frequently used in compositional data analysis, may be linked
through parametrizations such as power transformations, linear transformations and
convex linear combinations. Since the methods of interest here all lead to visual maps
of data, a "movie" can be made where the linking parameter is allowed to vary in
small steps: the results are recalculated "frame by frame" and one can see the smooth
change from one method to another. Several of these "movies" will be shown, giving a
deeper insight into the similarities and differences between these methods.
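
One familiar example of a linking parameter is the exponent of a power transformation. The sketch below uses the Box-Cox family, which tends to the logarithm as the exponent goes to zero, and records how far each "frame" is from the log-transformed data. The choice of the Box-Cox form and the sample values are assumptions for illustration; the abstract does not state which parametrization is used in the presentation.

```python
import numpy as np

def box_cox(x, alpha):
    """Box-Cox power transform; alpha = 0 gives the natural logarithm."""
    x = np.asarray(x, dtype=float)
    if alpha == 0:
        return np.log(x)
    return (x ** alpha - 1.0) / alpha

x = np.array([0.1, 0.3, 0.6])              # hypothetical composition
alphas = [1.0, 0.5, 0.1, 0.01, 0.001]      # successive "frames" of the movie
gaps = [float(np.max(np.abs(box_cox(x, a) - np.log(x)))) for a in alphas]
```

The entries of `gaps` shrink steadily as the exponent decreases, which is the smooth passage from an analysis of untransformed data to a logratio-type analysis.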

Comparing methods for dimensionality reduction when data are density functions
http://hdl.handle.net/10256/746
2012-06-28T12:30:36Z 2008-05-30T00:00:00Z
Delicado, Pedro
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
Functional Data Analysis (FDA) deals with samples where a whole function is observed
for each individual. A particular case of FDA is when the observed functions are density
functions, which are also an example of infinite-dimensional compositional data. In this
work we compare several methods for dimensionality reduction for this particular type
of data: functional principal components analysis (PCA) with or without a previous
data transformation and multidimensional scaling (MDS) for different inter-density
distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household
income distributions).

Clustering compositional data trajectories
http://hdl.handle.net/10256/745
2012-11-27T10:55:56Z 2008-05-30T00:00:00Z
Bruno, Francesca; Greco, Fedele
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
Our essay aims at studying suitable statistical methods for the clustering of
compositional data in situations where observations are constituted by trajectories of
compositional data, that is, by sequences of composition measurements along a domain.
Observed trajectories are known as “functional data” and several methods have been
proposed for their analysis.
In particular, methods for clustering functional data, known as Functional Cluster
Analysis (FCA), have been applied by practitioners and scientists in many fields. To our
knowledge, FCA techniques have not been extended to cope with the problem of
clustering compositional data trajectories. In order to extend FCA techniques to the
analysis of compositional data, FCA clustering techniques have to be adapted by using a
suitable compositional algebra.
The present work centres on the following question: given a sample of compositional
data trajectories, how can we formulate a segmentation procedure giving homogeneous
classes? To address this problem we follow the steps described below.
First of all we adapt the well-known spline smoothing techniques in order to cope with
the smoothing of compositional data trajectories. In fact, an observed curve can be
thought of as the sum of a smooth part plus some noise due to measurement errors.
Spline smoothing techniques are used to isolate the smooth part of the trajectory:
clustering algorithms are then applied to these smooth curves.
The second step consists in building suitable metrics for measuring the dissimilarity
between trajectories: we propose a metric that accounts for differences in both shape and
level, and a metric accounting for differences in shape only.
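
The abstract does not define the two metrics, so the following is only one possible reading, given as a hedged sketch. It assumes trajectories observed at common time points, measures dissimilarity in clr coordinates, and treats "level" as the time-average of a trajectory in clr space; the function names and example trajectories are hypothetical.

```python
import numpy as np

def clr(X):
    """Centered logratio transform of compositions stored in rows."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def closure(X):
    """Rescale each row so that its parts sum to one."""
    return X / X.sum(axis=1, keepdims=True)

def level_shape_distance(A, B):
    """Root mean squared Aitchison distance between two trajectories (T x D arrays)."""
    diff = clr(A) - clr(B)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def shape_distance(A, B):
    """Same distance after removing each trajectory's average level in clr space."""
    a, b = clr(A), clr(B)
    diff = (a - a.mean(axis=0)) - (b - b.mean(axis=0))
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

t = np.linspace(0.0, 1.0, 20)[:, None]
A = closure(np.exp(t * np.array([1.0, 0.0, -1.0])))   # hypothetical smooth trajectory
B = closure(A * np.array([2.0, 1.0, 0.5]))            # A perturbed by a constant composition
```

Here B differs from A only by a constant perturbation, so `shape_distance(A, B)` is zero up to rounding while `level_shape_distance(A, B)` is clearly positive.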
A simulation study is performed in order to evaluate the proposed methodologies, using
both hierarchical and partitional clustering algorithms. The quality of the obtained results
is assessed by means of several indices.

Scoring Methods for Ordinal Multidimensional Forced-Choice Items
http://hdl.handle.net/10256/744
2012-06-28T12:30:36Z 2008-05-29T00:00:00Z
De Vries, Anton L.M.; Van der Ark, L. Andries
Daunis i Estadella, Josep; Martín Fernández, Josep Antoni
In most psychological tests and questionnaires, a test score is obtained by
taking the sum of the item scores. In virtually all cases where the test or
questionnaire contains multidimensional forced-choice items, this traditional
scoring method is also applied. We argue that the summation of scores obtained with multidimensional forced-choice items produces uninterpretable
test scores. Therefore, we propose three alternative scoring methods: a weak
and a strict rank preserving scoring method, which both allow an ordinal
interpretation of test scores; and a ratio preserving scoring method, which
allows a proportional interpretation of test scores. Each proposed scoring
method yields an index for each respondent indicating the degree to which
the response pattern is inconsistent. Analysis of real data showed that with
respect to rank preservation, the weak and strict rank preserving methods
resulted in lower inconsistency indices than the traditional scoring method;
with respect to ratio preservation, the ratio preserving scoring method resulted in lower inconsistency indices than the traditional scoring method.