Session 7: On functions
http://hdl.handle.net/10256/645

Statistical treatment of grain-size curves and empirical distributions: densities as compositions?
http://hdl.handle.net/10256/748
2008-05-30
Tolosana Delgado, Raimon; Boogaart, K. Gerald van den; Mikes, Tünde; Eynatten, Hilmar von
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
The preceding two editions of CoDaWork included talks on the possible consideration
of densities as infinite compositions: Egozcue and Díaz-Barrero (2003) extended the
Euclidean structure of the simplex to a Hilbert space structure of the set of densities
within a bounded interval, and van den Boogaart (2005) generalized this to the set
of densities bounded by an arbitrary reference density. From the many variations of
the Hilbert structures available, we work with three cases. For bounded variables, a
basis derived from Legendre polynomials is used. For variables with a lower bound, we
standardize them with respect to an exponential distribution and express their densities
as coordinates in a basis derived from Laguerre polynomials. Finally, for unbounded
variables, a normal distribution is used as reference, and coordinates are obtained with
respect to a Hermite-polynomials-based basis.
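
The pairing of each support type with a polynomial family can be illustrated by checking orthogonality against the matching weight function. The sketch below is only an illustration under assumptions: it uses NumPy's Gauss quadrature rules and unnormalized polynomials, whereas the bases "derived from" these polynomials in the abstract may be normalized or otherwise modified. The helper name `gram` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre, laguerre, hermite_e


def gram(nodes, weights, vander, order):
    """Gram matrix of the first order+1 polynomials under a quadrature rule."""
    V = vander(nodes, order)                 # columns p_0(x), ..., p_order(x)
    return V.T @ (weights[:, None] * V)


order = 3
# Bounded variable on [-1, 1]: Legendre polynomials, constant weight.
G_leg = gram(*legendre.leggauss(10), legendre.legvander, order)
# Lower-bounded variable on [0, inf): Laguerre polynomials, weight exp(-x).
G_lag = gram(*laguerre.laggauss(10), laguerre.lagvander, order)
# Unbounded variable: probabilists' Hermite polynomials, weight exp(-x^2 / 2).
G_her = gram(*hermite_e.hermegauss(10), hermite_e.hermevander, order)

for name, G in (("Legendre", G_leg), ("Laguerre", G_lag), ("Hermite", G_her)):
    off_diagonal = G - np.diag(np.diag(G))
    print(name, "largest off-diagonal entry:", np.abs(off_diagonal).max())
```

All three Gram matrices come out diagonal, which is what makes each family a convenient starting point for a basis with respect to its reference density.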
To get the coordinates, several approaches can be considered. A numerical accuracy
problem occurs if one estimates the coordinates directly by using discretized scalar
products. Thus we propose to use a weighted linear regression approach, where all k-order
polynomials are used as predictor variables and weights are proportional to the
reference density. Finally, for the case of second-order Hermite polynomials (normal reference)
and first-order Laguerre polynomials (exponential), one can also derive the coordinates
from their relationships to the classical mean and variance.
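
A minimal sketch of the weighted regression idea for the normal-reference case follows. It rests on assumptions not stated in the abstract: the density is represented by its log-ratio to a standard normal reference, the regressors are unnormalized probabilists' Hermite polynomials, and the intercept is discarded because it only carries the normalizing constant. The function name `hermite_coordinates` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander


def hermite_coordinates(x, log_ratio, order):
    """Weighted least squares of log(f/f0) on He_0..He_order, weights ~ N(0,1)."""
    design = hermevander(x, order)           # columns He_0(x), ..., He_order(x)
    weights = np.exp(-0.5 * x**2)            # proportional to the reference density
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], log_ratio * sw, rcond=None)
    return coef[1:]                          # drop the intercept (normalizing constant)


# Example: f is N(mu, sigma^2), reference f0 is N(0, 1).
mu, sigma = 0.5, 1.5
x = np.linspace(-6.0, 6.0, 241)
log_ratio = -np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2 + 0.5 * x**2
coords = hermite_coordinates(x, log_ratio, order=2)

# For a normal density the two coordinates follow from the mean and variance:
expected = np.array([mu / sigma**2, 0.5 * (1.0 - 1.0 / sigma**2)])
print(coords, expected)
```

In this example the log-ratio is exactly quadratic, so the regression recovers the closed-form values, which illustrates the link between second-order Hermite coordinates and the classical mean and variance.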
Apart from these theoretical issues, this contribution focuses on the application of this
theory to two main problems in sedimentary geology: the comparison of several grain
size distributions, and the comparison among different rocks of the empirical distribution
of a property measured on a batch of individual grains from the same rock or
sediment, such as their composition.

Dynamic graphics of parametrically linked multivariate methods used in compositional data analysis
http://hdl.handle.net/10256/747
2008-05-30
Greenacre, Michael J.
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
Many multivariate methods that are apparently distinct can be linked by introducing one
or more parameters in their definition. Methods that can be linked in this way are
correspondence analysis, unweighted or weighted logratio analysis (the latter also
known as "spectral mapping"), nonsymmetric correspondence analysis, principal
component analysis (with and without logarithmic transformation of the data) and
multidimensional scaling. In this presentation I will show how several of these
methods, which are frequently used in compositional data analysis, may be linked
through parametrizations such as power transformations, linear transformations and
convex linear combinations. Since the methods of interest here all lead to visual maps
of data, a "movie" can be made where the linking parameter is allowed to vary in
small steps: the results are recalculated "frame by frame" and one can see the smooth
change from one method to another. Several of these "movies" will be shown, giving a
deeper insight into the similarities and differences between these methods.
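
One way to picture such a parametric link is a power transformation whose limit is the logarithm. The sketch below is a simplified illustration, not the presentation's method: it assumes a Box-Cox form (x^alpha - 1)/alpha, an unweighted analysis via the singular values of the double-centred table, and simulated Dirichlet data. The names `double_center` and `power_family_map` are hypothetical.

```python
import numpy as np


def double_center(m):
    """Remove row and column means, as in unweighted logratio analysis."""
    return (m - m.mean(axis=0, keepdims=True)
              - m.mean(axis=1, keepdims=True) + m.mean())


def power_family_map(comp, alpha):
    """Singular values of the double-centred Box-Cox table; alpha=0 means log."""
    z = np.log(comp) if alpha == 0 else (comp**alpha - 1.0) / alpha
    return np.linalg.svd(double_center(z), compute_uv=False)


rng = np.random.default_rng(0)
comp = rng.dirichlet([2.0, 3.0, 5.0, 1.0], size=20)   # simulated compositions

# Each value of alpha is one "frame"; alpha = 0 is the logratio end point.
frames = {a: power_family_map(comp, a) for a in (1.0, 0.5, 0.1, 0.01, 0.0)}
for alpha, sv in frames.items():
    print(f"alpha={alpha:<5} singular values: {np.round(sv, 3)}")
```

As alpha shrinks, the singular values approach those of the log-transformed table, which is the kind of smooth transition a frame-by-frame display would show.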

Comparing methods for dimensionality reduction when data are density functions
http://hdl.handle.net/10256/746
2008-05-30
Delicado, Pedro
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
Functional Data Analysis (FDA) deals with samples where a whole function is observed
for each individual. A particular case of FDA is when the observed functions are density
functions, which are also an example of infinite-dimensional compositional data. In this
work we compare several methods for dimensionality reduction for this particular type
of data: functional principal components analysis (PCA) with or without a previous
data transformation and multidimensional scaling (MDS) for different inter-density
distances, one of them taking into account the compositional nature of density functions.
The different methods are applied to both artificial and real data (household
income distributions).
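
The contrast between an ordinary distance and a compositional one can be sketched on discretized densities. The example below is an assumption-laden illustration: it uses a centred log-ratio (clr) Euclidean distance as the compositional distance, a plain L2 distance for comparison, and classical (Torgerson) MDS implemented directly in NumPy. None of these choices is confirmed by the abstract, and the helper names are hypothetical.

```python
import numpy as np


def clr(dens):
    """Centred log-ratio transform applied row-wise."""
    logd = np.log(dens)
    return logd - logd.mean(axis=-1, keepdims=True)


def pairwise(rows):
    """Euclidean distance matrix between the rows of a 2-D array."""
    diff = rows[:, None, :] - rows[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def classical_mds(dist, dim=2):
    """Classical MDS: eigen-decomposition of the double-centred squared distances."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist**2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Five normal densities with shifted means, discretized on a common grid.
grid = np.linspace(-4.0, 4.0, 81)
means = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
dens = np.exp(-0.5 * (grid[None, :] - means[:, None]) ** 2)
dens /= dens.sum(axis=1, keepdims=True)

d_l2 = pairwise(dens)            # ignores the compositional nature
d_comp = pairwise(clr(dens))     # clr-based compositional distance
coords = classical_mds(d_comp, dim=2)
print(np.round(coords, 3))
```

For this family the clr distance grows linearly with the difference in means, so the MDS map places the five densities along a line, while the L2 distance does not have this property.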

Clustering compositional data trajectories
http://hdl.handle.net/10256/745
2008-05-30
Bruno, Francesca; Greco, Fedele
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
Our essay aims at studying suitable statistical methods for the clustering of
compositional data in situations where observations are constituted by trajectories of
compositional data, that is, by sequences of composition measurements along a domain.
Observed trajectories are known as “functional data” and several methods have been
proposed for their analysis.
In particular, methods for clustering functional data, known as Functional Cluster
Analysis (FCA), have been applied by practitioners and scientists in many fields. To our
knowledge, FCA techniques have not been extended to cope with the problem of
clustering compositional data trajectories. In order to extend FCA techniques to the
analysis of compositional data, FCA clustering techniques have to be adapted by using a
suitable compositional algebra.
The present work centres on the following question: given a sample of compositional
data trajectories, how can we formulate a segmentation procedure giving homogeneous
classes? To address this problem we follow the steps described below.
First of all we adapt the well-known spline smoothing techniques in order to cope with
the smoothing of compositional data trajectories. In fact, an observed curve can be
thought of as the sum of a smooth part plus some noise due to measurement errors.
Spline smoothing techniques are used to isolate the smooth part of the trajectory:
clustering algorithms are then applied to these smooth curves.
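
The smoothing step can be sketched as follows. This is not the authors' spline procedure: as a stand-in that needs only NumPy, it uses a discrete second-difference penalty smoother (a Whittaker-type smoother) applied to clr-transformed trajectories, then maps the result back to compositions. The transform choice, the penalty weight `lam`, and all function names are assumptions for illustration.

```python
import numpy as np


def closure(x):
    """Rescale each row to sum to one."""
    return x / x.sum(axis=-1, keepdims=True)


def clr(comp):
    """Centred log-ratio transform applied row-wise."""
    logc = np.log(comp)
    return logc - logc.mean(axis=-1, keepdims=True)


def clr_inv(z):
    """Map clr scores back to compositions."""
    return closure(np.exp(z))


def smooth_trajectory(comp, lam=10.0):
    """Penalized least squares on clr scores with a second-difference penalty."""
    n = comp.shape[0]
    d2 = np.diff(np.eye(n), n=2, axis=0)     # (n-2, n) second-difference operator
    smoothed = np.linalg.solve(np.eye(n) + lam * d2.T @ d2, clr(comp))
    return clr_inv(smoothed)


# Simulated trajectory: a smooth signal in clr space plus measurement noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
signal = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), np.zeros_like(t)])
noisy = clr_inv(signal + 0.3 * rng.standard_normal(signal.shape))
smooth = smooth_trajectory(noisy, lam=10.0)
print(np.round(smooth[:3], 3))
```

Because the smoother is linear and acts on each clr column separately, the smoothed scores still map back to valid compositions; larger values of `lam` give smoother curves.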
The second step consists in building suitable metrics for measuring the dissimilarity
between trajectories: we propose a metric that accounts for difference in both shape and
level, and a metric accounting for differences in shape only.
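
The abstract does not define the two metrics, so the following is only a hypothetical pair with the stated behaviour. The first is a root-mean-square clr distance between two trajectories observed at the same points, which reacts to both level and shape. The second removes each trajectory's average clr level before comparing, so it reacts to shape only.

```python
import numpy as np


def closure(x):
    """Rescale each row to sum to one."""
    return x / x.sum(axis=-1, keepdims=True)


def clr(comp):
    """Centred log-ratio transform applied along the last axis."""
    logc = np.log(comp)
    return logc - logc.mean(axis=-1, keepdims=True)


def dist_shape_level(a, b):
    """RMS clr distance: sensitive to differences in level and in shape."""
    return np.sqrt(((clr(a) - clr(b)) ** 2).sum(axis=1).mean())


def dist_shape_only(a, b):
    """Same distance after removing each trajectory's mean clr level."""
    za, zb = clr(a), clr(b)
    za = za - za.mean(axis=0)
    zb = zb - zb.mean(axis=0)
    return np.sqrt(((za - zb) ** 2).sum(axis=1).mean())


# Two trajectories that differ only by a constant perturbation p (a level shift).
t = np.linspace(0.0, 1.0, 30)
a = closure(np.column_stack([1.0 + t, 2.0 - t, np.ones_like(t)]))
p = np.array([2.0, 1.0, 0.5])
b = closure(a * p)

print(dist_shape_level(a, b))   # positive: the levels differ
print(dist_shape_only(a, b))    # numerically zero: the shapes coincide
```

Either distance matrix could then be passed to a hierarchical or partitional clustering routine.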
A simulation study is performed in order to evaluate the proposed methodologies, using
both hierarchical and partitional clustering algorithms. The quality of the obtained results
is assessed by means of several indices.