Session 7: On functions
http://hdl.handle.net/10256/645
Fri, 20 Jun 2025 05:33:38 GMT

Statistical treatment of grain-size curves and empirical distributions: densities as compositions?
http://hdl.handle.net/10256/748
Tolosana Delgado, Raimon; Boogaart, K. Gerald van den; Mikes, Tünde; Eynatten, Hilmar von
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni

The preceding two editions of CoDaWork included talks on the possible consideration of densities as infinite compositions: Egozcue and Díaz-Barrero (2003) extended the Euclidean structure of the simplex to a Hilbert space structure of the set of densities within a bounded interval, and van den Boogaart (2005) generalized this to the set of densities bounded by an arbitrary reference density. Among the many variations of the Hilbert structures available, we work with three cases. For bounded variables, a basis derived from Legendre polynomials is used. For variables with a lower bound, we standardize them with respect to an exponential distribution and express their densities as coordinates in a basis derived from Laguerre polynomials. Finally, for unbounded variables, a normal distribution is used as reference, and coordinates are obtained with respect to a basis derived from Hermite polynomials.

To get the coordinates, several approaches can be considered. A numerical accuracy problem occurs if one estimates the coordinates directly by using discretized scalar products. We therefore propose a weighted linear regression approach, where all k-order polynomials are used as predictor variables and the weights are proportional to the reference density. Finally, for the case of second-order Hermite polynomials (normal reference) and first-order Laguerre polynomials (exponential reference), one can also derive the coordinates from their relationships to the classical mean and variance.
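The weighted-regression idea for the normal-reference case can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hermite_coordinates`, the use of a standard normal reference, the choice of regressing the log-ratio of the density to the reference, and the normalization of the Hermite polynomials by sqrt(k!) are all assumptions introduced here.

```python
from math import factorial

import numpy as np
from numpy.polynomial import hermite_e as herm_e


def hermite_coordinates(x, density, order=2):
    """Weighted least-squares fit of log(density / reference) on Hermite polynomials.

    Assumptions (not taken from the abstract): the reference is the standard
    normal density, the basis is He_k(x) / sqrt(k!) for k = 1..order, and an
    intercept absorbs the constant term.
    """
    x = np.asarray(x, dtype=float)
    density = np.asarray(density, dtype=float)
    reference = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    response = np.log(density / reference)

    # Design matrix: intercept plus normalized probabilists' Hermite polynomials.
    columns = [np.ones_like(x)]
    for k in range(1, order + 1):
        selector = np.zeros(k + 1)
        selector[k] = 1.0  # picks out He_k in the Hermite series
        columns.append(herm_e.hermeval(x, selector) / np.sqrt(factorial(k)))
    design = np.column_stack(columns)

    # Weights proportional to the reference density: scale rows by sqrt(weight).
    root_w = np.sqrt(reference)
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], response * root_w, rcond=None)
    return beta[1:]  # drop the intercept, keep the polynomial coordinates


# Usage: a normal density with mean 0.5 and standard deviation 1.5.
grid = np.linspace(-6.0, 6.0, 401)
mu, sigma = 0.5, 1.5
pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
coords = hermite_coordinates(grid, pdf, order=2)
```

For a normal density the log-ratio against the standard normal is exactly quadratic, so under these assumed conventions the two coordinates equal mu / sigma^2 and (1 - 1 / sigma^2) / sqrt(2), which is one way to read the remark that they can be derived from the mean and variance.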
Apart from these theoretical issues, this contribution focuses on the application of this theory to two main problems in sedimentary geology: the comparison of several grain size distributions, and the comparison among different rocks of the empirical distribution of a property measured on a batch of individual grains from the same rock or sediment, such as their composition.
Fri, 30 May 2008 00:00:00 GMT

Dynamic graphics of parametrically linked multivariate methods used in compositional data analysis
http://hdl.handle.net/10256/747
Greenacre, Michael J.
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni

Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another.
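The frame-by-frame idea can be illustrated with a small sketch. It is not the presenter's parametrization: it assumes a Box-Cox power transformation with parameter `alpha`, followed by double-centring and a singular value decomposition, so that the limit alpha = 0 gives an unweighted logratio map. The function names and the simulated data are hypothetical.

```python
import numpy as np


def box_cox(x, alpha):
    """Box-Cox power transformation; tends to log(x) as alpha tends to 0."""
    x = np.asarray(x, dtype=float)
    if alpha == 0.0:
        return np.log(x)
    # expm1 keeps (x**alpha - 1) / alpha accurate for small alpha.
    return np.expm1(alpha * np.log(x)) / alpha


def map_frame(data, alpha):
    """One frame: transform, double-centre, and return 2-D row coordinates."""
    z = box_cox(data, alpha)
    z = z - z.mean(axis=0, keepdims=True)  # centre columns
    z = z - z.mean(axis=1, keepdims=True)  # centre rows
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :2] * s[:2], s


# Simulated compositions (rows sum to 1), for illustration only.
rng = np.random.default_rng(0)
raw = rng.uniform(1.0, 10.0, size=(20, 5))
compositions = raw / raw.sum(axis=1, keepdims=True)

# Let the linking parameter move in small steps from 1 down to 0.
alphas = np.linspace(1.0, 0.0, 21)
frames = [map_frame(compositions, a) for a in alphas]
```

Each element of `frames` holds the row coordinates for one value of `alpha`; plotting them in sequence gives the kind of smooth transition described above. Singular vectors are only defined up to sign, so a real animation would also need to align signs between consecutive frames.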
Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
Fri, 30 May 2008 00:00:00 GMT

Comparing methods for dimensionality reduction when data are density functions
http://hdl.handle.net/10256/746
Delicado, Pedro
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni

Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods for dimensionality reduction for this particular type of data: functional principal components analysis (PCA), with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
Fri, 30 May 2008 00:00:00 GMT

Clustering compositional data trajectories
http://hdl.handle.net/10256/745
Bruno, Francesca; Greco, Fedele
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni

Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations consist of trajectories of compositional data, that is, sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields.
To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below.

First of all, we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
Fri, 30 May 2008 00:00:00 GMT
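The distinction between a shape-and-level metric and a shape-only metric for compositional trajectories can be sketched as follows. The abstract does not define the metrics, so everything here is an assumption made for illustration: trajectories are mapped with the centred logratio (clr) transform, the distance is an integrated Euclidean distance between clr curves, and "level" is removed by subtracting each trajectory's average clr vector. The smoothing step is omitted and the function names are hypothetical.

```python
import numpy as np


def clr(comp):
    """Centred logratio transform applied to each composition (last axis)."""
    logc = np.log(np.asarray(comp, dtype=float))
    return logc - logc.mean(axis=-1, keepdims=True)


def trajectory_distance(a, b, t, shape_only=False):
    """Integrated clr distance between two trajectories sampled at times t.

    a and b have shape (len(t), parts). With shape_only=True each clr curve
    is centred along the domain first, so a constant offset (assumed here to
    represent the "level") does not contribute to the distance.
    """
    za, zb = clr(a), clr(b)
    if shape_only:
        za = za - za.mean(axis=0, keepdims=True)
        zb = zb - zb.mean(axis=0, keepdims=True)
    squared = ((za - zb) ** 2).sum(axis=1)
    # Trapezoidal rule over the domain.
    integral = np.sum(0.5 * (squared[1:] + squared[:-1]) * np.diff(t))
    return float(np.sqrt(integral))


def closure(x):
    """Rescale each row to sum to 1."""
    return x / x.sum(axis=1, keepdims=True)


# Two trajectories with the same shape but different level: the second is
# the first perturbed by a constant composition.
t = np.linspace(0.0, 1.0, 50)
base = closure(np.column_stack([np.exp(np.sin(2 * np.pi * t)), np.exp(t), np.ones_like(t)]))
shifted = closure(base * np.array([2.0, 1.0, 0.5]))

d_full = trajectory_distance(base, shifted, t)
d_shape = trajectory_distance(base, shifted, t, shape_only=True)
```

In this example `d_shape` is zero up to rounding while `d_full` is clearly positive, because the two trajectories differ only by a constant perturbation. A matrix of such pairwise distances could then be passed to any hierarchical or partitional clustering routine.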