Session 4: Research design in classification
http://hdl.handle.net/10256/642
2025-08-10T20:10:42Z

Using self organizing maps on compositional data
Cortés, Joaquín A.; Palma, José Luis
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
http://hdl.handle.net/10256/740
2022-07-13T06:59:26Z 2008-05-28T00:00:00Z

Self-organizing maps (Kohonen 1997) are a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric when adapting the model vectors, which in theory renders the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem:
1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data.
2. Whether a modification of the conventional approach that replaces vector sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering) converges to an adequate solution.
Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data is pathological. Moreover, the conventional approach converges faster to a solution when the data is "well-behaved".
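As a rough illustration of the two update rules contrasted in this abstract, the sketch below shows one adaptation step of a model vector toward a sample, first with the ordinary vector operations and then with perturbation and powering. The function names, the learning rate `alpha` and the example vectors are hypothetical and are not taken from the paper; neighbourhood functions and the search for the best-matching unit are omitted.

```python
def closure(x):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(x, alpha):
    """Powering: component-wise exponentiation followed by closure."""
    return closure([a ** alpha for a in x])


def euclidean_update(m, x, alpha):
    """Conventional step: m + alpha * (x - m)."""
    return [a + alpha * (b - a) for a, b in zip(m, x)]


def simplex_update(m, x, alpha):
    """Simplex step: perturb m by the powered difference between x and m.

    The "difference" x (-) m is the perturbation of x by the inverse of m.
    """
    inverse_m = [1.0 / a for a in m]
    difference = perturb(x, inverse_m)
    return perturb(m, power(difference, alpha))


# Hypothetical three-part compositions (illustrative values only).
model = [0.2, 0.3, 0.5]
sample = [0.6, 0.3, 0.1]
print(euclidean_update(model, sample, 0.5))
print(simplex_update(model, sample, 0.5))
```

With `alpha = 1` the simplex step lands exactly on the sample and with `alpha = 0` it leaves the model vector unchanged, mirroring the behaviour of the Euclidean step.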
Key words: Self Organizing Map; Artificial Neural networks; Compositional data

CODAMAT: a modern analogue technique for compositional data
Martín Fernández, Josep Antoni; Di Donato, Valentino
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
http://hdl.handle.net/10256/739
2022-07-13T06:59:26Z 2008-05-28T00:00:00Z

The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as the distance measure. Modern coretop datasets are characterised by a large number of zeros. Zero replacement was carried out with a Bayesian approach based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined through a multiple approach that considers the Proxies correlation matrix, the Standardized Residual Sum of Squares and the Mean Squared Distance. The new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.
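The core idea of a modern analogue technique built on the Aitchison metric can be sketched as follows. This is a minimal illustration and not the CODAMAT implementation: it assumes zeros have already been replaced, takes the number of analogues `k` as given rather than deriving it from the criteria named in the abstract, and averages the SST of the nearest analogues without weighting. All names and numeric values are hypothetical.

```python
import math


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return math.dist(clr(x), clr(y))


def estimate_sst(fossil, modern, k):
    """Average the SST of the k modern samples closest to the fossil sample.

    `modern` is a list of (composition, sst) pairs.
    """
    ranked = sorted(modern, key=lambda s: aitchison_distance(fossil, s[0]))
    nearest = ranked[:k]
    return sum(sst for _, sst in nearest) / k


# Toy coretop data: (assemblage composition, SST in degrees Celsius).
modern_samples = [
    ([0.60, 0.30, 0.10], 24.0),
    ([0.50, 0.30, 0.20], 22.0),
    ([0.10, 0.20, 0.70], 10.0),
]
fossil_sample = [0.55, 0.30, 0.15]
print(estimate_sst(fossil_sample, modern_samples, k=2))
```

Because the distance is computed on log-ratios, rescaling an assemblage (for example from proportions to percentages) does not change which modern samples are selected.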
Key words: Modern analogues; Aitchison distance; Proxies correlation matrix; Standardized Residual Sum of Squares

Analysis of Pleistocene paleodrainage evolution in the Po Basin (Italy) by multivariate statistical techniques
Vezzoli, Giovanni
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
http://hdl.handle.net/10256/735
2022-07-13T06:59:26Z 2008-05-28T00:00:00Z

In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220 m deep, were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses of Pleistocene sands and of sands from 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al., 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally, parallel to the South Alpine belt, by a trunk river (Vezzoli and Garzanti, 2008). This scenario changed rapidly during marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al., 2003). PCA and similarity analysis of core samples show that the longitudinal trunk river was at this time shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps.
Sediments were transported southward by braided river systems, and glacial sediments transported by Alpine valley glaciers invaded the alluvial plain.
Key words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity; Canberra Distance; palaeodrainage

Discovering similarities in the time use patterns of the Spanish Autonomous Communities by Fuzzy techniques
Palarea Albaladejo, Javier; Martín Fernández, Josep Antoni
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
http://hdl.handle.net/10256/734
2022-07-13T06:59:26Z 2008-05-28T00:00:00Z

In 2000 the European Statistical Office published the guidelines for developing the Harmonized European Time Use Surveys system. Under this unified framework, the first Time Use Survey of national scope was conducted in Spain during 2002–03. The aim of these surveys is to understand human behavior and the lifestyle of people. Time allocation data are compositional in nature, that is, they are subject to non-negativity and constant-sum constraints. Thus, standard multivariate techniques cannot be directly applied to analyze them. The goal of this work is to identify homogeneous Spanish Autonomous Communities with regard to the typical activity pattern of their respective populations. To this end, a fuzzy clustering approach is followed. Rather than the hard partitioning of classical clustering, where each object is allocated to a single group, fuzzy methods identify overlapping groups of objects by allowing them to belong to more than one group. Specifically, the probabilistic fuzzy c-means algorithm is suitably adapted to deal with the Spanish Time Use Survey microdata. As a result, a map distinguishing Autonomous Communities with similar activity patterns is drawn.
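One plausible way to adapt fuzzy c-means to compositions, consistent with the "Aitchison distance" key word but not necessarily the authors' adaptation, is to run the standard algorithm on centred log-ratio coordinates. The sketch below makes that assumption; the fuzzifier `m`, the fixed iteration count, the initial centres and the toy time-use shares are all hypothetical.

```python
import math


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def memberships(point, centers, m=2.0):
    """Probabilistic FCM memberships of one point; they sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centres: share membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def update_centers(points, u, m=2.0):
    """New centres: means of the points weighted by membership ** m."""
    centers = []
    for c in range(len(u[0])):
        weights = [row[c] ** m for row in u]
        total = sum(weights)
        centers.append([
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(len(points[0]))
        ])
    return centers


def fuzzy_c_means(compositions, init_centers, m=2.0, iterations=50):
    """Run FCM in clr space, so Euclidean distance equals Aitchison distance."""
    points = [clr(x) for x in compositions]
    centers = [clr(c) for c in init_centers]
    for _ in range(iterations):
        u = [memberships(p, centers, m) for p in points]
        centers = update_centers(points, u, m)
    return [memberships(p, centers, m) for p in points], centers


# Toy daily time shares (e.g. work, leisure, other); illustrative values only.
shares = [[0.70, 0.20, 0.10], [0.65, 0.25, 0.10],
          [0.10, 0.20, 0.70], [0.15, 0.20, 0.65]]
u, _ = fuzzy_c_means(shares, init_centers=[shares[0], shares[2]])
for row in u:
    print([round(v, 3) for v in row])
```

Each row of the result gives one object's degree of membership in every cluster, which is what allows overlapping groups instead of a hard partition.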
Key words: Time use data; Fuzzy clustering; FCM; simplex space; Aitchison distance