Session 4: Research design in classification
http://hdl.handle.net/10256/642
2025-08-10T20:08:12Z
http://hdl.handle.net/10256/740
Using self organizing maps on compositional data
Cortés, Joaquín A.; Palma, José Luis
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
The self-organizing map (Kohonen 1997) is a type of artificial neural network developed
to explore patterns in high-dimensional multivariate data. The conventional version
of the algorithm uses the Euclidean metric when adapting the model vectors, which in
theory renders the whole methodology incompatible with non-Euclidean geometries.
In this contribution we explore the two main aspects of the problem:
1. Whether the conventional approach using the Euclidean metric can yield valid results
with compositional data.
2. Whether a modification of the conventional approach that replaces vector addition and
scalar multiplication with the canonical operators in the simplex (i.e. perturbation and
powering) can converge to an adequate solution.
Preliminary tests showed that both methodologies can be used on compositional data.
However, the modified version of the algorithm performs worse than the conventional
version, in particular when the data are pathological. Moreover, the conventional
approach converges faster to a solution when the data are "well-behaved".
Key words: Self Organizing Map; Artificial Neural networks; Compositional data
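The two model-vector updates contrasted in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the learning rate `alpha` and the use of NumPy are assumptions made for illustration, and the neighbourhood function of a full self-organizing map is omitted.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** a)

def euclidean_update(m, x, alpha):
    """Conventional update: m + alpha * (x - m)."""
    m = np.asarray(m, dtype=float)
    x = np.asarray(x, dtype=float)
    return m + alpha * (x - m)

def simplex_update(m, x, alpha):
    """Modified update: perturb m by the powered difference of x and m.

    The difference of x and m in the simplex is perturb(x, 1 / m).
    """
    diff = perturb(x, 1.0 / np.asarray(m, dtype=float))
    return perturb(m, power(diff, alpha))

# Hypothetical model vector and data vector, both 3-part compositions.
m = np.array([0.2, 0.3, 0.5])
x = np.array([0.6, 0.3, 0.1])
print(euclidean_update(m, x, 0.3))
print(simplex_update(m, x, 0.3))
```

Both updates keep the model vector inside the simplex when `alpha` lies between 0 and 1. The simplex version moves along a compositional line, which is equivalent to a weighted geometric mean of `m` and `x` followed by closure.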
2008-05-28T00:00:00Z
http://hdl.handle.net/10256/739
CODAMAT: a modern analogue technique for compositional data
Martín Fernández, Josep Antoni; Di Donato, Valentino
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a
fundamental issue in palaeoclimatic and palaeoceanographic investigations. The
Modern Analogue Technique, a widely adopted method based on direct comparison of
fossil assemblages with modern coretop samples, was revised with the aim of
conforming it to compositional data analysis. The new CODAMAT method was
developed by adopting the Aitchison metric as the distance measure. Modern coretop
datasets are characterised by a large number of zeros. Zeros were replaced using a
Bayesian approach based on a posterior estimation of the parameter of the multinomial
distribution. The number of modern analogues from which to reconstruct the SST was
determined by a multiple approach that considers the Proxies correlation matrix, the
Standardized Residual Sum of Squares and the Mean Squared Distance. This new CODAMAT
method was applied to the
planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.
Key words: Modern analogues, Aitchison distance, Proxies correlation matrix,
Standardized Residual Sum of Squares
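The core idea of a modern analogue estimate under the Aitchison metric can be sketched as follows. This is an illustrative sketch under stated assumptions, not the CODAMAT procedure itself: the function names, the unweighted mean over the `k` nearest analogues and the example values are hypothetical, and the compositions are assumed to be strictly positive (that is, zeros have already been replaced).

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of strictly positive compositions."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

def analogue_estimate(fossil, modern, modern_sst, k=3):
    """Estimate SST for one fossil assemblage from its k nearest modern analogues.

    Assumption: the estimate is the plain mean of the analogues' SST values.
    """
    modern_sst = np.asarray(modern_sst, dtype=float)
    d = aitchison_distance(modern, fossil)  # one distance per modern sample
    nearest = np.argsort(d)[:k]
    return modern_sst[nearest].mean(), nearest

# Hypothetical coretop assemblages (rows) with their observed SST values.
modern = np.array([[0.50, 0.30, 0.20],
                   [0.10, 0.10, 0.80],
                   [0.45, 0.35, 0.20]])
modern_sst = np.array([20.0, 10.0, 22.0])
fossil = np.array([0.50, 0.30, 0.20])

estimate, analogues = analogue_estimate(fossil, modern, modern_sst, k=2)
print(estimate, analogues)
```

Because the Aitchison distance depends only on ratios between parts, rescaling an assemblage (for example from proportions to counts) leaves the selected analogues unchanged.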
2008-05-28T00:00:00Z
http://hdl.handle.net/10256/735
Analysis of Pleistocene paleodrainage evolution in the Po Basin (Italy) by multivariate statistical techniques
Vezzoli, Giovanni
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously
cored boreholes, 100 to 220 m deep, were drilled in the northern part of the Po
Plain by Regione Lombardia in the last five years. Quantitative provenance
analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried
out by using multivariate statistical analysis (principal component analysis, PCA,
and similarity analysis) on an integrated data set, including high-resolution bulk
petrography and heavy-mineral analyses of Pleistocene sands and of 250 major
and minor modern rivers draining the southern flank of the Alps from West to
East (Garzanti et al., 2004; 2006). Prior to the onset of major Alpine glaciations,
metamorphic and quartzofeldspathic detritus from the Western and Central Alps
was carried from the axial belt to the Po basin longitudinally parallel to the
SouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenario
rapidly changed during marine isotope stage 22 (0.87 Ma), with the onset of
the first major Pleistocene glaciation in the Alps (Muttoni et al., 2003). PCA and
similarity analysis from core samples show that the longitudinal trunk river at this
time was shifted southward by the rapid southward and westward progradation of
transverse alluvial river systems fed from the Central and Southern Alps.
Sediments were transported southward by braided river systems, and glacial
sediments carried by Alpine valley glaciers invaded the alluvial plain.
Key words: Detrital modes; Modern sands; Provenance; Principal Components
Analysis; Similarity, Canberra Distance; palaeodrainage
2008-05-28T00:00:00Z
http://hdl.handle.net/10256/734
Discovering similarities in the time use patterns of the Spanish Autonomous Communities by Fuzzy techniques
Palarea Albaladejo, Javier; Martín Fernández, Josep Antoni
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
In 2000 the European Statistical Office published the guidelines for developing the
Harmonized European Time Use Surveys system. Under such a unified framework,
the first Time Use Survey of national scope was conducted in Spain during 2002–
03. The aim of these surveys is to understand human behavior and the lifestyle of
people. Time allocation data are of compositional nature in origin, that is, they are
subject to non-negativity and constant-sum constraints. Thus, standard multivariate
techniques cannot be directly applied to analyze them. The goal of this work is to
identify homogeneous Spanish Autonomous Communities with regard to the typical
activity pattern of their respective populations. To this end, a fuzzy clustering approach
is followed. Rather than the hard partitioning of classical clustering, where objects are
allocated to only a single group, fuzzy methods identify overlapping groups of objects
by allowing them to belong to more than one group. Specifically, the probabilistic fuzzy
c-means algorithm is suitably adapted to deal with the Spanish Time Use Survey
microdata. As a result, a map distinguishing Autonomous Communities with similar
activity patterns is drawn.
Key words: Time use data, Fuzzy clustering; FCM; simplex space; Aitchison distance
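One way to make fuzzy c-means respect the Aitchison distance is to run the standard algorithm on centred log-ratio coordinates. The sketch below illustrates that idea only; it is an assumption, not the adaptation used by the authors, and the function names, the fuzzifier `m`, the fixed iteration count and the example data are hypothetical.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of strictly positive compositions."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def fuzzy_c_means(z, c, m=2.0, n_iter=100, seed=0):
    """Probabilistic fuzzy c-means on the rows of z.

    Returns the membership matrix (rows sum to 1) and the cluster centres.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((z.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)          # random initial memberships
    for _ in range(n_iter):
        w = u ** m
        centres = (w.T @ z) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(z[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against zero distances
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centres

# Hypothetical daily time-use compositions (e.g. work, leisure, sleep).
data = np.array([[0.70, 0.20, 0.10],
                 [0.68, 0.22, 0.10],
                 [0.72, 0.18, 0.10],
                 [0.10, 0.20, 0.70],
                 [0.12, 0.20, 0.68],
                 [0.10, 0.22, 0.68]])

memberships, centres = fuzzy_c_means(clr(data), c=2)
print(np.round(memberships, 3))
```

Euclidean distances between clr coordinates equal Aitchison distances between the original compositions, so no change to the update formulas is needed. Cluster centres can be mapped back to the simplex by exponentiating and closing them.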
2008-05-28T00:00:00Z