Advances in the mathematical foundations of compositional data analysis: convexity and Lp norms. Application to LASSO linear regression with a compositional covariate
Compositional data are a special type of multivariate data in which the variables represent parts of a whole. Such data are common in fields such as geology, biology, economics, and chemistry, where the proportions between components are more informative than the absolute values. A classic example is the chemical composition of a rock, where the percentages of the elements sum to 100%. Another is the composition of a diet, where the proportions of the different nutrients (proteins, carbohydrates, fats, etc.) add up to 100%.
A key aspect of compositional data is that the valuable information lies not in the absolute values of the parts but in the relative relationships between them. For instance, the ratio of one element to another may be more significant than either individual value. Because of this relative nature, traditional statistical methods applied directly to compositional data can lead to misleading or inconsistent conclusions. A common issue is spurious correlation, which arises from the constant-sum constraint inherent in compositional data rather than from any true relationship between the variables.
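The effect of the constant-sum constraint can be illustrated with a minimal sketch. The data and the helper names `closure` and `pearson` are made up for illustration and are not taken from the thesis. Two raw measurements that are uncorrelated become perfectly negatively correlated once each observation is rescaled to sum to 1, because in a two-part composition one share is always 1 minus the other.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical raw amounts of two components in four samples.
x_raw = [1.0, 2.0, 3.0, 4.0]
y_raw = [2.0, 4.0, 4.0, 2.0]

# Close each sample so that its two parts sum to 1.
closed = [closure([x, y]) for x, y in zip(x_raw, y_raw)]
x_share = [c[0] for c in closed]
y_share = [c[1] for c in closed]

raw_corr = pearson(x_raw, y_raw)         # zero for these raw amounts
closed_corr = pearson(x_share, y_share)  # -1 up to rounding, forced by the closure

print(raw_corr, closed_corr)
```

The negative correlation of the shares says nothing about how the two components relate in the raw data; it is produced entirely by the closure.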
These challenges are addressed with Aitchison geometry, a mathematical framework designed specifically for analyzing compositional data. This geometry introduces techniques such as the log-ratio transformation, which maps compositional data into a Euclidean space where conventional statistical methods can be applied coherently. This ensures that the relative information in the data is handled properly, preserving its consistency and preventing misinterpretation.
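As a minimal sketch of one such transformation, the centered log-ratio (clr) subtracts the mean of the log-parts from each log-part, and the Aitchison distance between two compositions is the ordinary Euclidean distance between their clr vectors. The function names and the example values below are illustrative assumptions, and all parts are assumed to be strictly positive, since zeros need separate treatment.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]  # requires strictly positive parts
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.dist(clr(x), clr(y))

# The same hypothetical rock expressed in percent and in fractions.
rock_percent = [60.0, 30.0, 10.0]
rock_fraction = [0.6, 0.3, 0.1]
other_rock = [0.2, 0.3, 0.5]

# clr coordinates always sum to zero (up to rounding).
print(sum(clr(rock_percent)))

# Rescaling does not change the composition, so this distance is zero up to rounding.
print(aitchison_distance(rock_percent, rock_fraction))

# Genuinely different compositions are a positive distance apart.
print(aitchison_distance(rock_fraction, other_rock))
```

Because only ratios between parts enter the clr coordinates, the result does not depend on whether the data are recorded as percentages, fractions, or raw amounts.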
In this thesis, a coherent framework for convex optimization within Aitchison geometry is established by adapting the definitions of convexity and Lp norms to maintain the compositional structure of the data. The methodological section includes a detailed comparison of LASSO regression models with different penalty norms, analyzing how the regularization process affects the subcompositional structure of the linear model.
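The abstract does not state the definitions adopted in the thesis. As an illustration only, one common way to write these objects uses clr coordinates: the Aitchison norm is the Euclidean norm of the clr vector, and an Lp-type norm could be obtained by replacing the exponent 2 with p. The Lp expression and the zero-sum form of the LASSO problem below are assumptions made for this sketch, not formulas taken from the thesis. Here D is the number of parts, n the number of observations, x_k the k-th compositional observation, and lambda the regularization parameter.

```latex
% clr coordinates of a D-part composition x with strictly positive parts
\operatorname{clr}(x)_i = \ln\frac{x_i}{g(x)}, \qquad
g(x) = \Bigl(\prod_{j=1}^{D} x_j\Bigr)^{1/D}

% Aitchison norm: Euclidean norm of the clr vector
\|x\|_{A} = \Bigl(\sum_{i=1}^{D} \operatorname{clr}(x)_i^{2}\Bigr)^{1/2}

% Assumed Lp-type analogue (illustrative, not necessarily the thesis definition)
\|x\|_{A,p} = \Bigl(\sum_{i=1}^{D} \bigl|\operatorname{clr}(x)_i\bigr|^{p}\Bigr)^{1/p}

% Assumed LASSO problem with a compositional covariate (log-contrast form)
\min_{\beta_0,\,\beta}\;
\sum_{k=1}^{n}\Bigl(y_k - \beta_0 - \sum_{i=1}^{D}\beta_i \ln x_{ki}\Bigr)^{2}
+ \lambda \sum_{i=1}^{D} |\beta_i|
\quad\text{subject to}\quad \sum_{i=1}^{D}\beta_i = 0
```

Under the zero-sum constraint, the linear predictor depends only on ratios between parts, since the sum of beta_i ln x_ki then equals the sum of beta_i clr(x_k)_i. Coefficients shrunk to zero by the penalty remove the corresponding parts from the model, which is one way regularization can act on the subcompositional structure.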
In summary, this thesis advances the methodological tools available for analyzing compositional data, enhancing their applicability across a range of scientific disciplines, including geology, molecular biology, economics, and chemistry.
Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by/4.0/