Session 0: About zeroes
http://hdl.handle.net/10256/638
Bayesian tools for zero counts in compositional data
http://hdl.handle.net/10256/713
Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni; Palarea Albaladejo, Javier
Editors: Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
2008-05-27

The log-ratio methodology provides powerful tools for analyzing compositional
data. Nevertheless, this methodology can only be applied to data sets without
null values, so a preliminary treatment becomes necessary wherever zeros are
present. Recent advances in the treatment of compositional zeros have focused
mainly on zeros of a structural nature and on rounded zeros. These tools do not
cover the particular case of count compositional data sets with null values. In
this work we deal with "count zeros" and introduce a treatment based on a mixed
Bayesian-multiplicative estimation. We use the Dirichlet probability
distribution as a prior and estimate the posterior probabilities. Then we apply
a multiplicative modification to the non-zero values. We present a case study
where this new methodology is applied.
Key words: count data, multiplicative replacement, composition, log-ratio analysis
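
The abstract states the recipe without formulas: replace each count zero by its
Dirichlet posterior mean and rescale the non-zero parts multiplicatively so the
composition stays closed. A minimal Python sketch, assuming a uniform prior
expectation t and a total prior strength s = 0.5 (both hypothetical defaults,
not necessarily the paper's choices):

    import numpy as np

    def bayes_mult_replace(counts, s=0.5):
        # Sketch of a Bayesian-multiplicative treatment of count zeros.
        counts = np.asarray(counts, dtype=float)
        D = counts.size
        N = counts.sum()
        t = np.full(D, 1.0 / D)            # uniform prior expectation (assumption)
        post = (counts + s * t) / (N + s)  # Dirichlet posterior mean proportions
        x = counts / N                     # observed (closed) proportions
        zeros = counts == 0
        r = np.where(zeros, post, 0.0)     # zero parts take their posterior mean
        # multiplicative modification: non-zero parts are rescaled so that the
        # replaced composition still sums to 1, preserving their mutual ratios
        r[~zeros] = x[~zeros] * (1.0 - r[zeros].sum())
        return r

    print(bayes_mult_replace([12, 0, 7, 0, 81]))

Because only the zero parts are touched and the rest are rescaled by a common
factor, the ratios among the observed non-zero parts are preserved, which is
the point of the multiplicative step.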

Discrete and continuous compositions
http://hdl.handle.net/10256/712
Bacon Shone, John
Editors: Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
2008-05-27

This paper examines a data set that is modeled well by the Poisson-Log Normal
process, and by this process mixed with Log Normal data, with both models then
turned into compositions. This generates compositional data that have zeros
without any need for conditional models or for assuming that there are missing
or censored data in need of adjustment. It also enables us to model dependence
on covariates and within the composition.
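
A quick way to see how the Poisson-Log Normal construction produces genuine
compositional zeros is to simulate it. A minimal sketch with illustrative
(assumed) parameter values; the paper's mixed Log Normal variant and the
covariate dependence are not reproduced here:

    import numpy as np

    rng = np.random.default_rng(42)

    def poisson_lognormal_compositions(n, mu, cov):
        # Latent intensities are multivariate log-normal; observed counts are
        # Poisson draws around them; closing the counts yields compositions
        # in which zeros arise naturally whenever a Poisson draw is 0.
        lam = np.exp(rng.multivariate_normal(mu, cov, size=n))
        counts = rng.poisson(lam)
        totals = counts.sum(axis=1, keepdims=True)
        return counts / np.where(totals == 0, 1, totals)  # closure to proportions

    mu = np.log([50.0, 5.0, 0.5])   # illustrative log-means (assumed values)
    cov = 0.4 * np.eye(3)           # illustrative covariance (assumed values)
    comps = poisson_lognormal_compositions(1000, mu, cov)
    print("fraction of rows containing a zero:", (comps == 0).any(axis=1).mean())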

When zero doesn't mean it and other geomathematical mischief
http://hdl.handle.net/10256/711
Valls Alvarez, Ricardo A.
Editors: Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
2008-05-27

There is hardly a case in exploration geology where the data under study do not
include below-detection-limit and/or zero values, and since most geological
data follow lognormal distributions, these "zero data" represent a mathematical
challenge for interpretation.
We need to start by recognizing that there are zero values in geology. For
example, the amount of quartz in a foyaite (nepheline syenite) is zero, since
quartz cannot coexist with nepheline. Another common essential zero is a north
azimuth, although we can always replace that zero with the value 360°. These
are known as "essential zeros"; but what can we do with "rounded zeros", which
result from values below the detection limit of the equipment?
Amalgamation, e.g. adding Na2O and K2O as total alkalis, is one solution, but
sometimes we need to differentiate between a sodic and a potassic alteration.
Pre-classification into groups requires a good knowledge of the distribution of
the data and of the geochemical characteristics of the groups, which is not
always available. Setting the zero values equal to the detection limit of the
equipment used will generate spurious distributions, especially in ternary
diagrams. The same occurs if we replace the zero values by a small amount using
non-parametric or parametric techniques (imputation).
The method we propose takes into consideration the well-known relationships
between certain elements. For example, in copper porphyry deposits there is
always a good direct correlation between the copper values and the molybdenum
values, but while copper will always be above the detection limit, many of the
molybdenum values will be "rounded zeros". We therefore take the lower quartile
of the real molybdenum values, establish a regression equation with copper, and
then estimate the "rounded" zero values of molybdenum from their corresponding
copper values.
The method can be applied to any type of data, provided we first establish
their correlation dependency.
One of the main advantages of this method is that we do not obtain a fixed
value for the "rounded zeros", but one that depends on the value of the other
variable.
Key words: compositional data analysis, treatment of zeros, essential zeros, rounded
zeros, correlation dependency
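
The Cu-Mo procedure described above translates almost directly into code. A
minimal sketch, where fitting the regression in log-log space and capping
predictions at the detection limit are assumptions of this illustration rather
than prescriptions of the paper:

    import numpy as np

    def impute_rounded_zeros(cu, mo, det_limit):
        # Fit a regression on the lower quartile of the *detected* Mo values
        # (the region nearest the censored range) and predict each nondetect
        # from its paired Cu value.  Log-log space is assumed here, motivated
        # by the lognormal behaviour of geochemical data noted in the abstract.
        cu, mo = np.asarray(cu, float), np.asarray(mo, float)
        detected = mo > det_limit
        q1 = np.quantile(mo[detected], 0.25)
        low = detected & (mo <= q1)           # lower quartile of real Mo values
        slope, intercept = np.polyfit(np.log(cu[low]), np.log(mo[low]), 1)
        out = mo.copy()
        pred = np.exp(intercept + slope * np.log(cu[~detected]))
        # censoring tells us the true value is below the limit, so cap the
        # predictions there (a conservative assumption of this sketch)
        out[~detected] = np.minimum(pred, det_limit)
        return out

    # made-up illustrative values, where 0.0 marks a nondetect
    cu = np.array([0.3, 0.5, 0.8, 1.0, 1.2, 1.6, 1.9, 2.5, 3.1, 4.0])
    mo = np.array([0.0, 0.0, 0.02, 0.03, 0.05, 0.06, 0.08, 0.12, 0.15, 0.20])
    print(impute_rounded_zeros(cu, mo, det_limit=0.01))

Note that the imputed values vary with the paired copper readings, which is the
advantage the abstract highlights over fixed-value replacement.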

Inference of distributional parameters from compositional samples containing nondetects
http://hdl.handle.net/10256/708
Olea, Ricardo A.
Editors: Daunis-i-Estadella, Pepus; Martín Fernández, Josep Antoni
2008-05-27

Low concentrations of elements in geochemical analyses have the peculiarity of
being compositional data and, for a given level of significance, are likely to
be beyond the capabilities of laboratories to distinguish between minute
concentrations and complete absence, thus preventing laboratories from
reporting extremely low concentrations of the analyte. Instead, what is
reported is the detection limit, the minimum concentration that conclusively
differentiates between presence and absence of the element. A spatially
distributed exhaustive sample is employed in this study to generate unbiased
sub-samples, which are further censored to observe the effect that different
detection limits and sample sizes have on the inference of population
distributions from geochemical analyses containing specimens below the
detection limit (nondetects). The isometric logratio transformation is used to
convert the compositional data in the simplex to samples in real space, thus
allowing the practitioner to properly borrow from the large body of statistical
techniques valid only in real space. The bootstrap method is used to
numerically investigate the reliability of inferring several distributional
parameters employing different forms of imputation for the censored data. The
case study illustrates that, in general, the best results are obtained when
imputations are made using the distribution that best fits the readings above
the detection limit, and it exposes the problems of other, more widely used
practices. When the sample is spatially correlated, it is necessary to combine
the bootstrap with stochastic simulation.
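
The pipeline in this abstract, ilr to real space followed by bootstrap
inference, can be sketched briefly. Here toy Dirichlet draws stand in for an
already imputed, strictly positive sample, and the sequential-binary-partition
basis is one common convention, not necessarily the author's:

    import numpy as np

    def ilr(x):
        # Isometric log-ratio transform of closed compositions (rows of x):
        # the j-th coordinate contrasts the geometric mean of the first j
        # parts with part j+1 (one orthonormal basis among many).
        x = np.asarray(x, float)
        D = x.shape[-1]
        lx = np.log(x)
        z = np.empty(x.shape[:-1] + (D - 1,))
        for j in range(1, D):
            gm = lx[..., :j].mean(axis=-1)              # log geometric mean
            z[..., j - 1] = np.sqrt(j / (j + 1.0)) * (gm - lx[..., j])
        return z

    rng = np.random.default_rng(0)
    comp = rng.dirichlet([4, 2, 1], size=200)   # toy strictly positive sample
    coords = ilr(comp)
    # bootstrap the mean of the ilr coordinates, where ordinary real-space
    # statistics are valid
    boot = np.array([coords[rng.integers(0, 200, 200)].mean(axis=0)
                     for _ in range(2000)])
    print("bootstrap 95% CI per coordinate:",
          np.percentile(boot, [2.5, 97.5], axis=0))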