Statistical inference for Hardy-Weinberg Equilibrium using Log-ratio Coordinates
Testing markers for Hardy-Weinberg equilibrium (HWE) is an important step in the analysis of
large databases used in genetic association studies. Gross deviation from HWE can be indicative of
genotyping error. There are many approaches to testing markers for HWE. The classical chi-square
test was, until recently, the most widely used approach to HWE testing. Over the last decade, the
computationally more demanding exact test has become more popular. Bayesian approaches, where
the full posterior distribution of a disequilibrium parameter is obtained, have also been developed.
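For reference, here is a minimal sketch of the classical chi-square HWE test mentioned above. It uses the Pearson statistic with 1 degree of freedom and no continuity correction, written in Python with only the standard library. The function name `hwe_chisq` is hypothetical, and the marker is assumed to be polymorphic so that no expected count is zero.

```python
import math


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test for HWE at a bi-allelic marker (1 df, no continuity correction).

    Assumes a polymorphic marker, so that every expected genotype count is positive.
    """
    n = n_aa + n_ab + n_bb
    # Estimated frequency of allele A from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under HWE: n p^2, 2 n p q, n q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x2) = erfc(sqrt(x2 / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2.0))
    return x2, p_value


if __name__ == "__main__":
    x2, p = hwe_chisq(30, 40, 30)
    print(f"X2 = {x2:.3f}, p = {p:.4f}")
```

For the counts 30/40/30 the estimated allele frequency is 0.5, the expected counts are 25/50/25, and the statistic is X² = 1 + 2 + 1 = 4 (p ≈ 0.046). Counts that sit exactly in HWE proportions, such as 25/50/25, give X² = 0.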
In compositional data analysis (CODA), Aitchison described how the HWE law can be “discovered” when
a set of samples, all genotyped for the same marker, is analyzed by log-ratio principal component
analysis. A well-known tool in CODA, the ternary plot, is known in genetics as a de Finetti
diagram. The Hardy-Weinberg law defines a parabola in a ternary plot of the three genotype
frequencies of a bi-allelic marker. Ternary plots of bi-allelic genetic markers typically show points
that “follow” the parabola, though with some scatter that depends on the sample size. When
represented in additive, centered or isometric log-ratio coordinates, the HW parabola becomes a
straight line. Much of CODA is concerned with data sets where each individual row in the data
set (an individual, a sample, an object) constitutes a composition. In data sets comprising genetic
markers, individual rows (persons) are not really compositions, but it is the total sample of all
individuals that constitutes a composition. The CODA approach to genetic data has proven useful
for producing informative graphics, but to date CODA does not seem to have provided formal statistical
inference for HWE, probably because the distribution of the log-ratio coordinates is not known.
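The claim that the HW parabola becomes a straight line in log-ratio coordinates can be checked numerically. The sketch below, in Python with only the standard library, builds HWE genotype frequencies (p², 2pq, q²) for several allele frequencies p. It then evaluates the parabola condition, the second clr coordinate, and two ilr coordinates. The ilr basis is an assumption made for illustration: a homozygote balance plus a homozygotes-versus-heterozygote balance, which is one valid orthonormal choice but not necessarily the one used in this work. The function names are hypothetical.

```python
import math


def clr(x):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]


def ilr_hwe(f_aa, f_ab, f_bb):
    """Two ilr coordinates of (f_AA, f_AB, f_BB) in one assumed orthonormal basis.

    Basis vectors in clr space: (1, 0, -1)/sqrt(2) and (1, -2, 1)/sqrt(6).
    """
    z1 = math.log(f_aa / f_bb) / math.sqrt(2)               # AA versus BB
    z2 = math.log(f_aa * f_bb / f_ab ** 2) / math.sqrt(6)   # homozygotes versus heterozygote
    return z1, z2


def hwe_composition(p):
    """Genotype frequencies (AA, AB, BB) implied by HWE for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.8):
        f = hwe_composition(p)
        gap = f[1] ** 2 - 4 * f[0] * f[2]   # zero on the HW parabola
        z1, z2 = ilr_hwe(*f)
        print(f"p={p:.1f} gap={gap:+.1e} clr2={clr(f)[1]:.4f} z1={z1:+.4f} z2={z2:.4f}")
```

For every p, the gap f_AB² − 4 f_AA f_BB is zero up to rounding, which is the HW parabola. The second clr coordinate stays at (2/3) ln 2 ≈ 0.4621 and z2 stays at −ln 4/√6 ≈ −0.5660, while z1 = √2 ln(p/q) varies. The HWE compositions therefore lie on the straight line z2 = constant. For any composition in this basis, the second clr coordinate equals −√(2/3)·z2, so the two disequilibrium statistics carry the same information.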
Nevertheless, the log-ratio approach directly suggests some statistics that can be used for measuring
disequilibrium: the second clr and the second ilr coordinate of the sample. Similar statistics have
been used in the genetics literature. In this contribution, we will use the multivariate delta method
to derive the asymptotic distribution of the isometric log-ratio coordinates. This allows hypothesis
testing for HWE and the construction of confidence intervals for large samples that contain no
zeros. The type I error rate of the test is compared with that of the classical chi-square test.
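A minimal sketch of a delta-method test of the kind described above is given below. It targets a single coordinate, z2 = ln(f_AA f_BB / f_AB²)/√6, and makes three assumptions: this particular ilr basis, multinomial sampling of the genotype counts, and a Wald-type standard error evaluated at the sample frequencies. The method in this work may differ in all three, and may test both ilr coordinates jointly. With the gradient g = (1/f_AA, −2/f_AB, 1/f_BB)/√6 and covariance (diag(f) − f fᵀ)/n, the term Σ gᵢfᵢ vanishes. The variance therefore reduces to (1/f_AA + 4/f_AB + 1/f_BB)/(6n). The function name `ilr2_hwe_test` is hypothetical.

```python
import math


def ilr2_hwe_test(n_aa: int, n_ab: int, n_bb: int):
    """Wald-type HWE test on z2 = ln(f_AA f_BB / f_AB^2) / sqrt(6) via the delta method.

    Returns (z2, standard error, Wald statistic, two-sided p-value, 95% CI for z2).
    Assumes all three genotype counts are positive (log-ratios are undefined otherwise).
    """
    if min(n_aa, n_ab, n_bb) <= 0:
        raise ValueError("log-ratio coordinates require counts with no zeros")
    n = n_aa + n_ab + n_bb
    f_aa, f_ab, f_bb = n_aa / n, n_ab / n, n_bb / n
    z2 = math.log(f_aa * f_bb / f_ab ** 2) / math.sqrt(6)
    # Under HWE, f_AB^2 = 4 f_AA f_BB, so z2 equals -ln(4)/sqrt(6).
    z2_null = -math.log(4.0) / math.sqrt(6)
    # Delta method: gradient g = (1/f_AA, -2/f_AB, 1/f_BB)/sqrt(6) and multinomial
    # covariance (diag(f) - f f^T)/n; since sum(g_i f_i) = 0, Var(z2) reduces to:
    var_z2 = (1.0 / f_aa + 4.0 / f_ab + 1.0 / f_bb) / (6.0 * n)
    se = math.sqrt(var_z2)
    w = (z2 - z2_null) / se
    p_value = math.erfc(abs(w) / math.sqrt(2.0))  # two-sided standard normal p-value
    ci = (z2 - 1.959964 * se, z2 + 1.959964 * se)  # large-sample 95% interval
    return z2, se, w, p_value, ci


if __name__ == "__main__":
    z2, se, w, p, ci = ilr2_hwe_test(30, 40, 30)
    print(f"z2={z2:.4f} se={se:.4f} W={w:.3f} p={p:.4f} CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

For the counts 30/40/30, the Wald statistic is about 1.99 (two-sided p ≈ 0.047). Its square is close to the Pearson chi-square value of 4 for the same counts. The explicit check for zero counts reflects the restriction, stated in the abstract, to samples that contain no zeros.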
All rights reserved.