Statistical inference for Hardy-Weinberg Equilibrium using Log-ratio Coordinates

Graffelman, Jan
Testing markers for Hardy-Weinberg equilibrium (HWE) is an important step in the analysis of large databases used in genetic association studies. Gross deviation from HWE can be indicative of genotyping error. There are many approaches to testing markers for HWE. Until recently, the classical chi-square test was the most widely used approach to HWE testing. Over the last decade, the computationally more demanding exact test has become more popular. Bayesian approaches, in which the full posterior distribution of a disequilibrium parameter is obtained, have also been developed. Within compositional data analysis (CODA), Aitchison described how the HWE law can be “discovered” when a set of samples, all genotyped for the same marker, is analyzed by log-ratio principal component analysis. A well-known CODA tool, the ternary plot, is known in genetics as the de Finetti diagram. The Hardy-Weinberg law defines a parabola in a ternary plot of the three genotype frequencies of a bi-allelic marker. Ternary plots of bi-allelic genetic markers typically show points that “follow” the parabola, with an amount of scatter that depends on the sample size. When represented in additive, centered or isometric log-ratio coordinates, the HW parabola becomes a straight line. Much of CODA is concerned with data sets in which each row (an individual, a sample, an object) constitutes a composition. In data sets of genetic markers, the individual rows (persons) are not really compositions; rather, it is the total sample of all individuals that constitutes a composition. The CODA approach to genetic data has proven useful for producing informative graphics, but to date CODA does not seem to have provided formal statistical inference for HWE, probably because the sampling distribution of the log-ratio coordinates is not known.
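The claim that the HW parabola becomes a straight line can be checked numerically. Under HWE the genotype frequencies of (AA, AB, BB) are (p², 2pq, q²), so AB² = 4·AA·BB (the parabola), and the log-ratio AA·BB/AB² is fixed at 1/4 for every allele frequency p. The sketch below assumes the genotype order (AA, AB, BB) and one particular orthonormal ilr basis chosen for illustration; the function names are hypothetical. With a different ilr basis the HW line is still straight but need not be parallel to a coordinate axis.

```python
import math


def hwe_composition(p):
    """Genotype frequencies (AA, AB, BB) under HWE for allele-A frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def clr(x):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr(x):
    """Isometric log-ratio coordinates of (AA, AB, BB).

    Assumed basis (one of many valid orthonormal choices):
      ilr1 = ln(AA / BB) / sqrt(2)          contrast between the homozygotes
      ilr2 = ln(AA * BB / AB**2) / sqrt(6)  homozygotes versus heterozygote
    """
    aa, ab, bb = x
    ilr1 = math.log(aa / bb) / math.sqrt(2.0)
    ilr2 = math.log(aa * bb / ab ** 2) / math.sqrt(6.0)
    return ilr1, ilr2


# Under HWE, AA*BB/AB**2 = 1/4, so ilr2 and the heterozygote clr coordinate
# are constants: the HW parabola maps to a horizontal line in this ilr basis.
HWE_ILR2 = -math.log(4.0) / math.sqrt(6.0)
HWE_CLR_AB = math.log(4.0) / 3.0

for p in (0.1, 0.3, 0.5, 0.8):
    x = hwe_composition(p)
    i1, i2 = ilr(x)
    print(f"p={p:.1f}  ilr1={i1:+.4f}  ilr2={i2:+.4f}  clr_AB={clr(x)[1]:+.4f}")
```

The first coordinate varies with p, while the second stays at -ln(4)/√6 ≈ -0.566; a sample with an excess of heterozygotes falls below this line and one with a deficit falls above it.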
Nevertheless, the log-ratio approach directly suggests some statistics for measuring disequilibrium: the second clr and the second ilr coordinate of the sample. Similar statistics have been used in the genetics literature. In this contribution, we use the multivariate delta method to derive the asymptotic distribution of the isometric log-ratio coordinates. This allows hypothesis testing for HWE and the construction of confidence intervals for large samples that contain no zeros. The type I error rate of the test is compared with that of the classical chi-square test.
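The abstract does not spell out the derivation, so the following is only a minimal sketch of how the multivariate delta method could be applied, not necessarily the author's exact procedure. It assumes multinomial sampling of the genotype counts, no zero counts, the illustrative ilr basis ilr1 = ln(AA/BB)/√2 and ilr2 = ln(AA·BB/AB²)/√6, and a Wald-type plug-in variance evaluated at the sample proportions (evaluating it under the null hypothesis would be an equally plausible choice). The function names are hypothetical. For comparison, it also computes the classical one-degree-of-freedom chi-square statistic without continuity correction.

```python
import math
from statistics import NormalDist

# Value of ilr2 on the HW line: ln(AA*BB/AB**2) = ln(1/4) under HWE.
HWE_ILR2 = -math.log(4.0) / math.sqrt(6.0)


def ilr_and_cov(n_aa, n_ab, n_bb):
    """Sample ilr coordinates and their delta-method covariance matrix.

    Assumptions: multinomial sampling of genotype counts, no zero counts,
    and the basis ilr1 = ln(AA/BB)/sqrt(2), ilr2 = ln(AA*BB/AB**2)/sqrt(6).
    """
    counts = (n_aa, n_ab, n_bb)
    if min(counts) <= 0:
        raise ValueError("log-ratio coordinates need all genotype counts > 0")
    n = sum(counts)
    p = [c / n for c in counts]
    s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
    coords = [math.log(p[0] / p[2]) / s2,
              math.log(p[0] * p[2] / p[1] ** 2) / s6]
    # Jacobian J of (ilr1, ilr2) with respect to (p_AA, p_AB, p_BB).
    jac = [[1 / (s2 * p[0]), 0.0, -1 / (s2 * p[2])],
           [1 / (s6 * p[0]), -2 / (s6 * p[1]), 1 / (s6 * p[2])]]
    # Multinomial covariance of the sample proportions: (diag(p) - p p^T) / n.
    sigma = [[((p[i] if i == j else 0.0) - p[i] * p[j]) / n for j in range(3)]
             for i in range(3)]
    # Multivariate delta method: Cov(ilr) is approximately J Sigma J^T.
    cov = [[sum(jac[r][i] * sigma[i][j] * jac[c][j]
                for i in range(3) for j in range(3))
            for c in range(2)]
           for r in range(2)]
    return coords, cov


def ilr_hwe_test(n_aa, n_ab, n_bb, level=0.95):
    """Wald-type z test of H0: ilr2 = HWE_ILR2, plus a confidence interval for ilr2."""
    coords, cov = ilr_and_cov(n_aa, n_ab, n_bb)
    se = math.sqrt(cov[1][1])
    z = (coords[1] - HWE_ILR2) / se  # z > 0 signals a heterozygote deficit
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p_value, (coords[1] - z_crit * se, coords[1] + z_crit * se)


def chisq_hwe_test(n_aa, n_ab, n_bb):
    """Classical 1-df chi-square test for HWE, without continuity correction."""
    n = n_aa + n_ab + n_bb
    pa = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    expected = (n * pa ** 2, 2 * n * pa * (1 - pa), n * (1 - pa) ** 2)
    x2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return x2, math.erfc(math.sqrt(x2 / 2.0))  # P(chi-square_1 > x2)
```

Because the ilr contrasts sum to zero, the delta-method variance of the second coordinate reduces to (1/6)(1/n_AA + 4/n_AB + 1/n_BB), which also shows why zero counts must be excluded. As a usage example with hypothetical counts, `ilr_hwe_test(40, 20, 40)` and `chisq_hwe_test(40, 20, 40)` both flag the heterozygote deficit, while counts exactly in HWE such as (25, 50, 25) give z = 0 and X² = 0. A type I error comparison would draw many multinomial samples under HWE and record how often each test rejects at the nominal level.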
All rights reserved.