Parameter validation for a genomics population analysis
A study by Ollé, J. and Viñas, J. aims to help create guidelines for the regulation of Atlantic
Bonito, an overfished species found in the Mediterranean and along the coasts of the Iberian
Peninsula and northwestern Africa. Because the species is commercially important, it is
necessary to determine which populations it comprises in order to create a conservation
plan that protects biodiversity and prevents the loss of genetic variability in this species.
To achieve this, samples have been collected from 92 individuals from Tunisia, Spain,
northern Portugal, southern Portugal, Morocco, Mauritania, Senegal, and Ivory Coast. The
restriction site-associated DNA sequencing (RADseq) technique has been employed to genotype
all individuals and observe genetic differences among populations. The resulting sequence
data must be assembled with a program called Stacks, whose parameters m, M, and n
determine how the assemblies are produced and influence the coverage and the number of
polymorphic sites detected.
Therefore, this study investigates how the parameters m, M, and n affect a small
representative subset of the populations, namely three individuals per population, to
determine the most efficient values for recovering the maximum number of true polymorphic
loci with high sequence coverage. To this end, eleven tests are run with different
combinations of parameters, using values from the middle of each parameter's range of
possible values.
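As an illustration of how such a set of tests could be laid out, the sketch below builds a one-parameter-at-a-time grid and formats a Stacks call for each combination. The baseline, the ranges, the file paths, and the use of the denovo_map.pl wrapper are all assumptions made for this example. The study's actual eleven combinations are not listed here, and this grid does not reproduce them.

```python
from typing import Dict, List

# Hypothetical baseline and ranges, chosen only for illustration.
BASELINE = {"m": 3, "M": 2, "n": 1}
RANGES = {"m": [2, 3, 4, 5], "M": [1, 2, 3, 4], "n": [1, 2, 3, 4]}


def one_at_a_time(baseline: Dict[str, int],
                  ranges: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Vary one parameter at a time, holding the others at the baseline."""
    tests = [dict(baseline)]
    for name, values in ranges.items():
        for value in values:
            combo = dict(baseline, **{name: value})
            if combo not in tests:  # the baseline itself appears only once
                tests.append(combo)
    return tests


def stacks_command(params: Dict[str, int], samples_dir: str,
                   popmap: str, out_dir: str) -> str:
    """Format a denovo_map.pl call; all paths are placeholders."""
    return (
        f"denovo_map.pl --samples {samples_dir} --popmap {popmap} "
        f"-o {out_dir} -m {params['m']} -M {params['M']} -n {params['n']}"
    )


TESTS = one_at_a_time(BASELINE, RANGES)

if __name__ == "__main__":
    for i, params in enumerate(TESTS, start=1):
        print(stacks_command(params, "./samples", "./popmap.tsv", f"./test_{i:02d}"))
```

Varying one parameter while fixing the others makes it easier to attribute a change in coverage or polymorphic loci to a single parameter, at the cost of not exploring interactions between them.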
After the coverage and polymorphic loci data are collected from Stacks, the nonparametric
Kruskal-Wallis test is applied. It reveals significant differences between the tests, both in
the polymorphic loci data and in coverage. Coverage depends mainly on the parameter m.
Although there are significant differences between the highest and lowest assigned values
of m, the data already show high coverage for all values of m. Finally, the polymorphic site
data reveal many more significant differences between groups, and these depend primarily
on M and n.
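For readers unfamiliar with the Kruskal-Wallis test, the following is a minimal sketch of how its H statistic is computed from the pooled ranks, including the usual correction for ties. The function name and the toy coverage values are invented for this example and are not data from the study. To obtain a p-value, H is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups; a statistics library such as SciPy can do this directly.

```python
from collections import Counter
from typing import Dict, List


def kruskal_wallis_h(groups: Dict[str, List[float]]) -> float:
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [v for values in groups.values() for v in values]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = position + (c + 1) / 2  # mean of position+1 .. position+c
        position += c

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    for values in groups.values():
        rank_sum = sum(rank_of[v] for v in values)
        total += rank_sum ** 2 / len(values)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(c ** 3 - c for c in counts.values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


if __name__ == "__main__":
    # Invented per-sample coverage values for three hypothetical tests.
    toy = {"test_a": [1.0, 2.0, 3.0],
           "test_b": [4.0, 5.0, 6.0],
           "test_c": [7.0, 8.0, 9.0]}
    print(kruskal_wallis_h(toy))  # approximately 7.2
```

Because the test works on ranks rather than raw values, it does not assume that coverage or locus counts are normally distributed, which is why it suits comparisons across parameter sets like these.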
The most important conclusion of this project is that, for each set of RADseq data, a prior
parameter validation step must be carried out, since optimal values can vary depending on
the species. For the data used in this project, there is no single correct combination
of parameters. The results therefore serve as a guide for future projects using data from
Atlantic Bonito or related species when choosing optimal values for a specific data set.