Input data
Statistics
Find K: the number of clusters
- Snapclust is a fast maximum-likelihood method that combines the advantages of model-based and geometric approaches (Beugin et al., 2018). The optimal number of clusters (k) is estimated using the Akaike, Kullback and Bayesian information criteria (AIC, KIC and BIC, respectively). Ten runs of the Expectation-Maximisation (EM) algorithm are advised to obtain a reliable estimate of K and of the probability of assignment (Q) of each individual to each of the k inferred clusters.
- The function find.clusters runs successive k-means clusterings with an increasing number of clusters (k), and the optimal number of clusters is the one with the lowest Bayesian information criterion (BIC) (Jombart et al., 2010). Between 10 and 20 runs are advised to obtain a reliable estimate of K.
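Both approaches can also be run directly in R. The lines below are a minimal sketch, assuming the adegenet package is installed; the example dataset (dapcIllus$a, simulated data shipped with adegenet), the number of retained PCA axes and the maximum k of 10 are illustrative choices, not the settings used by this application.

```r
library(adegenet)

# Simulated genotypes shipped with adegenet (illustrative choice of dataset)
data(dapcIllus)
x <- dapcIllus$a

# k-means route: BIC is computed for k = 1..10.
# choose.n.clust = FALSE skips the interactive prompt and lets `criterion`
# select k; n.start = 10 sets the number of k-means starts per value of k.
set.seed(1)
km <- find.clusters(x, n.pca = 100, max.n.clust = 10, stat = "BIC",
                    n.start = 10, choose.n.clust = FALSE,
                    criterion = "diffNgroup")
km$Kstat        # BIC for each k tried
table(km$grp)   # sizes of the inferred groups

# snapclust route: information criterion for k = 1..10 (IC may also be AIC or KIC)
bic <- snapclust.choose.k(max = 10, x, IC = BIC)
k_best <- unname(which.min(bic))   # position equals k because k starts at 1

# Fit the retained k and inspect the membership probabilities
fit <- snapclust(x, k = k_best)
head(fit$proba)
```

Plotting km$Kstat or bic against k is a useful check: a clear elbow or minimum is easier to trust than a value picked automatically from a flat curve.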
References:
Beugin, M.-P., Gayet, T., Pontier, D., Devillard, S. and Jombart, T. (2018) A fast likelihood solution to the genetic clustering problem. Methods in Ecology and Evolution, 9(4). doi: 10.1111/2041-210X.12968.
Jombart, T., Devillard, S. and Balloux, F. (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics, 11, 94. doi: 10.1186/1471-2156-11-94.
DAPC Analysis
Scatter Plot
The Scatter Plot page provides a visual assessment of between-population differentiation. The plot is generated by applying the R function scatter to a dapc object and takes one of two forms. If only one discriminant function (DA) is retained (always the case when there are only 2 groups), or if the x-axis and y-axis of the scatterplot are set to the same value, the output displays the densities of individuals along the given discriminant function. If more than one DA is retained and two different axes are selected, the output displays individuals as dots and groups as inertia ellipses, showing their relative positions along the two selected axes.
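The two forms of the plot can be reproduced in R with adegenet's scatter method for dapc objects. This is a minimal sketch, assuming adegenet is installed; the dataset and the values of n.pca and n.da are illustrative assumptions rather than the application's settings.

```r
library(adegenet)

# Simulated genotypes shipped with adegenet (illustrative choice of dataset)
data(dapcIllus)
x <- dapcIllus$a

# DAPC on the predefined populations, retaining three discriminant functions
dapc_fit <- dapc(x, pop = pop(x), n.pca = 50, n.da = 3)

# Two different axes: individuals as dots, groups as inertia ellipses
scatter(dapc_fit, xax = 1, yax = 2)

# Same axis on x and y: densities of individuals along discriminant function 1
scatter(dapc_fit, xax = 1, yax = 1)
```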
The number of axes retained in both the PCA and DA steps of DAPC affects the analysis and therefore the scatter plot. By default, the number of DA axes retained is set to the maximum, (K - 1), where K is the number of groups. The default number of PCA axes is more arbitrary. However, the 'Use suggested number of PCA components?' tickbox gives the user the option of using cross-validation to identify and select an optimal number of principal components (PCs), where one exists. For more on this, see the section on cross-validation.
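For readers working in R, the same two choices can be sketched as follows. This assumes adegenet is installed and uses its xvalDapc function for the cross-validation step; whether the tickbox calls xvalDapc with these settings is an assumption, and the dataset, n.pca.max and n.rep values are illustrative only.

```r
library(adegenet)

# Simulated genotypes shipped with adegenet (illustrative choice of dataset)
data(dapcIllus)
x <- dapcIllus$a
grp <- pop(x)

# Maximum number of discriminant functions: K - 1
n_da <- nlevels(grp) - 1

# Cross-validation over a grid of PCA axes (up to n.pca.max);
# xvalDapc expects a matrix of allele data without missing values
set.seed(1)
xval <- xvalDapc(tab(x, NA.method = "mean"), grp,
                 n.pca.max = 100, n.da = n_da,
                 n.rep = 10, xval.plot = FALSE)

# The returned DAPC is refitted with the number of PCs retained by cross-validation
xval$DAPC$n.pca
xval$DAPC$n.da
```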
A wide variety of graphical parameters for the DAPC scatterplot can be customised by the user. Parameters whose meaning is not self-evident are described further in the Glossary.