Summaries
When we are interested in assessing how well each locus behaves in
reporting microhaplotypes, it is time to turn to the “Filter Analysis”
panel choice. First, go ahead and choose Criteria Cutoff –>
Global Scope. This view gives you histograms of the read depths and
allele balance ratios (the filtering choices in effect appear as dashed
red lines), and, below that, a tabular perspective on the outcome of
different filtering choices on the total number of haplotypes recorded
in the data set. The current field selection choices are applied to
this view. So, if ALL loci are selected, then the histograms include
reads from all the haplotypes at all the loci, and the tabular summary
counts up the total number of haplotypes typed and the total number of
individuals typed at all loci. If you select just one locus, the results
reflect that one locus, and the histogram results are further broken
down by different haplotypes within that locus.
With that in mind, choose the first locus in the dropdown menu,
Plate_1_A11_Sat_GE820299_consensus, under “> field selection” and see
how that changes the broad summary. Note that across a broad range of
the two filtering options (minimum read depth and minimum allelic
balance) eight haplotypes in total are discovered.
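The stability of the haplotype count across filter settings can be checked outside the app with a small sweep. The following is only a sketch under stated assumptions: it assumes that allelic balance is a haplotype's read depth divided by the depth of the deepest haplotype in the same individual at that locus, and the function names and the `(individual, haplotype, depth)` record layout are hypothetical, not the app's own data format.

```python
def distinct_haplotypes(records, min_depth, min_balance):
    """Count distinct haplotypes at ONE locus that pass both filters.

    records: list of (individual, haplotype, depth) tuples (hypothetical layout).
    Allelic balance is ASSUMED to be depth / (top depth in that individual).
    """
    # Depth of the deepest haplotype observed in each individual.
    top = {}
    for ind, _hap, depth in records:
        top[ind] = max(top.get(ind, 0), depth)

    passing = set()
    for ind, hap, depth in records:
        if top[ind] == 0:
            continue  # no reads at all for this individual
        if depth >= min_depth and depth / top[ind] >= min_balance:
            passing.add(hap)
    return len(passing)


def sweep(records, depths, balances):
    """Tabulate the distinct-haplotype count over a grid of filter settings."""
    return {
        (d, b): distinct_haplotypes(records, d, b)
        for d in depths
        for b in balances
    }
```

If the count stays the same over a wide range of `(min_depth, min_balance)` pairs, as it does for the eight haplotypes noted above, the result is not sensitive to the exact cutoffs chosen.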
In order to search for individuals or loci that have more
high-read-depth alleles than one would expect, you can choose Criteria
Cutoff –> Quality Profiling. This view shows individuals or loci
that have more than n alleles that pass the filters. For example, if
you are dealing with diploids, then you would set n to 2 (in
the “+read criteria: Top n Alleles” option). Then any individuals or
loci that had more than 2 alleles that satisfied the minimum read depth
and allele balance criteria would be noted here. This is a good way to
look out for contaminated samples or loci that amplify paralogous
regions.
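The logic behind this view can be sketched in a few lines. This is an illustration, not the app's implementation: it assumes allelic balance is a haplotype's read depth divided by the depth of the deepest haplotype in the same individual at the same locus, and the function names and the `(individual, locus, haplotype, depth)` record layout are hypothetical.

```python
from collections import defaultdict


def count_passing_alleles(records, min_depth, min_balance):
    """Count haplotypes passing both filters per (individual, locus).

    records: iterable of (individual, locus, haplotype, depth) tuples
    (a hypothetical layout). Allelic balance is ASSUMED to be
    depth / (top depth for that individual at that locus).
    """
    records = list(records)

    # Depth of the deepest haplotype for each individual-locus pair.
    top = defaultdict(int)
    for ind, locus, _hap, depth in records:
        top[(ind, locus)] = max(top[(ind, locus)], depth)

    passing = defaultdict(int)
    for ind, locus, _hap, depth in records:
        best = top[(ind, locus)]
        balance = depth / best if best else 0.0
        if depth >= min_depth and balance >= min_balance:
            passing[(ind, locus)] += 1
    return dict(passing)


def flag_excess(records, min_depth, min_balance, n=2):
    """Return the (individual, locus) pairs with more than n passing alleles."""
    counts = count_passing_alleles(records, min_depth, min_balance)
    return sorted(key for key, c in counts.items() if c > n)
```

With `n=2` for diploids, an individual flagged at many loci points toward contamination, whereas a locus flagged in many individuals points toward a paralogous region.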
To see how the inferred haplotypes look in terms of haplotype
frequencies, and also how the genotypes look in terms of Hardy-Weinberg
equilibrium, you can choose Genotype Call –> Summaries. This
view consists of four figures. The first, in the upper left, simply shows
the frequencies (and the total read depth) of different haplotypes. The
plot in the upper right shows the relationship between the observed
frequency of different genotypes and the expected frequencies under
Hardy-Weinberg equilibrium. The individuals used in creating these
summaries depend on which “Group” is chosen. In this case we have “ALL”
chosen, and that is fine because the two groups are essentially
identical, genetically. However, if we were dealing with groups that
were genetically differentiated, we would not want to assess conformance
to Hardy-Weinberg proportions of a mixture of those different groups! In
such a case it is worthwhile to look at one group at a time.
The expected number of different genotypes is shown by the outlines
of circles and the observed number by the filled, colored circles. Green
circles are homozygotes and orange circles are heterozygotes, which should
be relatively self-explanatory. There is no scale, but if you click in the center
of any of the genotypes with observed (non-zero) counts, you will be
told (in the upper left of the panel) what the expected and observed
numbers were for that genotype. These plots are not meant to provide a
defensible test of departures from Hardy-Weinberg equilibrium, but do
allow the user to diagnose loci that are grotesquely far out of
Hardy-Weinberg equilibrium.
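For readers who want to see where the expected numbers come from, the comparison can be sketched as follows. This is a minimal illustration, assuming diploid genotypes and haplotype frequencies estimated from the same called genotypes; the function name and input layout are hypothetical and are not taken from the app.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hw_observed_expected(genotypes):
    """Return (observed, expected) genotype counts under Hardy-Weinberg.

    genotypes: list of (hap_a, hap_b) tuples, one per diploid individual
    (a hypothetical layout). Haplotype frequencies are estimated from
    these same genotypes.
    """
    n = len(genotypes)
    if n == 0:
        return Counter(), {}

    # Haplotype frequencies: each individual contributes two copies.
    hap_counts = Counter(h for g in genotypes for h in g)
    freqs = {h: c / (2 * n) for h, c in hap_counts.items()}

    # Observed counts, treating (a, b) and (b, a) as the same genotype.
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    # Expected counts: n * p^2 for homozygotes, n * 2pq for heterozygotes.
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        prob = freqs[a] * freqs[b]
        expected[(a, b)] = n * (prob if a == b else 2 * prob)
    return observed, expected
```

A genotype whose observed count is far from its expected count (for example, a heterozygote that is expected often but never observed) marks a locus worth a closer look, in the same spirit as the circles in the plot.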
Below the haplotype frequencies and HW conformance plots you will
find a simple bubble plot expressing haplotype frequencies in the
different groups. In the case of the example data there are only two
different groups and they have very similar allele frequencies. This
plot becomes more useful when one is comparing allele frequencies across
many different groups.
Finally, you may need to scroll down to see the final figure in this
display. It is a representation of the haplotype sequences, their
frequencies, and the positions of the variants within them along each
amplicon.
Allele Biplots
Choosing Genotype Call –> AR Refinement takes you to a
very informative screen. It is described above in the vignette. Read
through the section that describes it and then try playing with the
sliders to move the four different lines around the plot and see the
effect on whether genotypes get called or not.
Note that you can use the blue lasso tool (upper right corner of the
scatter plot) to select a lot of points, whose values will then be
revealed in a table below. The red lasso can be used to de-select points
and you can check the box “keeps pt selection between loci” to maintain
focus on those points as you move from locus to locus. This can be very
useful for identifying individuals that show aberrant read depths across
multiple loci.