|
|
Gene Loadings. Case/Control direction is not aligned
with first component.
Legend: Eigen Analysis. Gene loadings for the first
two components. Circa 300 genes (green crosses) are displayed.
Genes having high loadings are noted. The case/control line
shows the direction of the case/control distinction.
Statistical insight: Given the size of the dataset being
analyzed and the properties of LBFs, one would expect the
case/control direction to be parallel to the first component.
The observed deviation indicates that the case/control
distinction is not responsible for most of the dataset's genetic
variability, which is captured in the first component.
Clinical insight: The case/control line is not aligned
with most of the genetic variability of this large dataset.
This suggests that the within-group genetic variability is
greater than the between-group genetic variability.
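The alignment check described above can be illustrated with a minimal sketch. It assumes the data are held in a subjects-by-genes matrix `X` (for example gene-level LBFs) with a boolean `is_case` vector. The function names and the simulated toy data are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def case_control_alignment(X, is_case):
    """|cosine| between the case/control direction and component 1.

    X is a (subjects x genes) matrix; is_case is a boolean vector.
    """
    X = np.asarray(X, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    Xc = X - X.mean(axis=0)                      # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Vt[0]                                  # unit-length loadings of component 1
    # Case/control direction: difference of the two group means.
    direction = X[is_case].mean(axis=0) - X[~is_case].mean(axis=0)
    return abs(direction @ pc1) / np.linalg.norm(direction)

def within_between_ss(X, is_case):
    """Within-group and between-group sums of squares over all genes."""
    X = np.asarray(X, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    grand = X.mean(axis=0)
    within = between = 0.0
    for mask in (is_case, ~is_case):
        group = X[mask]
        centre = group.mean(axis=0)
        within += np.sum((group - centre) ** 2)
        between += len(group) * np.sum((centre - grand) ** 2)
    return within, between

# Toy data (simulated, for illustration only): 100 cases, 100 controls, 5 genes.
rng = np.random.default_rng(0)
is_case = np.arange(200) < 100
X = rng.normal(size=(200, 5))
X[is_case, 0] += 2.0      # modest case/control shift on gene 0
X[100:150, 1] += 10.0     # one control subgroup
X[150:200, 1] -= 10.0     # another control subgroup

alignment = case_control_alignment(X, is_case)
within, between = within_between_ss(X, is_case)
print(f"alignment with component 1: {alignment:.2f}")
print(f"within SS: {within:.0f}, between SS: {between:.0f}")
```

An alignment near 1 would mean the case/control direction is close to parallel to the first component. A value near 0, together with a within-group sum of squares larger than the between-group one, corresponds to the situation described in this caption.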
|
|
|
Subject scores. Obvious genetic heterogeneity in controls.
Legend: Eigen Analysis. Subject scores for the first
two components. 1200 cases in pink and 1200 controls in blue.
Statistical insight: The figure shows individuals plotted
on the first two eigenvectors of the gene-level LBFs for Disease
B. Substantial inter-individual heterogeneity is clear. The
controls appear to come from 3 extended diplotype groupings.
Clinical insight: The patterns observed in this figure
are driven solely by genetic factors and highlight substantial
inter-individual genetic heterogeneity within controls, as
suspected from the figure above.
|
|
|
Biplot shows that the controls' genetic heterogeneity
confounds the case/control genetic distinction.
Legend: Eigen analysis biplot. Cases are in blue, controls
in pink, and gene loadings in green. Genes having a high loading
are noted.
Statistical insight: The case/control dummy variable
contributes to both components 1 and 2. Since these components
are orthogonal, one should suspect a confounding factor.
Clinical insight: Genes with a high loading on component 1
pull cases apart from controls (expected) but also pull controls
apart from other controls (in the horizontal direction).
Similarly, genes with a high loading on component 2 pull cases
apart from controls (not expected) and also pull controls apart
from other controls (in the vertical direction). The
within-control genetic heterogeneity confounds the main
case/control genetic heterogeneity, and no firm conclusion can
be drawn. Another analysis should be performed, with the 'right'
and homogeneous group of controls.
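One way to quantify how strongly the case/control dummy variable relates to each component is to correlate it with the subject scores, treating it as a supplementary variable. The sketch below assumes a subjects-by-genes matrix `X` and a boolean `is_case` vector. The function name and the simulated data are hypothetical, and this is not necessarily how the dummy variable was placed on the original biplot.

```python
import numpy as np

def dummy_component_correlations(X, is_case, n_components=2):
    """Pearson correlation of the case/control dummy with each component's scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(is_case, dtype=float)         # dummy variable: 1 = case, 0 = control
    Xc = X - X.mean(axis=0)                      # center each gene
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # subject scores (already centered)
    yc = y - y.mean()
    return np.array([
        (yc @ scores[:, k]) / (np.linalg.norm(yc) * np.linalg.norm(scores[:, k]))
        for k in range(n_components)
    ])

# Toy data (simulated, for illustration only): half of the controls form a
# subgroup that is shifted on gene 1, while cases are shifted on gene 0.
rng = np.random.default_rng(0)
is_case = np.arange(200) < 100
X = rng.normal(size=(200, 5))
X[is_case, 0] += 3.0
X[150:200, 1] += 6.0

corr = dummy_component_correlations(X, is_case)
print("correlation with component 1:", round(float(corr[0]), 2))
print("correlation with component 2:", round(float(corr[1]), 2))
```

The sign of each correlation is arbitrary because the sign of a component is arbitrary, so only the magnitudes matter. Sizeable correlations with both components would mirror the pattern described in this caption.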
|
|
|
Subject scores.
Legend: Control origin overlaid on previous figure.
Geographic origin is color-coded. Cases are in grey. There is
a clear match between the controls' geographic origins and the
genetic heterogeneity patterns observed above.
Statistical insight: Follow-up showed that the controls
were a mixture of four geographic origins. The choice of the
reference population of controls in a study markedly affected
the genes that were indicated for follow-up (cf. target
discovery). The first eigenvector, in this example, does not
represent the true case/control distinction, since the
within-control genetic variability overwhelms it.
Clinical insight: This example highlights the importance
of the choice of the reference population of controls.
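The effect of the reference population can be illustrated by repeating the eigen analysis with the controls restricted to one origin at a time. The sketch below assumes a subjects-by-genes matrix `X`, a boolean `is_case` vector and an `origin` label per subject. All names and the simulated data are hypothetical and do not reproduce the original follow-up.

```python
import numpy as np

def pc1_alignment(X, is_case):
    """|cosine| between the case/control mean difference and component 1."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    direction = X[is_case].mean(axis=0) - X[~is_case].mean(axis=0)
    return abs(direction @ Vt[0]) / np.linalg.norm(direction)

def alignment_by_control_origin(X, is_case, origin):
    """Recompute the alignment using all cases plus controls of a single origin."""
    X = np.asarray(X, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    origin = np.asarray(origin)
    result = {}
    for o in np.unique(origin[~is_case]):
        keep = is_case | (origin == o)           # all cases + one control origin
        result[str(o)] = pc1_alignment(X[keep], is_case[keep])
    return result

# Toy data (simulated, for illustration only): 200 cases and four control
# origins of 50 subjects each; origins B, C and D are shifted on genes 1 and 2.
rng = np.random.default_rng(0)
is_case = np.arange(400) < 200
origin = np.array(["case"] * 200 + ["A"] * 50 + ["B"] * 50 + ["C"] * 50 + ["D"] * 50)
X = rng.normal(size=(400, 5))
X[is_case, 0] += 4.0
X[origin == "B", 1] += 10.0
X[origin == "C", 1] -= 10.0
X[origin == "D", 2] += 10.0

print("pooled controls:", round(float(pc1_alignment(X, is_case)), 2))
for o, value in alignment_by_control_origin(X, is_case, origin).items():
    print(f"controls from {o} only:", round(float(value), 2))
```

In this toy setting the cases were simulated to resemble origin A apart from the shift on gene 0, so origin A plays the role of a matched control group. Which origin is the appropriate reference in a real study is a design decision that this sketch cannot make.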
|
|