Biplot shows 2 outlier subjects (1 case and 1 control,
bottom right).
Legend: Eigen analysis biplot. Cases are in blue, controls
in red, and 650K SNPs in green.
Statistical and Clinical insight: The 2 outliers explain
most of the variability in the dataset. Statistical follow-up
showed that their genotypes were almost identical (apart from
a few genotyping errors and missing data). Clinical follow-up
showed that both records belonged to the same subject, who was
enrolled in the study as a control, later developed the
drug-induced disorder, and was re-enrolled as a case. The record
in which the subject was defined as a control was removed, and
the dataset was re-analysed (see below).
Quality Control insight: This example highlights an unusual
error, a subject enrolled twice, which classical QC methods and
univariate analyses did not detect.
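The statistical follow-up described above (two subjects with almost identical genotypes) can be illustrated with a pairwise concordance check. This is a minimal sketch, not the procedure actually used in the study: the 0/1/2 genotype coding, the function names and the 0.98 threshold are all assumptions chosen for illustration.

```python
import numpy as np

def pairwise_concordance(genotypes):
    """Fraction of identical genotype calls for each pair of subjects.

    genotypes: (n_subjects, n_snps) array coded 0/1/2 (assumed coding),
    with np.nan for missing calls. Only SNPs called in both subjects
    are compared; pairs with no shared calls keep a concordance of 0.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    conc = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(g[i]) & ~np.isnan(g[j])
            if both.any():
                conc[i, j] = conc[j, i] = np.mean(g[i, both] == g[j, both])
    return conc

def flag_duplicates(genotypes, threshold=0.98):
    """Return (i, j) index pairs whose concordance exceeds the threshold."""
    conc = pairwise_concordance(genotypes)
    rows, cols = np.where(np.triu(conc > threshold, k=1))
    return list(zip(rows.tolist(), cols.tolist()))

# Toy data: 6 unrelated subjects, then subject 5 is made a near-copy
# of subject 0 with a few genotyping errors and missing calls.
rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=(6, 2000)).astype(float)
g[5] = g[0]
g[5, :10] = (g[5, :10] + 1) % 3   # 10 simulated genotyping errors
g[5, 10:30] = np.nan              # 20 simulated missing calls
pairs = flag_duplicates(g)
print(pairs)                      # [(0, 5)]
```

The double loop is quadratic in the number of subjects, which is acceptable for a case/control cohort but would need vectorising for much larger sample sizes.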
|
|
|
Biplot - Same dataset after removal of the duplicate 'control' record.
Legend: Eigen analysis biplot. Cases are in blue, controls
in red, and 650K SNPs in green.
Statistical insight: This example shows the power of
the KernelPCA method, which allows a very large-scale eigen
analysis (650K SNPs) to be run efficiently on a standard
computer.
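One common way to make an eigen analysis of this size tractable, when there are far fewer subjects than SNPs, is to decompose the small subject-by-subject Gram matrix (a linear kernel) rather than the SNP-by-SNP matrix. The sketch below illustrates that general idea only; whether the KernelPCA software works exactly this way is an assumption, and the function name `gram_pca` is hypothetical.

```python
import numpy as np

def gram_pca(X, n_components=2):
    """PCA of an (n_subjects, n_snps) matrix via its n x n Gram matrix.

    Returns subject scores, unit-norm SNP axes and singular values.
    Assumes the retained eigenvalues are strictly positive.
    """
    Xc = X - X.mean(axis=0)                  # centre each SNP
    K = Xc @ Xc.T                            # n x n linear kernel
    evals, evecs = np.linalg.eigh(K)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    evals, U = evals[order], evecs[:, order]
    sing = np.sqrt(np.clip(evals, 0, None))  # singular values of Xc
    scores = U * sing                        # subject coordinates
    loadings = Xc.T @ U / sing               # SNP axes, shape (n_snps, k)
    return scores, loadings, sing

# Toy usage: 20 subjects, 500 SNP-like variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 500))
scores, loadings, sing = gram_pca(X_demo, n_components=2)
print(scores.shape, loadings.shape)          # (20, 2) (500, 2)
```

The eigen decomposition costs on the order of n^3 operations for n subjects, independent of the number of SNPs; the SNPs enter only through two matrix products.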
Clinical insight: There is now a clear separation of
cases and controls. The SNPs outside the central green cloud
explain most of the genetic variability aligned with the
case/control distinction. These SNPs can be followed up, for
example, to understand the disease in relation to the drug
these patients were exposed to.
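The SNPs lying outside the central cloud can be shortlisted numerically rather than by eye. The sketch below ranks SNPs by their distance from the biplot origin on the two displayed axes and keeps those beyond a chosen quantile. The cutoff, the function name and the toy coordinates are assumptions made for illustration; the source does not define how "outside the cloud" was decided.

```python
import numpy as np

def snps_outside_cloud(loadings, quantile=0.999):
    """Indices of SNPs far from the biplot origin, farthest first.

    loadings: (n_snps, 2) array of SNP coordinates on the two biplot axes.
    quantile: distance quantile used as the cutoff (an arbitrary choice).
    """
    dist = np.linalg.norm(loadings, axis=1)
    cutoff = np.quantile(dist, quantile)
    idx = np.where(dist > cutoff)[0]
    return idx[np.argsort(dist[idx])[::-1]]

# Toy usage: a tight central cloud plus three planted outlying SNPs.
rng = np.random.default_rng(0)
coords = rng.normal(scale=0.01, size=(10000, 2))
coords[[5, 42, 77]] = [[0.5, 0.1], [-0.4, 0.3], [0.2, -0.6]]
selected = snps_outside_cloud(coords)
print(selected[:3])   # [77  5 42]
```

A fixed quantile always returns some SNPs, even when no real signal is present, so the shortlist is a starting point for follow-up rather than a test of association.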
|
|
|
Biplot - Same dataset as above. COVARIATION analysis
for predictive pattern discovery.
Legend: Eigen analysis biplot. Cases are in blue, controls
in red, and 650K SNPs in green.
Statistical insight: The covariance matrix was used
instead of the correlation matrix (hence 'covariation' analysis),
so the SNPs are centred but not rescaled to unit variance. As a
result, the SNPs with the highest loadings on axis 1 are both in
complete correlation and the most variable. They could therefore
be used as predictive markers. A multifactorial predictive measure
can be built from the first eigenvector. Our aim is to develop
this aspect in the next version of this method and software.
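To make the idea of a first-eigenvector measure concrete, the sketch below computes the first axis of an unscaled (covariance-based) analysis and projects subjects onto it to obtain a single score. It is only an illustration under assumed names (`first_axis`, `first_axis_score`) and simulated data, not the predictive measure planned for the software.

```python
import numpy as np

def first_axis(X, scale=False):
    """First principal axis of an (n_subjects, n_snps) matrix.

    scale=False: covariance-based analysis (centred only).
    scale=True:  correlation-based analysis (centred and divided by the
                 standard deviation; constant SNPs must be removed first).
    """
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]                      # unit-norm vector, one weight per SNP

def first_axis_score(X_train, X_new):
    """Project new subjects onto the first covariance-based axis."""
    mean = X_train.mean(axis=0)
    v1 = first_axis(X_train)
    return (X_new - mean) @ v1

# Simulated data: 15 controls, 15 cases, 200 SNP-like variables,
# of which the first 3 are highly variable and differ between groups.
rng = np.random.default_rng(1)
status = np.repeat([0, 1], 15)
X_sim = rng.normal(scale=0.1, size=(30, 200))
X_sim[:, :3] += 2.0 * status[:, None]

v1 = first_axis(X_sim)
top = np.argsort(np.abs(v1))[-3:]     # the three largest weights
score = first_axis_score(X_sim, X_sim)
print(sorted(top.tolist()))           # [0, 1, 2]
```

The sign of an eigenvector is arbitrary, so which group receives the positive scores has to be fixed by convention, for instance by checking the mean score of the known cases.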
Clinical insight: The SNPs outside the central green
cloud explain most of the genetic variability aligned with the
case/control distinction and carry the strongest signal.
These SNPs can be followed up as potential predictive factors
for this drug-induced disorder. They differ from the SNPs of
physiological interest found above.