Taxonomy 3 - A multivariate genetic analysis
Dataset 4    
  • A pharmacogenetic / adverse event study with
    • 20 cases (affected with a disorder induced by drug A) enrolled prospectively
    • ~200 controls (exposed to drug A) enrolled retrospectively.
  • 650K SNPs were genotyped in these subjects.

Biplot shows 2 outlier subjects (1 case and 1 control, bottom right).

Legend: Eigen analysis biplot. Cases are in blue, controls in red, and 650K SNPs in green.

Statistical and Clinical insight: The 2 outliers account for most of the variability in the dataset. Statistical follow-up showed that their genotypes were almost identical (allowing for a few genotyping errors and missing data). Clinical follow-up showed that both records belonged to the same subject, who was first enrolled in the study as a control, then developed the drug-induced disorder and was re-enrolled as a case. The subject's control record was removed and the dataset re-analysed (see below).
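The statistical follow-up described above can be sketched as a pairwise genotype concordance check. This is a minimal illustration on simulated data, not the study's actual procedure: the 0/1/2 genotype coding, the -1 missing-data code, the 0.99 threshold and the function names are all assumptions made for the example.

```python
import numpy as np

def pairwise_concordance(genotypes: np.ndarray) -> np.ndarray:
    """Fraction of identical genotype calls for each pair of subjects.

    genotypes: (n_subjects, n_snps) array coded 0/1/2, with -1 for missing
    (an assumed coding). Only SNPs called in both subjects are compared.
    """
    n = genotypes.shape[0]
    conc = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            both = (genotypes[i] >= 0) & (genotypes[j] >= 0)
            if both.any():
                value = np.mean(genotypes[i, both] == genotypes[j, both])
            else:
                value = np.nan  # no SNP called in both subjects
            conc[i, j] = conc[j, i] = value
    return conc

def find_duplicates(genotypes: np.ndarray, threshold: float = 0.99):
    """Return subject pairs whose concordance reaches the (assumed) threshold."""
    conc = pairwise_concordance(genotypes)
    n = conc.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if conc[i, j] >= threshold]

# Simulated data: 6 unrelated subjects plus a re-enrolled copy of subject 2
# with a few genotyping errors and missing calls.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(6, 2000))
dup = geno[2].copy()
flip = rng.choice(2000, size=10, replace=False)
dup[flip] = (dup[flip] + 1) % 3                      # genotyping errors
dup[rng.choice(2000, size=20, replace=False)] = -1   # missing data
geno = np.vstack([geno, dup])

duplicate_pairs = find_duplicates(geno)  # [(2, 6)]
```

With about 220 subjects the double loop is cheap; the cost is dominated by the number of SNPs compared per pair.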

Quality Control insight: This example highlights an unusual error, the same subject enrolled twice, which classical QC methods and univariate analyses did not detect.

Biplot - Same dataset, after removing the duplicate 'control' record.

Legend: Eigen analysis biplot. Cases are in blue, controls in red, and 650K SNPs in green.

Statistical insight: This example shows the power of the KernelPCA method, which allows a very large-scale Eigen analysis to be run efficiently and quickly on a conventional computer.
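The page does not describe how KernelPCA works internally. One common reason such an analysis is tractable is that, with a few hundred subjects and 650K SNPs, the eigen decomposition can be done on the small subjects-by-subjects Gram matrix instead of the SNPs-by-SNPs matrix. The sketch below illustrates that idea with a linear kernel on simulated data; it is an assumption about the general approach, not a reproduction of the authors' software, and `dual_pca` is a hypothetical name.

```python
import numpy as np

def dual_pca(X: np.ndarray, n_components: int = 2):
    """PCA of an (n_subjects, n_snps) matrix via the n x n Gram matrix.

    Assumes the leading eigenvalues are strictly positive.
    """
    Xc = X - X.mean(axis=0)                   # centre each SNP
    gram = Xc @ Xc.T                          # n x n, small when n << n_snps
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, U = eigvals[order], eigvecs[:, order]
    sing = np.sqrt(np.clip(eigvals, 0, None))
    subject_scores = U * sing                 # subject coordinates for a biplot
    snp_loadings = (Xc.T @ U) / sing          # unit-norm SNP directions
    return subject_scores, snp_loadings, eigvals

# Simulated genotypes (0/1/2), far smaller than the real dataset.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(50, 5000)).astype(float)
scores, loadings, eigvals = dual_pca(X)
```

The SNP loadings are recovered by projecting the centred data onto the subject eigenvectors, so the large SNPs-by-SNPs matrix is never formed.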

Clinical insight: There is now a clear separation of cases and controls. The SNPs outside the central green cloud explain most of the genetic variability aligned with the case/control distinction. These SNPs can be followed up, for example, to understand the disease in relation to the drug these patients were exposed to.

Biplot - Same dataset as above. COVARIATION analysis for predictive pattern discovery.

Legend: Eigen analysis biplot. Cases are in blue, controls in red, and 650K SNPs in green.

Statistical insight: The covariation (covariance) matrix was used instead of the correlation matrix. Since the covariation factors are not rescaled, the SNPs with the highest loadings on factor 1 are completely correlated and have the highest variability. They could therefore be used as predictive markers. A multifactorial predictive measure could be built from the first eigenvector. We aim to develop this aspect in the next version of the method and software.
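As a rough illustration of this idea, the sketch below takes "covariation matrix" to mean the covariance matrix, extracts the first eigenvector from centred but unscaled genotypes, and uses the projection of each subject onto it as a score. The data are simulated with five artificial signal SNPs; the function name, sample sizes and allele frequencies are assumptions for the example only and say nothing about the real study.

```python
import numpy as np

def first_component(X: np.ndarray, use_correlation: bool = False):
    """First eigenvector of the covariance (or correlation) matrix and subject scores."""
    Xc = X - X.mean(axis=0)
    if use_correlation:
        sd = Xc.std(axis=0)
        sd[sd == 0] = 1.0          # monomorphic SNPs stay at zero
        Xc = Xc / sd               # rescaling removes differences in variability
    # Right singular vectors of the centred data are the eigenvectors
    # of its covariance (or correlation) matrix.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v1 = vt[0]
    return v1, Xc @ v1             # SNP loadings, one score per subject

# Simulated data: 195 rare, uninformative SNPs and 5 signal SNPs
# that differ completely between cases and controls.
rng = np.random.default_rng(2)
n_cases, n_controls, n_snps = 20, 40, 200
signal_idx = np.arange(5)
X = rng.binomial(2, 0.05, size=(n_cases + n_controls, n_snps)).astype(float)
X[:n_cases, signal_idx] = 2.0
X[n_cases:, signal_idx] = 0.0

v1, scores = first_component(X)                   # covariance-based analysis
top_snps = np.argsort(np.abs(v1))[::-1][:5]       # SNPs with the largest loadings
```

In this toy setting the five signal SNPs carry the largest loadings, and the scores separate the simulated cases from the controls. Passing `use_correlation=True` gives the rescaled variant for comparison.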

Clinical insight: The SNPs outside the central green cloud explain most of the genetic variability aligned with the case/control distinction and carry the strongest signal. These SNPs can be followed up as potential predictive factors for this drug-induced disorder. They differ from the SNPs of physiological interest found above.

 


Contact: taxonomy@delrieu.org