Taxonomy 3 - A multivariate genetic analysis
Dataset 1
  • 500 patients affected with a common multifactorial, complex, neuropsychiatric disorder (disease A)
  • 500 control subjects (population matched)
  • circa 5000 single nucleotide polymorphisms (SNPs) across 1200 genes were genotyped.

    This dataset and these examples are discussed in more detail in:
    Visualizing gene determinants of disease in drug discovery. Delrieu O and Bowman C.
    Pharmacogenomics. 2006 Apr;7(3):311-29.
    PMID: 16610942 / Pharmacogenomics

Gene loadings. Genes of interest for disease A and population heterogeneity

Legend: Eigen Analysis. Gene loadings for the first two components.
1500 genes (green crosses) are displayed.

Statistical insight: The figure shows the Eigen decomposition of gene-level LBFs for disease A; the first two components are displayed. Each eigenvector is a particular weighted set of correlated genes common to the subjects under study, and is composed of gene loadings. The first eigenvector carries the useful information on case-control features. Each eigenvector was decomposed into signal (boxes) and noise (green cloud) by modeling the distribution of its loadings (WinBUGS was used).
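
As a rough illustration of how gene loadings arise from an Eigen decomposition, the following minimal sketch uses NumPy on synthetic data. The matrix size, the injected signal and all variable names are assumptions made for the example; it does not reproduce the dataset or the WinBUGS signal/noise modeling described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 subjects x 50 genes of gene-level LBF values.
n_subjects, n_genes = 1000, 50
is_case = np.arange(n_subjects) < 500
lbf = rng.normal(size=(n_subjects, n_genes))
# Inject a case/control signal into the first 5 genes (stand-ins for RedBox genes).
lbf[is_case, :5] += 1.5

# Centre each gene, then eigendecompose the gene-by-gene covariance matrix.
centred = lbf - lbf.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Reorder so that component 1 has the largest eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Gene loadings for the first two components (one row per gene).
loadings = eigvecs[:, :2]

# Genes with the largest absolute loading on component 1.
top_genes = np.argsort(np.abs(loadings[:, 0]))[::-1][:5]
print(sorted(top_genes.tolist()))
```

In this toy setting the genes carrying the injected signal stand out from the cloud of near-zero loadings, which is the kind of separation the boxes in the figure represent.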

Clinical insight: RedBox genes are 18 genes of physiological interest forming one ontology, which one may want to follow up as therapeutic targets and for disease understanding. The further to the right a gene lies, the stronger its case-control signal. BlueBox genes are genes associated with population structure within, or shared by, cases and controls.

Subject scores - Obvious heterogeneity in controls

Legend: Eigen Analysis. Subjects' scores for the first two components.
500 cases in pink and 500 controls in blue.

Statistical insight: The X-axis represents score 1, the first Eigen analysis component, which is aligned with the main case/control distinction. The Y-axis represents score 2, which explains most of the remaining genetic heterogeneity. The patterns observed in this figure are driven solely by the subjects' genetic information. Horizontal differentiation between cases and controls is driven by RedBox genes. Vertical differentiation within controls is driven by BlueBox genes.
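
Subject scores are the projections of each subject's gene-level values onto the components. The sketch below, on synthetic data, shows one way to compute them with a singular value decomposition; the group sizes, the simulated control sub-population and the variable names are assumptions, and in this toy example the two components may each mix the two injected signals to some degree.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cases, n_controls, n_genes = 500, 500, 50
n_subjects = n_cases + n_controls
is_case = np.arange(n_subjects) < n_cases

# Hypothetical control sub-population (stand-in for population structure).
in_subpop = np.zeros(n_subjects, dtype=bool)
in_subpop[n_cases:n_cases + 200] = True

lbf = rng.normal(size=(n_subjects, n_genes))
lbf[is_case, 0:5] += 1.5      # RedBox-like genes: case/control signal
lbf[in_subpop, 5:10] += 1.5   # BlueBox-like genes: structure within controls

# SVD of the centred matrix: rows of Vt are the eigenvectors of the covariance.
centred = lbf - lbf.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

# Subject scores on the first two components (equal to centred @ Vt[:2].T).
scores = U[:, :2] * s[:2]

# Horizontal separation: cases vs controls on score 1.
sep1 = abs(scores[is_case, 0].mean() - scores[~is_case, 0].mean())
# Vertical separation: the two control groups on score 2.
other_controls = ~is_case & ~in_subpop
sep2 = abs(scores[in_subpop, 1].mean() - scores[other_controls, 1].mean())
print(round(sep1, 2), round(sep2, 2))
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by case/control status, would give a picture of the same kind as the figure.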

Clinical insight: This figure confirms that the RedBox genes 'pull apart' the cases from the controls. Also, an obvious pattern of genetic heterogeneity is observed within the control group. This pattern is driven by the BlueBox genes.

Subject scores - Overlay of clinical variable on controls.

Legend: Method of recruitment was overlaid on the previous figure. Reddish: controls who had to make contact with the site (active recruitment), either in response to an advertisement (CNTL_AD) or via a relative or acquaintance (CNTL_RA). Bluish: controls (CNTL_S) who were directly recruited by the site (passive recruitment). Cases, in grey (OTHERS), have 'null' heat.

Statistical insight: Several methods can be used to match observed heterogeneity with a clinical variable. One option is to add the clinical variable, or its LBF, to the Eigen analysis (see dataset 3 below). Here, a recursive partitioning method on score 2 was used (HelixTree). Method of ascertainment came up as strongly significant after correction for multiple testing. In the overlaid heatmap, the color represents the local differential density of 'hot' and 'cold' subjects.
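
To give a feel for what a recursive partitioning step does, here is a minimal one-level sketch in Python. It is not HelixTree's algorithm: the split criterion (reduction in sum of squares), the permutation test and the synthetic data are all assumptions chosen for illustration. Only the category labels CNTL_AD, CNTL_RA and CNTL_S come from the legend above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical controls: recruitment method and a synthetic "score 2".
methods = np.array(["CNTL_AD", "CNTL_RA", "CNTL_S"])
recruit = rng.choice(methods, size=500)
active = np.isin(recruit, ["CNTL_AD", "CNTL_RA"])
score2 = rng.normal(size=500) + np.where(active, 1.0, -1.0)

def sse(values):
    """Sum of squared deviations from the mean."""
    return float(np.sum((values - values.mean()) ** 2))

def best_binary_split(response, labels):
    """Return (left_categories, SSE reduction) for the best two-way grouping."""
    cats = sorted(set(labels.tolist()))
    total = sse(response)
    best_left, best_gain = None, -np.inf
    # Keep the first category on the left so each partition is tried once.
    first, rest = cats[0], cats[1:]
    for k in range(len(rest)):
        for extra in itertools.combinations(rest, k):
            left = (first,) + extra
            mask = np.isin(labels, left)
            gain = total - sse(response[mask]) - sse(response[~mask])
            if gain > best_gain:
                best_left, best_gain = set(left), gain
    return best_left, best_gain

left, gain = best_binary_split(score2, recruit)

# Permutation test on the best gain; shuffling the labels and re-running the
# whole split search accounts for having tried several candidate splits.
n_perm = 200
perm_gains = np.array([
    best_binary_split(score2, rng.permutation(recruit))[1]
    for _ in range(n_perm)
])
p_value = (1 + np.sum(perm_gains >= gain)) / (n_perm + 1)
print(left, round(gain, 1), p_value)
```

A full recursive partitioning method would repeat this search within each resulting group and across many candidate clinical variables, with a correction for the number of variables tested.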

Clinical insight: Follow-up determined that the genetic heterogeneity observed in controls was associated with the method by which each control was ascertained. Because the pattern is derived from genetic data alone, it highlights a potential recruitment bias that is itself driven by genetic factors. In other words, willingness to take part or not in a genetic study is driven by ... genetic factors!
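
The heatmap overlay used in these figures represents a local differential density of 'hot' and 'cold' subjects. The source does not say how that density was computed; the sketch below assumes a simple approach, the difference of two normalized 2-D histograms on a shared grid, applied to synthetic points.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-D scores for two groups of subjects.
hot = rng.normal(loc=[0.0, 1.0], scale=0.7, size=(250, 2))
cold = rng.normal(loc=[0.0, -1.0], scale=0.7, size=(250, 2))

# Shared grid so the two densities are comparable cell by cell.
edges = np.linspace(-4.0, 4.0, 21)

def density(points):
    """Fraction of the group's subjects falling in each grid cell."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[edges, edges])
    return hist / len(points)

# Positive cells are locally 'hot', negative cells locally 'cold',
# and cells with no subjects from either group stay at zero ('null' heat).
heat = density(hot) - density(cold)
print(heat.shape, round(float(heat[:, 10:].sum()), 2))
```

A smoothed estimate (for example a kernel density) would give a more continuous overlay; the histogram version is used here only because it is the shortest to write down.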

Subject LBF signal allows individualization of a case sub-group. Case/control heatmap overlay.

Legend: The figure shows the result of aggregating gene-level LBFs for the gene pathway/ontology of interest in disease A (the 18 RedBox genes defined above). Mean(LBF) and var(LBF) are plotted for each subject. A heatmap is overlaid, representing the case (reddish) versus control (bluish) differentiation. A sub-group of cases can clearly be individualized.

Statistical insight: Aggregating gene-level LBFs for these 18 RedBox genes is similar to describing the LBF signal projected on the first component. This method can be applied to any group of genes that have a high loading on component 1 or that are part of a previously known ontology. Also, the additive properties of LBFs allow any known dosing model to be used. Here, the figure shows that cases in the individualized sub-group have relatively low variability compared to the other subjects. This indicates consensus (in the sense of genetic co-occurrence) among those cases across the 18 genes.
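
The per-subject aggregation described above amounts to taking the mean and variance of each subject's LBF values across the selected genes. A minimal sketch on synthetic data follows; the simulated sub-group, the thresholds and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical gene-level LBFs: 1000 subjects x 18 genes of interest.
n_subjects, n_genes = 1000, 18
lbf = rng.normal(size=(n_subjects, n_genes))
# Simulated sub-group: the first 100 subjects share a consistent positive
# signal across all 18 genes, hence a high mean and a low variance.
lbf[:100] = rng.normal(loc=2.0, scale=0.3, size=(100, n_genes))

# One point per subject: mean(LBF) and var(LBF) across the 18 genes.
mean_lbf = lbf.mean(axis=1)
var_lbf = lbf.var(axis=1, ddof=1)

# Arbitrary illustrative thresholds picking out the high-mean, low-variance corner.
subgroup = (mean_lbf > 1.0) & (var_lbf < 0.5)
print(int(subgroup.sum()))
```

Plotting `mean_lbf` against `var_lbf` places the simulated sub-group in the high-mean, low-variance corner, analogous to the lower right of the figure.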

Clinical insight: On the lower right-hand side is a set of cases whose genetic complement is distinct from that of the other cases and of all controls. Such cases have a particular, homogeneous complement of gene targets for efficient drug discovery/development.

Subject LBF signal projected on first component. Clinical heatmap overlay.

Legend: The heatmap of the previous figure is replaced with another heatmap showing that a clinical variable (EPQ-E) matches the sub-group of cases with a distinct genetic component. Reddish areas indicate cases with higher-than-average EPQ-E (controls = OTHERS have 'null' heat).

Statistical insight: The same recursive partitioning method described above was used.

Clinical insight: The cases individualized in the previous figure also have unique clinical characteristics. These characteristics constitute a sub-phenotype that could be used as surrogate inclusion or exclusion criteria to enrich subsequent clinical trials of new drugs against disease A with populations having this genetic pattern.

 


Contact: taxonomy@delrieu.org