Taxonomy 3 - A multivariate genetic analysis

Input datasets

The typical dataset analyzed by this method is composed of subjects (observations) and variables of several data types. Subjects are divided into 2 groups (binary outcome), called 'cases' and 'controls'. In a genetic setting, the variables are either discrete (genetic markers such as SNPs, HLA markers, CA repeats, etc.) or continuous (gene expression data, clinical sub-phenotypes).

This method can use prior knowledge to aggregate data. For example, the signal from several SNPs belonging to the same gene can be aggregated and analyzed together, as a single entity with known properties. One level up, the signal from several genes belonging to an ontology (a biologically meaningful group of genes) can likewise be aggregated and analyzed together. In each case, the user supplies this prior knowledge as a 'map': a SNP-to-gene map or a SNP-to-ontology map. These maps represent a linear interaction model.

Example of a genetic dataset.


Genotypes can be represented as a matrix:

SubID   CaseCont   SNP1   SNP2   SNP3   SNP4   ...
1       CASE       AA     CG     AC     AC     ...
2       CASE       AT     CG     AC     AC     ...
3       CASE       AA     GG     AC     AA     ...
4       CASE       AT     GG     AC     AA     ...
5       CASE       AT     .      AC     AA     ...
6       CONT       TT     CC     AA     CC     ...
7       CONT       AT     CC     AA     CC     ...
8       CONT       TT     .      AA     AA     ...
9       CONT       TT     GG     AA     AA     ...

SNPs are 'trinary variables': each SNP is composed of 2 alleles (among A, T, C and G), giving 3 possible genotypes, such as AA, AT and TT.
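As an illustration of the 'trinary' nature of a SNP, the sketch below recodes one genotype column as 0, 1 or 2 copies of the less frequent allele. This coding, the function name `encode_snp` and the '.' missing symbol handling are assumptions made for the example; the software's internal representation is not described here.

```python
from collections import Counter

def encode_snp(genotypes, missing="."):
    """Recode one SNP column as 0/1/2 copies of the minor allele (hypothetical helper).

    Missing genotypes are returned as None.
    """
    # Count every allele seen in the non-missing genotypes.
    counts = Counter(a for g in genotypes if g != missing for a in g)
    if not counts:
        return [None] * len(genotypes)
    if len(counts) > 2:
        raise ValueError("more than 2 alleles found: not a biallelic SNP")
    # Minor allele = least frequent one; ties are broken alphabetically.
    minor = min(counts, key=lambda a: (counts[a], a))
    return [None if g == missing else g.count(minor) for g in genotypes]

# SNP1 and SNP2 columns from the example matrix (subjects 1 to 9).
snp1 = ["AA", "AT", "AA", "AT", "AT", "TT", "AT", "TT", "TT"]
snp2 = ["CG", "CG", "GG", "GG", ".", "CC", "CC", ".", "GG"]

snp1_coded = encode_snp(snp1)
snp2_coded = encode_snp(snp2)
print(snp1_coded)
print(snp2_coded)
```

Each SNP then takes one of three values per subject, plus a marker for missing data.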

Missing values are allowed (e.g. SNP2, subjects 5 and 8). They can cause difficulties if they follow any 'correlation pattern'. The software can analyze the impact of these missing values on the overall analysis, and offers several options for managing them.
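One simple way to look for such a pattern is to compare the missing rate of a SNP between cases and controls. The sketch below does only that; it is an assumed, minimal check and not the procedure implemented by the software. The function name `missing_rate_by_group` is hypothetical.

```python
def missing_rate_by_group(status, genotypes, missing="."):
    """Return the fraction of missing genotypes in each outcome group."""
    totals, misses = {}, {}
    for s, g in zip(status, genotypes):
        totals[s] = totals.get(s, 0) + 1
        misses[s] = misses.get(s, 0) + (g == missing)
    return {s: misses[s] / totals[s] for s in totals}

# Outcome and SNP2 columns from the example matrix (subjects 1 to 9).
status = ["CASE"] * 5 + ["CONT"] * 4
snp2 = ["CG", "CG", "GG", "GG", ".", "CC", "CC", ".", "GG"]

rates = missing_rate_by_group(status, snp2)
print(rates)
```

A large difference between the two rates would suggest that missingness is related to the outcome and deserves a closer look before the main analysis.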

SNP-to-gene map:

SNP ID   Gene ID
SNP1     APOE
SNP2     APOE
SNP3     APOE
SNP4     APOE
SNP4     HDL
SNP5     HDL
SNP6     IL1
SNP7     IL1
...      ...

These maps allow analyses to be performed at a 'gene level' or an 'ontology level'. They may include one-to-one or one-to-many relationships, as shown above (SNP4 maps to two genes, APOE and HDL).
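A minimal sketch of how such a map could be held in memory and inverted to list the SNPs of each gene. The data structure and the function names `group_by_gene` and `multi_gene_snps` are assumptions for illustration; the map file format expected by the software is not specified here, and the aggregation step itself is not shown.

```python
from collections import defaultdict

# The SNP-to-gene pairs listed in the example map.
snp_to_gene = [
    ("SNP1", "APOE"), ("SNP2", "APOE"), ("SNP3", "APOE"), ("SNP4", "APOE"),
    ("SNP4", "HDL"), ("SNP5", "HDL"),
    ("SNP6", "IL1"), ("SNP7", "IL1"),
]

def group_by_gene(pairs):
    """Invert the map: gene ID -> list of SNP IDs, in input order."""
    groups = defaultdict(list)
    for snp, gene in pairs:
        groups[gene].append(snp)
    return dict(groups)

def multi_gene_snps(pairs):
    """Return the SNPs that map to more than one gene (one-to-many)."""
    genes = defaultdict(set)
    for snp, gene in pairs:
        genes[snp].add(gene)
    return sorted(snp for snp, g in genes.items() if len(g) > 1)

gene_groups = group_by_gene(snp_to_gene)
shared = multi_gene_snps(snp_to_gene)
print(gene_groups)
print(shared)
```

Storing the map as pairs rather than as a SNP-keyed dictionary keeps one-to-many entries such as SNP4 intact.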

 

 


 

 


Contact: taxonomy@delrieu.org