The typical dataset analyzed by this method is composed
of subjects (observations) and variables of several data types. Subjects
are divided into two groups (a binary outcome), called 'cases' and 'controls'.
In a genetic setting, the variables are either discrete (genetic
markers such as SNPs, HLA markers, CA repeats, etc.) or continuous
(gene expression data, clinical sub-phenotypes).
This method allows prior knowledge to be used for data aggregation. For
example, the signal from several SNPs that belong to the same gene can be
aggregated and analyzed together, as one single entity with known properties.
One level higher, the signal from several genes that belong to the same ontology
(a biologically meaningful group of genes) can likewise be aggregated and analyzed
together. In each case, the user has to supply this prior knowledge
as a 'map': a SNP-to-gene map or a SNP-to-ontology map. These maps
represent a linear interaction model.
Example of a genetic dataset.
Genotypes can be represented as a matrix:

| SubID | CaseCont | SNP1 | SNP2 | SNP3 | SNP4 | ... |
|-------|----------|------|------|------|------|-----|
| 1 | CASE | AA | CG | AC | AC | ... |
| 2 | CASE | AT | CG | AC | AC | ... |
| 3 | CASE | AA | GG | AC | AA | ... |
| 4 | CASE | AT | GG | AC | AA | ... |
| 5 | CASE | AT | . | AC | AA | ... |
| 6 | CONT | TT | CC | AA | CC | ... |
| 7 | CONT | AT | CC | AA | CC | ... |
| 8 | CONT | TT | . | AA | AA | ... |
| 9 | CONT | TT | GG | AA | AA | ... |
SNPs are 'trinary variables': each genotype is composed of 2 alleles
(among A, T, C and G), giving 3 possible combinations, such as AA, AT and TT.
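The three-valued nature of a SNP can be made explicit by counting the copies of one chosen allele in each genotype. The sketch below uses the SNP1 column from the example matrix. The function name `encode_genotype` and the 0/1/2 coding are illustrative assumptions (a common convention), not necessarily the encoding this software uses internally.

```python
from typing import List, Optional


def encode_genotype(genotype: str, counted_allele: str) -> Optional[int]:
    """Count copies of `counted_allele` in a two-letter genotype (0, 1 or 2).

    Hypothetical helper for illustration: a "." is treated as a missing
    genotype and returned as None.
    """
    if genotype == ".":
        return None
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return genotype.count(counted_allele)


# SNP1 column of the example matrix (subjects 1 to 9).
snp1: List[str] = ["AA", "AT", "AA", "AT", "AT", "TT", "AT", "TT", "TT"]

# Count copies of the T allele: AA -> 0, AT -> 1, TT -> 2.
codes = [encode_genotype(g, "T") for g in snp1]
print(codes)
```

Which allele is counted is arbitrary here; counting A instead of T would simply reverse the 0 and 2 codes.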
Missing values are allowed and are shown as '.'
(e.g. SNP2, subjects 5 and 8). Missing
values can lead to difficulties if they follow any 'correlation
pattern'. The software makes it possible to analyze the impact of these
missing values on the overall analysis, and offers several
options for their management.
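One simple way to look for a pattern in missing values is to compare the missing rate between cases and controls. The sketch below does this for the SNP2 column of the example matrix. The function `missing_rate_by_group` is a hypothetical helper written for illustration; it is not the check performed by the software, whose options are not described here.

```python
from collections import Counter
from typing import Dict, List


def missing_rate_by_group(
    genotypes: List[str], groups: List[str], missing: str = "."
) -> Dict[str, float]:
    """Return the fraction of missing genotypes within each outcome group."""
    totals = Counter(groups)
    missing_counts = Counter(
        group for group, geno in zip(groups, genotypes) if geno == missing
    )
    return {group: missing_counts[group] / totals[group] for group in totals}


# CaseCont and SNP2 columns of the example matrix (subjects 1 to 9).
status = ["CASE", "CASE", "CASE", "CASE", "CASE", "CONT", "CONT", "CONT", "CONT"]
snp2 = ["CG", "CG", "GG", "GG", ".", "CC", "CC", ".", "GG"]

rates = missing_rate_by_group(snp2, status)
print(rates)
```

A large difference between the two rates would suggest that missingness is related to the outcome. With only nine subjects, as here, the comparison is purely illustrative.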
SNP-to-gene map:

| SNP id | gene ID |
|--------|---------|
| SNP1 | APOE |
| SNP2 | APOE |
| SNP3 | APOE |
| SNP4 | APOE |
| SNP4 | HDL |
| SNP5 | HDL |
| SNP6 | IL1 |
| SNP7 | IL1 |
| ... | ... |
These maps allow analyses to be performed
at a 'gene level' or an 'ontology level'. They may include
one-to-one or one-to-many relationships, as shown above (SNP4
maps to two genes, APOE and HDL).
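A map of this kind can be held as a list of (SNP, gene) pairs, which handles one-to-many relationships naturally. The sketch below groups the SNPs from the example map by gene, the first step before any gene-level aggregation. The names `snp_to_gene` and `group_snps_by_gene` are assumptions made for illustration, and the aggregation statistic itself is left out because it is not specified here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# SNP-to-gene map from the example table; SNP4 appears twice (APOE and HDL).
snp_to_gene: List[Tuple[str, str]] = [
    ("SNP1", "APOE"),
    ("SNP2", "APOE"),
    ("SNP3", "APOE"),
    ("SNP4", "APOE"),
    ("SNP4", "HDL"),
    ("SNP5", "HDL"),
    ("SNP6", "IL1"),
    ("SNP7", "IL1"),
]


def group_snps_by_gene(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect, for each gene, the SNPs mapped to it (in map order)."""
    genes: Dict[str, List[str]] = defaultdict(list)
    for snp, gene in pairs:
        genes[gene].append(snp)
    return dict(genes)


gene_sets = group_snps_by_gene(snp_to_gene)
for gene, snps in gene_sets.items():
    print(gene, snps)
```

Because SNP4 is listed under both APOE and HDL, it contributes to both gene-level groups. A SNP-to-ontology map could be handled the same way by replacing gene IDs with ontology IDs.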