The typical dataset analyzed by this method consists of subjects
(observations) and variables of several data types. Subjects are divided
into 2 groups (binary outcome), called 'cases' and 'controls'.
In a genetic setting, variables can be discrete (genetic markers such as
SNPs, HLA markers, CA repeats, etc.) or continuous (gene expression data,
clinical sub-phenotypes).
Missing values are permitted: the SAS software can analyze their possible
impact on the analysis and correct for it.
This method can use prior knowledge to aggregate data. For example, the
signal from several SNPs located in the same gene can be aggregated and
analyzed together as a single entity with known properties. One level
higher, the signal from several genes belonging to one ontology (a
biologically meaningful group of genes) can be aggregated and analyzed
together. In each case, the user supplies this prior knowledge as a 'map':
a SNP-to-gene map or a SNP-to-ontology map. These maps represent a linear
interaction model.
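To make the idea of aggregation concrete, here is a minimal Python sketch that sums per-SNP scores into per-gene totals using a SNP-to-gene map given as (SNP, gene) pairs. The scores are made-up numbers, and plain summation is an assumption chosen to mirror the linear model mentioned above; the statistic the method actually aggregates is not described here. The function name `aggregate` is hypothetical.

```python
from collections import defaultdict

def aggregate(scores, mapping):
    """Sum per-SNP scores into per-group totals.

    scores:  dict of SNP id -> numeric score (hypothetical signal measure)
    mapping: list of (SNP id, group id) pairs; one SNP may appear in several pairs
    """
    totals = defaultdict(float)
    for snp, group in mapping:
        if snp in scores:
            # Summation is an assumption standing in for the method's own rule.
            totals[group] += scores[snp]
    return dict(totals)

# Made-up per-SNP scores, for illustration only.
snp_scores = {"SNP1": 1.0, "SNP2": 0.5, "SNP3": 0.25, "SNP4": 2.0, "SNP5": 1.0}
snp_to_gene = [("SNP1", "APOE"), ("SNP2", "APOE"), ("SNP3", "APOE"),
               ("SNP4", "APOE"), ("SNP4", "HDL"), ("SNP5", "HDL")]
print(aggregate(snp_scores, snp_to_gene))
```

With these inputs, APOE receives 1.0 + 0.5 + 0.25 + 2.0 = 3.75 and HDL receives 2.0 + 1.0 = 3.0. Because SNP4 appears in two pairs, its score contributes to both genes.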
Example of a genetic dataset.

Genotypes can be represented as a matrix:
| SubID | CaseCont | SNP1 | SNP2 | SNP3 | SNP4 | ... |
|-------|----------|------|------|------|------|-----|
| 1     | CASE     | AA   | CG   | AC   | AC   | ... |
| 2     | CASE     | AT   | CG   | AC   | AC   | ... |
| 3     | CASE     | AA   | GG   | AC   | AA   | ... |
| 4     | CASE     | AT   | GG   | AC   | AA   | ... |
| 5     | CASE     | AT   | .    | AC   | AA   | ... |
| 6     | CONT     | TT   | CC   | AA   | CC   | ... |
| 7     | CONT     | AT   | CC   | AA   | CC   | ... |
| 8     | CONT     | TT   | .    | AA   | AA   | ... |
| 9     | CONT     | TT   | GG   | AA   | AA   | ... |
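The matrix above maps naturally onto a list of rows. The following Python sketch (the names `ROWS`, `SNPS` and `genotype_table` are hypothetical) stores the example data with "." as the missing-value marker and counts the genotypes of one SNP separately in cases and controls, which is the kind of contingency table a case-control comparison typically starts from.

```python
from collections import Counter

# Rows of the example matrix: (SubID, CaseCont, SNP1..SNP4); "." marks a missing value.
ROWS = [
    (1, "CASE", "AA", "CG", "AC", "AC"),
    (2, "CASE", "AT", "CG", "AC", "AC"),
    (3, "CASE", "AA", "GG", "AC", "AA"),
    (4, "CASE", "AT", "GG", "AC", "AA"),
    (5, "CASE", "AT", ".", "AC", "AA"),
    (6, "CONT", "TT", "CC", "AA", "CC"),
    (7, "CONT", "AT", "CC", "AA", "CC"),
    (8, "CONT", "TT", ".", "AA", "AA"),
    (9, "CONT", "TT", "GG", "AA", "AA"),
]
SNPS = ["SNP1", "SNP2", "SNP3", "SNP4"]

def genotype_table(rows, snp):
    """Count genotypes of one SNP separately for cases and controls, skipping missing values."""
    col = 2 + SNPS.index(snp)  # first two columns are SubID and CaseCont
    table = {"CASE": Counter(), "CONT": Counter()}
    for row in rows:
        genotype = row[col]
        if genotype != ".":
            table[row[1]][genotype] += 1
    return table

print(genotype_table(ROWS, "SNP1"))
```

For SNP1 this gives AA: 2 and AT: 3 among cases, and TT: 3 and AT: 1 among controls.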
SNPs are 'trinary' variables: each SNP has 2 alleles (among A, C, G and T),
giving 3 possible genotypes, such as AA, AT and TT.
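Because each genotype is a pair of alleles, a common convention (assumed here, not confirmed as the one this method uses) is to recode it as the number of copies of a chosen allele, giving the values 0, 1 and 2. A minimal Python sketch, with a hypothetical function name `allele_count`:

```python
def allele_count(genotype, allele):
    """Return how many copies of `allele` a two-letter genotype carries (0, 1 or 2).

    Returns None for a missing value, written "." in the example matrix.
    The 0/1/2 coding is an assumed convention, not necessarily the method's own.
    """
    if genotype == ".":
        return None
    if len(genotype) != 2 or any(base not in "ACGT" for base in genotype):
        raise ValueError(f"not a two-allele genotype: {genotype!r}")
    return genotype.count(allele)

# SNP1 column of the example matrix, subjects 1 to 9.
snp1 = ["AA", "AT", "AA", "AT", "AT", "TT", "AT", "TT", "TT"]
print([allele_count(g, "T") for g in snp1])
```

Coding SNP1 with T as the counted allele gives [0, 1, 0, 1, 1, 2, 1, 2, 2]. Which allele is counted is an arbitrary choice; counting A instead simply turns every 0 into 2 and vice versa.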
Missing values are allowed (e.g. SNP2 for subjects 5 and 8). Missing
values can cause difficulties if they follow a 'correlation pattern'
(for example, if values are missing more often in cases than in controls).
The software can analyze the impact of these missing values on the overall
analysis and offers several options for managing them.
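One simple way to look for a 'correlation pattern' is to compare the missing-value rate between cases and controls. The sketch below does this for SNP2 of the example matrix; it is an illustrative check under that assumption, not the software's own procedure, and `missing_rate_by_group` is a hypothetical name.

```python
# SNP2 column of the example matrix, paired with each subject's status; "." is missing.
SNP2 = [("CASE", "CG"), ("CASE", "CG"), ("CASE", "GG"), ("CASE", "GG"), ("CASE", "."),
        ("CONT", "CC"), ("CONT", "CC"), ("CONT", "."), ("CONT", "GG")]

def missing_rate_by_group(values):
    """Fraction of missing values within each outcome group."""
    totals, missing = {}, {}
    for group, genotype in values:
        totals[group] = totals.get(group, 0) + 1
        # A bool adds as 0 or 1, so this counts the missing entries.
        missing[group] = missing.get(group, 0) + (genotype == ".")
    return {group: missing[group] / totals[group] for group in totals}

print(missing_rate_by_group(SNP2))
```

Here 1 of 5 cases (0.2) and 1 of 4 controls (0.25) are missing SNP2. With real data, a clear difference between the two rates would suggest that missingness is associated with the outcome; a formal test such as Fisher's exact test on the 2x2 table of missing versus observed counts is a natural follow-up.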
SNP-to-gene map:

| SNP ID | Gene ID |
|--------|---------|
| SNP1   | APOE    |
| SNP2   | APOE    |
| SNP3   | APOE    |
| SNP4   | APOE    |
| SNP4   | HDL     |
| SNP5   | HDL     |
| SNP6   | IL1     |
| SNP7   | IL1     |
| ...    | ...     |
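A map like this can be loaded into two lookup tables, one per direction, so that a SNP can belong to several genes. The Python sketch below assumes a hypothetical plain-text format of one whitespace-separated "SNP gene" pair per line; the software's actual input format is not specified here, and `read_map` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical text form of the rows listed in the map above.
MAP_TEXT = """\
SNP1 APOE
SNP2 APOE
SNP3 APOE
SNP4 APOE
SNP4 HDL
SNP5 HDL
SNP6 IL1
SNP7 IL1
"""

def read_map(text):
    """Build both directions of a SNP-to-gene map from whitespace-separated pairs."""
    snp_to_genes = defaultdict(set)
    gene_to_snps = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        snp, gene = line.split()
        snp_to_genes[snp].add(gene)
        gene_to_snps[gene].add(snp)
    return snp_to_genes, gene_to_snps

snp_to_genes, gene_to_snps = read_map(MAP_TEXT)
print(sorted(snp_to_genes["SNP4"]))  # SNP4 maps to two genes
print(sorted(gene_to_snps["APOE"]))
```

Storing sets rather than single values is what allows the one-to-many relationships the map permits.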
These maps allow analyses to be performed at the 'gene level' or the
'ontology level'. They may include one-to-one or one-to-many relationships,
as shown above (SNP4 maps to two genes, APOE and HDL).
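A SNP-to-ontology map can be derived by chaining a SNP-to-gene map with a gene-to-ontology map. The Python sketch below does this with sets, so one-to-many links are preserved. The labels `ontology_A` and `ontology_B` and their gene assignments are hypothetical placeholders, and `compose` is an illustrative name; the method itself may expect the SNP-to-ontology map to be supplied directly.

```python
from collections import defaultdict

def compose(snp_to_gene, gene_to_ontology):
    """Chain (SNP, gene) and (gene, ontology) pairs into a SNP -> set of ontologies map."""
    ontologies_by_gene = defaultdict(set)
    for gene, ontology in gene_to_ontology:
        ontologies_by_gene[gene].add(ontology)
    snp_to_ontologies = defaultdict(set)
    for snp, gene in snp_to_gene:
        # A gene with no known ontology contributes an empty set.
        snp_to_ontologies[snp] |= ontologies_by_gene.get(gene, set())
    return dict(snp_to_ontologies)

snp_to_gene = [("SNP1", "APOE"), ("SNP2", "APOE"), ("SNP3", "APOE"), ("SNP4", "APOE"),
               ("SNP4", "HDL"), ("SNP5", "HDL"), ("SNP6", "IL1"), ("SNP7", "IL1")]
# Hypothetical gene-to-ontology assignments, for illustration only.
gene_to_ontology = [("APOE", "ontology_A"), ("HDL", "ontology_A"), ("IL1", "ontology_B")]

result = compose(snp_to_gene, gene_to_ontology)
print(result["SNP4"])
```

Because sets are used, SNP4, which reaches ontology_A through both APOE and HDL, is listed only once for that ontology; this may help avoid counting its signal twice when aggregating at the ontology level.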