LBFs can be aggregated over any domain
to help answer biologically and clinically relevant questions. This
allows supervised biological exploration of SNP data
and the classification of subjects. For instance, LBF values can be
summarised or collapsed (using a simple sum, mean, variance, etc.)
over:
- genotypes within a locus to give a SNP level measure
(SNP LBF)
- all SNP loci within a gene to produce a gene level
measure (gene LBF)
- any ontology of genes, i.e. any meaningful group of
genes, such as those coding for proteins involved in one biological pathway
(pathway LBF)
- the whole genome
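The hierarchy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it starts from SNP LBFs that have already been computed for one subject, uses simple summation as the summary function, and all identifiers (`aggregate_lbf`, the rs numbers, gene and pathway names) are hypothetical and not taken from any existing software.

```python
def aggregate_lbf(values_by_id, groups, summarise=sum):
    """Collapse LBF values into one value per group.

    values_by_id: dict mapping an id (SNP or gene) to its LBF value
    groups:       dict mapping a group name to the ids it contains
    summarise:    summary function applied to each group's values
                  (sum by default; a mean or variance would also fit)
    """
    out = {}
    for name, members in groups.items():
        values = [values_by_id[m] for m in members if m in values_by_id]
        if values:  # skip groups with no observed member
            out[name] = summarise(values)
    return out


# Hypothetical SNP LBFs for a single subject (toy values).
snp_lbf = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0, "rs4": 0.25}

# Hypothetical annotation: SNPs within genes, genes within a pathway.
gene_to_snps = {"GENE_A": ["rs1", "rs2"], "GENE_B": ["rs3"], "GENE_C": ["rs4"]}
pathway_to_genes = {"pathway_X": ["GENE_A", "GENE_B"]}

gene_lbf = aggregate_lbf(snp_lbf, gene_to_snps)            # gene level
pathway_lbf = aggregate_lbf(gene_lbf, pathway_to_genes)    # pathway level
genome_lbf = sum(snp_lbf.values())                         # whole genome
```

Passing a different `summarise` function (for example `statistics.fmean`) changes the summary without changing the grouping logic.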
For any subject, a vector of LBF measures is a type
of profile. This profile (for example along the genome) can be treated
as a stochastic (i.e. non-deterministic) sequence and thus be characterised
by its mean (first moment) and variance (second central moment). These
have a meaning in terms of the biological classificatory signal involved
for that person. The pattern of LBF values for a domain reflects the
empirical genetic model of that domain for the trait being analysed.
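As a minimal sketch of this summary, the following Python computes the mean and variance of each subject's LBF profile over a domain. The function name, subject labels and values are hypothetical. The population variance is used here as an assumption, since the text does not say whether the population or the sample variance is intended.

```python
from statistics import fmean, pvariance


def profile_moments(profiles):
    """Return {subject: (mean, variance)} for each subject's LBF profile.

    profiles: dict mapping a subject id to a sequence of LBF values
              (e.g. the gene LBFs of all genes in one ontology)
    """
    return {
        subject: (fmean(values), pvariance(values))
        for subject, values in profiles.items()
    }


# Hypothetical gene LBF profiles over a four-gene ontology (toy values).
profiles = {
    "case_01": [1.0, 2.0, 3.0, 4.0],
    "control_01": [0.5, 0.5, 0.5, 0.5],
}

moments = profile_moments(profiles)
```

Each subject is then represented by one (mean, variance) pair, which is the kind of point that can be drawn on a scatter plot of subjects.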
The examples below use a dataset composed
of 500 cases (affected with a common neuropsychiatric disorder) and
500 control subjects genotyped for circa 5000 SNPs across 1500 genes.
Gene LBFs are aggregated over 2 ontologies: 'GTPase activity' (4 genes)
and 'negative regulation of cell proliferation' (23 genes). Mean and
variance of these LBF profiles are plotted for each subject (cases in
pink and controls in blue).
The patterns observed in these figures are driven
solely by genetic factors (and the case/control status used in the LBF
calculation), and highlight substantial inter-individual genetic heterogeneity.
- The first example clearly shows a sub-group of
cases, distinct from other subjects. These cases share a specific 'GTPase
activity' pattern that could help define a sub-phenotype and could
be of interest for drug discovery or drug development (clinical trial
enrichment).
- The second example clearly shows substantial
population heterogeneity in the dataset for this ontology, but does
not reveal any obvious contrast between cases and controls.