An enduring medical need
is a better understanding of the determinants of human traits such
as disease or response to drugs. Genes and their protein products are
the inherited building blocks of metabolic pathways that drive human
physiology. The advent of high-speed, high-density genotyping offers
a new opportunity to probe these determinants. Binary single nucleotide polymorphisms
(SNPs) at millions of nuclear chromosomal loci can now be employed to
genetically characterise trait cases versus non-trait controls.
Fundamental to this aspiration is the classification
of people, in particular how much evidence there is that one individual
is genetically distinct from another. This is difficult because
humans are 99.9% identical at the DNA level, and a single SNP change
in three thousand million bases may be enough to cause a trait
of interest. Since any two people differ from each other at approximately
one million SNPs, a simple, sensitive, easy-to-calculate yet
unified way of initially handling this taxonomic question for
individuals is needed.
Large-volume, whole genome SNP scans of humans are
difficult to analyse. There are published methods that ascribe individuals
to evolutionary groups using genetic data. However, although a variety
of multi-locus approaches exist for marker mapping, few widely accepted
methods, apart from small-scale haplotype estimation and
comparison, exist for the simultaneous processing of the legions
of polymorphisms in whole genome scans. Researchers rely on 'one at
a time' (univariate) tests beset with multiple testing problems. Some
combinatorial and partitioning methods show promise, as do multifactor
dimensionality reduction and symbolic discriminant analysis. However,
many widely touted multivariate methods such as artificial
neural networks or support vector machines produce impenetrable 'black
box' solutions. Other methods that might be considered include cellular
automata and evolutionary computation.
All the above methods can appear highly complex
to a lay reader and are certainly computationally intensive, often requiring
highly specialised staff to apply. We believe that what is needed
is a straightforward method that enables ordinary scientists and clinicians
themselves to expose useful biological and medical insights from whole
genome scan SNP association studies.