
Gene Expression Clustering Methods

A gene expression pattern derived from one microarray hybridization provides a snapshot of the state of a living cell, which determines its biological behavior. For example, the human genome contains approximately 3 billion base pairs, which earlier estimates put at about 50,000 to 100,000 genes (current estimates are closer to 20,000 protein-coding genes). To add further complication, only a fraction of these genes are expressed in any given tissue. Alternatively, instead of treating the gene expression pattern from a given microarray experiment as a single data entity, we can follow one gene at a time across a biological process or a collection of biological samples; this gives the gene expression profile.

Clustering analysis is a powerful tool that partitions biological samples or genes into well-separated, homogeneous groups based on their statistical behavior. Its main objective is to measure the similarity between experiments, or between genes, from their expression ratios across all genes or all samples, respectively, and then to group similar samples or genes together so that the data are easier to interpret and visualize. Clustering methods have been studied extensively for many years and are widely applied in many fields. In this section, we discuss the implementations we have employed in our gene expression analysis. They are:

  1. Hierarchical clustering methods
  2. K-means or fuzzy C-means methods
  3. Self-organizing map
  4. Neural network

 

1. Hierarchical Clustering Method

Assume we have m expression experiments, each measuring the same n genes. After microarray image analysis and data integration, we obtain an n × m matrix of gene expression ratios, in which each column holds the ratios from one experiment comparing the test sample with a common reference sample of choice, and each row holds one gene's profile across the experiments. To simplify the discussion, we describe the algorithm only in terms of clustering samples (columns); clustering genes works the same way on the rows.
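To make the orientation of the data concrete, the minimal Python sketch below lays out a toy ratio matrix with genes as rows and experiments as columns, log-transforms it, and pulls out one experiment (a column) and one gene profile (a row). The ratio values are invented for illustration, and the helper names `sample_column` and `gene_profile` are hypothetical, not part of any program described here.

```python
import math

# Hypothetical raw ratios (test / reference) for n = 3 genes in m = 2 experiments.
# Rows are genes and columns are experiments, so the matrix is n x m.
raw_ratios = [
    [2.0, 0.5],   # gene A
    [1.0, 4.0],   # gene B
    [0.25, 1.0],  # gene C
]

# Log-transform so that up- and down-regulation are symmetric around 0.
log_ratios = [[math.log2(r) for r in row] for row in raw_ratios]


def sample_column(matrix, j):
    """Return experiment j (a column): one log ratio per gene."""
    return [row[j] for row in matrix]


def gene_profile(matrix, i):
    """Return gene i's expression profile (a row) across all experiments."""
    return list(matrix[i])


n_genes, n_experiments = len(log_ratios), len(log_ratios[0])
print(n_genes, n_experiments)          # 3 2
print(sample_column(log_ratios, 0))    # [1.0, 0.0, -2.0]
print(gene_profile(log_ratios, 1))     # [0.0, 2.0]
```

Sample clustering compares columns such as `sample_column(log_ratios, j)`; gene clustering would compare rows instead.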

To cluster the samples, we first evaluate the similarity between every pair of samples and then use the average-linkage algorithm to group similar samples. Typically, we quantify similarity with the Pearson correlation coefficient or the Euclidean distance; if each sample's log ratios are first standardized to zero mean and unit variance, the two measures are equivalent, in that they rank pairs of samples in the same order. From all the pairwise similarities we construct a distance matrix such as the one in Table 1a. The hierarchical algorithm then proceeds as follows. First, we find the pair of experiments with the shortest distance, i.e. the most similar gene expression patterns (in Table 1a, Exp1 and Exp2, at a distance of 0.1). We then construct a 'composite experiment', named Exp1-2, by averaging the (log-transformed) expression ratios of the two experiments gene by gene; this averaging is the source of the term average-linkage. Strictly speaking, averaging the profiles themselves is often called centroid linkage, while classic average linkage averages the pairwise distances between cluster members, which is how the entry 0.55 in Table 1b arises (the mean of 0.7 and 0.4). Next, we recompute the distances from the composite experiment to all remaining experiments and construct the smaller matrix in Table 1b. The procedure is repeated until the distance matrix is reduced to a single element, that is, until all experiments have been merged into one cluster.
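The equivalence of the two similarity measures can be checked numerically. The sketch below assumes that the normalization condition means standardizing each sample's log ratios to zero mean and unit population variance (z-scores). Under that assumption the squared Euclidean distance between two standardized samples over n genes equals 2n(1 - r), where r is their Pearson correlation, so a smaller distance always means a higher correlation. The function names and the five-gene profiles are invented for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def standardize(xs):
    """Center to mean 0 and scale to unit population variance (z-scores)."""
    mu = mean(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [(x - mu) / sd for x in xs]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def euclidean(xs, ys):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


# Hypothetical log ratios of 5 genes in two experiments.
exp1 = [1.0, -0.5, 2.0, 0.3, -1.2]
exp2 = [0.8, -0.2, 1.5, 0.1, -1.0]

n = len(exp1)
r = pearson(exp1, exp2)
corr_distance = 1 - r  # a common way to turn a correlation into a distance
d = euclidean(standardize(exp1), standardize(exp2))

# For standardized vectors, d**2 equals 2 * n * (1 - r) up to rounding error.
print(round(d ** 2, 6), round(2 * n * (1 - r), 6))
```

Because d**2 is a fixed multiple of 1 - r here, either measure yields the same closest pair at each step of the clustering.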

Table 1a
           Exp1   Exp2   Exp3
  Exp1     0
  Exp2     0.1    0
  Exp3     0.7    0.4    0

Table 1b
           Exp1-2   Exp3
  Exp1-2   0
  Exp3     0.55     0

The result of the hierarchical algorithm is visualized as a dendrogram, a binary tree in which each merge joins two branches, and the length of each branch indicates the distance at which the two samples (or clusters) were joined, as given in Tables 1a and 1b.
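The merging procedure and the dendrogram heights can be sketched in a few lines of Python. This is a minimal sketch, not our web implementation: it works from the distance matrix alone and uses the distance-averaging form of average linkage (UPGMA), in which the distance from a merged cluster to any other cluster is the size-weighted mean of the distances from its two parts, rather than averaging the log ratios themselves. With the values of Table 1a it reproduces the 0.55 entry of Table 1b. The function name `average_linkage`, the dict-of-frozensets representation, and the merged label format (`Exp1-Exp2` rather than `Exp1-2`) are illustrative choices.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Agglomerative clustering with the distance-averaging (UPGMA) update.

    names: leaf labels, e.g. ["Exp1", "Exp2", "Exp3"].
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns (merges, d): merges lists (cluster_a, cluster_b, height) in merge
    order; d is the distance dict extended with the merged clusters.
    """
    size = {name: 1 for name in names}  # number of leaves per active cluster
    d = dict(dist)
    merges = []
    while len(size) > 1:
        # Pick the closest pair of currently active clusters.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        merged = f"{a}-{b}"
        # New distance to each other cluster: size-weighted mean of the
        # distances from the two clusters being merged.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        merges.append((a, b, height))
    return merges, d


# Distances from Table 1a.
table_1a = {
    frozenset(("Exp1", "Exp2")): 0.1,
    frozenset(("Exp1", "Exp3")): 0.7,
    frozenset(("Exp2", "Exp3")): 0.4,
}
merges, d = average_linkage(["Exp1", "Exp2", "Exp3"], table_1a)
for a, b, h in merges:
    print(f"merge {a} + {b} at distance {h:.2f}")
# merge Exp1 + Exp2 at distance 0.10
# merge Exp3 + Exp1-Exp2 at distance 0.55
```

Each recorded height is the level at which a dendrogram draws the join between two branches, so plotting the `merges` list gives the tree described above.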

Our implementation of the average-linkage method will be made available through a web interface. A downloadable version of the program is also in preparation.

2. K-means Algorithm and Fuzzy C-means Algorithm

Under construction

 

3. Self-Organizing map

Under construction

 

4. Neural Network

Under construction.




New: the Cancer Research paper "The Gene Expression Response of Breast Cancer to Growth Regulators: Patterns and Correlation with Tumor Expression Profiles" is now available.
