Analysis
Image Analysis

The objective of microarray image analysis is to extract sample intensities or ratios at each printed cDNA location in a given microarray scan, and then to cross-link the printed clone information so that biologists can easily interpret the outcomes and perform further data integration and analysis. Given the variety of available microarray printers and scanners, along with various experimental protocols, microarray images may be derived from different print modes, different labeling methods, and different hybridization techniques. Nevertheless, to simplify the presentation in this section, we have chosen to constrain our discussion to gene expression experiments based on two-color fluorescent hybridization. More information is available in the Selected Publications, and questions and comments can be directed to the researchers listed in Credits and Contact Information.

A typical microarray image is generated from an array of cDNA probes hybridized to two samples, one tagged with a red fluor and the other with a green fluor. The composite color image is constructed by placing each monochrome image into the appropriate color channel. Microarray image analysis can be divided into the following tasks:

  1. Array target segmentation
  2. Background intensity extraction
  3. Target detection
  4. Target intensity extraction
  5. Normalization and Ratio analysis
  6. Measurement quality assessment
  7. ArraySuite software package

There are many other important statistical issues associated with microarray image analysis. We encourage readers to consult the Selected Publications for further studies.

1. Array Target Segmentation

[Figure: Grid Overlay]

Since each element of an array is printed automatically at a pre-defined location, we can safely assume that the detectable signals form a regular array that can be aligned automatically to a predefined grid overlay. The initial position of the grid, as shown in the picture, can be either 1) determined manually if no orientation markers are printed, or 2) determined automatically if orientation markers are present in the final image and the entire array has no obvious defects. Because customized print procedures and hybridization protocols vary so widely, we assume that the initial grid overlay is determined manually. Automatic refinement procedures may then be applied to further adjust the grid alignment.
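As a minimal sketch of what a regular grid overlay amounts to, the following computes one bounding box per array element from a manually chosen origin and spot pitch. The function name, the parameters, and the example numbers are illustrative assumptions, not part of ArraySuite.

```python
def grid_boxes(origin_x, origin_y, pitch_x, pitch_y, n_rows, n_cols):
    """Return {(row, col): (left, top, right, bottom)} for a regular grid.

    Assumption: the grid is axis-aligned and every element occupies one
    pitch_x by pitch_y cell, starting at the manually chosen origin.
    """
    boxes = {}
    for row in range(n_rows):
        for col in range(n_cols):
            left = origin_x + col * pitch_x
            top = origin_y + row * pitch_y
            boxes[(row, col)] = (left, top, left + pitch_x, top + pitch_y)
    return boxes


# Hypothetical example: origin at pixel (10, 20), 15-pixel pitch, 2 x 3 grid.
boxes = grid_boxes(10, 20, 15, 15, n_rows=2, n_cols=3)
print(boxes[(1, 2)])
```

A refinement step would then shift each box, or the whole grid, toward the observed signal; that step is not shown here.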

Given the known print mode of the microarray printing robot, each target position can be traced back to its original microplate position, where the stored clone information from the gene-in-plate-order file (GIPO file for short) is attached to each cDNA target. Automatically decoding the print modes of popular microarray printers is key to meeting this demanding research requirement. Some FileMaker tools for creating and managing GIPO files are available in the Database section.
 

2. Background Intensity Extraction

[Figure: Local Background]

The background of a microarray image is not uniform over the entire array, so it is necessary to extract the local background intensity. Changes in fluorescent background across an array are usually gradual and smooth, and may have many technical causes. Abrupt changes are rare, and when they happen, the signal intensities of array elements near these changes are not reliable. Conventionally, pixels near the edge of the bounding box are taken to be background pixels, and the average gray level of these pixels provides an estimate of the local background intensity.

We employ a different approach that is more robust under many conditions. The fluorescent background is modeled by a Gaussian process, perturbed by a signal process. For example, if we choose a larger area (e.g. a 30x30 box centered at a particular target), the gray-level histogram within the box is usually unimodal, since the majority of the background pixel gray levels are narrowly distributed while the target pixel gray levels spread more uniformly up to a much higher gray level. The location of the mode of the histogram therefore provides an estimate of the local background, and the left tail of the histogram provides the spread (standard deviation) of the background intensity.
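The histogram-based estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than the ArraySuite implementation: gray levels are integers, the raw (unsmoothed) histogram is used, and the spread is taken as the root-mean-square deviation of the pixels at or below the mode, which for a symmetric background approximates its standard deviation.

```python
import math
from collections import Counter


def estimate_background(pixels):
    """Estimate (background level, background spread) from gray levels in a box.

    pixels: flat list of integer gray levels from a box around one target.
    """
    counts = Counter(pixels)
    # Histogram mode; ties are resolved in favor of the lowest gray level.
    mode = min(counts, key=lambda g: (-counts[g], g))
    # Left tail only: the right side is contaminated by target pixels.
    left = [p for p in pixels if p <= mode]
    sigma = math.sqrt(sum((p - mode) ** 2 for p in left) / len(left))
    return mode, sigma


# Hypothetical box: narrow background near 100 plus a few bright target pixels.
box = [100] * 10 + [99] * 6 + [101] * 6 + [98] * 2 + [102] * 2 + [180, 200, 220, 250]
level, spread = estimate_background(box)
print(level, round(spread, 3))
```

On real scans the histogram would normally be binned or smoothed before locating the mode; that choice is left open here.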
 

3. Target Detection

[Figure: Target Detection]

One of the difficult image processing tasks is to identify the target region within the bounding box. Each target is somewhat annular, a result of both how the printer pen dispenses the cDNA onto the slide and how the slide is pre-treated. We typically assume that each pen produces an array of cDNA targets with roughly the same morphology. A mask that defines a possible target region for each pen can be obtained by averaging some intensely expressed locations. This step also eliminates some of the noise that may happen to be near the cDNA targets. It is also important that the final signal intensity be measured over the region corresponding to the probe-hybridized-to-target area, or the union of the detectable areas from the red and green channels. The reason is simple: if we observe either fluorescent signal, the underlying region must belong to the region where cDNA was deposited.
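The union of detectable areas can be illustrated with two binary masks of the same shape, one per channel. The mask representation (nested lists of booleans) and the function name are assumptions made for this sketch.

```python
def union_mask(red_mask, green_mask):
    """Pixel-wise union of two equally sized binary detection masks."""
    if len(red_mask) != len(green_mask) or any(
        len(r) != len(g) for r, g in zip(red_mask, green_mask)
    ):
        raise ValueError("masks must have the same shape")
    return [
        [bool(r) or bool(g) for r, g in zip(red_row, green_row)]
        for red_row, green_row in zip(red_mask, green_mask)
    ]


# Hypothetical 2 x 3 masks: a pixel belongs to the target if either channel detects it.
red = [[True, False, False], [False, False, True]]
green = [[False, True, False], [False, False, True]]
target = union_mask(red, green)
print(target)
```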

Conventionally, a fixed thresholding method is used in image analysis. The threshold value T can be determined from the local background mean intensity m and its standard deviation s (e.g. T = m + 3s). However, simple fixed thresholding fails quite often because of the variability of the background and the signal, particularly when the signal is weak (a frequent finding in cDNA array experiments). To avoid these problems, a more sophisticated thresholding method may be implemented. One method that we use is the Mann-Whitney method, which takes sample pixels from the background and then performs a rank-sum hypothesis test on the target pixels.
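To make the two approaches concrete, here is a small sketch of the fixed threshold T = m + 3s and of a one-sided rank-sum (Mann-Whitney) test comparing candidate target pixels with background samples. It is not the ArraySuite procedure: the sample sizes, the normal approximation for the test statistic (without a tie correction to the variance), and the function names are assumptions.

```python
from statistics import NormalDist


def fixed_threshold(bg_mean, bg_std, k=3.0):
    """Fixed threshold T = m + k*s from local background statistics."""
    return bg_mean + k * bg_std


def rank_sum_p_value(background, candidates):
    """One-sided p-value for 'candidate pixels tend to exceed background'.

    Uses average ranks for ties and a normal approximation to the U statistic.
    """
    n_b, n_c = len(background), len(candidates)
    pooled = sorted([(v, 0) for v in background] + [(v, 1) for v in candidates])
    rank_sum_c = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for this tie group
        rank_sum_c += avg_rank * sum(label for _, label in pooled[i:j])
        i = j
    u = rank_sum_c - n_c * (n_c + 1) / 2.0
    mean_u = n_b * n_c / 2.0
    std_u = (n_b * n_c * (n_b + n_c + 1) / 12.0) ** 0.5
    return 1.0 - NormalDist().cdf((u - mean_u) / std_u)


# Hypothetical samples: 8 background pixels and 8 candidate target pixels.
background = [100, 102, 98, 101, 99, 103, 97, 100]
candidates = [110, 115, 108, 120, 112, 109, 118, 111]
threshold = fixed_threshold(100, 2)
p_value = rank_sum_p_value(background, candidates)
print(threshold, p_value < 0.001)
```

A small p-value supports treating the candidate pixels as signal. Because the test works on ranks rather than absolute gray levels, it is less sensitive to background variability than a fixed threshold.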
 

4. Target Intensity Extraction

[Figure: Intensity Measurement]

The sample intensity measurement is chosen to be the trimmed-average gray level within the target region. We discard the top 5% of pixels to avoid saturation problems and the bottom 5% to reduce the effect of possible mistakes in target detection. Keeping in mind that the final measurement is the ratio of two intensities (R/G), the averaging provides, to some degree, a data smoothing effect. The local background value is then subtracted from the reported sample intensities of the red channel (R) and the green channel (G), and the ratio (R/G) is calculated. The ratio measurement is thus the ratio of two average intensity measurements. There are, of course, other choices for the ratio measurement, including 1) the average of the ratios from every pixel location, and 2) the linear regression slope of the R-G gray values from every pixel location.
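The measurement just described can be sketched in a few lines: a 5% trimmed mean per channel, local background subtraction, then the ratio R/G. The function names and the example pixel values are assumptions for illustration only.

```python
def trimmed_mean(pixels, trim_percent=5):
    """Mean after discarding the lowest and highest trim_percent of pixels."""
    ordered = sorted(pixels)
    k = (len(ordered) * trim_percent) // 100  # pixels dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def expression_ratio(red_pixels, green_pixels, red_bg, green_bg):
    """Ratio of background-subtracted trimmed means (R/G)."""
    r = trimmed_mean(red_pixels) - red_bg
    g = trimmed_mean(green_pixels) - green_bg
    if g <= 0:
        raise ValueError("green intensity is not above background")
    return r / g


# Hypothetical target region of 20 pixels per channel, with one saturated
# pixel and one misdetected (dark) pixel in each channel.
red_pixels = [0] + [300] * 18 + [4095]
green_pixels = [0] + [200] * 18 + [4095]
ratio = expression_ratio(red_pixels, green_pixels, red_bg=100, green_bg=100)
print(ratio)
```

The saturated and dark pixels fall in the trimmed tails, so they do not affect the ratio.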
   

5. Normalization and Ratio Analysis

[Figure: Ratio Histogram]

We have used the expression ratio to determine whether the expression of a gene differs significantly between the red and green channels. Such an approach is intuitive because two similar samples lead to an R/G ratio close to 1. Assuming that the ratio (denoted t) extracted from the microarray image satisfies the following conditions: 1) normality, 2) independence, 3) sufficient positivity, and 4) a constant coefficient of variation, we can approximate the ratio distribution by

[Equation: ratio distribution as a function of t and c]

where t is the ratio for each gene and c is the coefficient of variation (CV) of the signal intensity. The parameter c determines the spread of the ratio distribution and can be estimated using a maximum-likelihood method. Before the ratio distribution is used, a ratio calibration procedure (or intensity normalization procedure) must be carried out to satisfy the null hypothesis, under which the two signals are assumed to be probabilistically the same. To satisfy this condition, a set of internal control genes has been chosen. We refer to this set as the "house-keeping" gene set, indicating that the selection is based both on biology and on experimental behavior (ratio close to 1.0). The significance of basing our measurement on the analytical ratio distribution is that we can associate a confidence interval with each ratio measurement, so that a significant difference can be easily detected. Equally important, this approach allows us to associate a p-value with each ratio measurement.
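The following sketch shows how such a ratio distribution can be used in practice. The density written in the docstring is an assumption: it is the closed form commonly associated with these four conditions in the ratio-statistics literature, reconstructed here because the equation itself is not present in this text. The estimator and the interval follow from that assumed density; all function names are illustrative.

```python
import math
from statistics import NormalDist


def ratio_density(t, c):
    """Assumed null density of the ratio t with coefficient of variation c:

    f(t; c) = (1 + t) * sqrt(1 + t^2) / (c * (1 + t^2)^2 * sqrt(2*pi))
              * exp(-(t - 1)^2 / (2 * c^2 * (1 + t^2)))
    """
    q = 1.0 + t * t
    scale = (1.0 + t) * math.sqrt(q) / (c * q * q * math.sqrt(2.0 * math.pi))
    return scale * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * q))


def estimate_c(ratios):
    """Maximum-likelihood estimate of c under the assumed density."""
    return math.sqrt(sum((t - 1.0) ** 2 / (1.0 + t * t) for t in ratios) / len(ratios))


def ratio_interval(c, level=0.99):
    """Equal-tail interval (lower, upper) holding `level` of the null mass.

    With u = (t - 1) / sqrt(1 + t^2), the assumed density is N(0, c^2) in u,
    so the bounds are found in u and mapped back to t.
    """
    u = NormalDist().inv_cdf(0.5 + level / 2.0) * c
    if u >= 1.0:
        raise ValueError("c is too large for this confidence level")
    upper = (1.0 + u * math.sqrt(2.0 - u * u)) / (1.0 - u * u)
    return 1.0 / upper, upper  # the two bounds are reciprocals of each other


# Hypothetical calibrated ratios from internal control genes.
control_ratios = [0.9, 1.1, 1.0, 1.2, 0.85, 1.05]
c_hat = estimate_c(control_ratios)
lower, upper = ratio_interval(c_hat, level=0.99)
print(round(c_hat, 3), round(lower, 3), round(upper, 3))
```

Under these assumptions, a gene whose calibrated ratio falls outside (lower, upper) would be flagged as significantly different at the chosen level. The calibration step itself is not shown.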
   

 

6. Measurement Quality Assessment

7. ArraySuite Software Package

A set of software tools is being developed at the Cancer Genetics Branch, NHGRI, NIH. The tool suite is a collection of IPLab extensions for Macintosh computers (IPLab is an image processing package by Scanalytics). The tools include: 1) LoadSKN, which loads images from the NIH scanner or other scanning instruments; 2) AlignArray, which aligns two images in case the red and green channels were scanned separately; 3) DeArray, the central processing tool, which performs most of the image processing tasks, including target segmentation, background intensity estimation, and probe intensity extraction; and 4) TargetLocator, which reports target information, refines statistics, and performs some image enhancement tasks. The development of ArraySuite is a continuous process that follows the refinement of microarray technology and the progress of its various applications. To request further technical information, go to Credits and Contact Information.




