Image Analysis
The objective of microarray image analysis is to extract sample
intensities or ratios at each printed cDNA location in a given
microarray scan, and then cross-link the printed clone information so
that biologists can easily interpret the outcomes and perform further
data integration and analysis. Given the variety of available
microarray printers and scanners, along with various experimental
protocols, microarray images may be derived from different
print-modes, different labeling methods, and different hybridization
techniques. Nevertheless, to simplify the presentation in this
section, we have chosen to constrain our discussion to gene expression
experiments based on two-color fluorescent hybridization.
More information is available in the Selected Publications, and
questions and comments can be directed to the researchers listed in
Credits and Contact Information.
A typical microarray image is generated from an array of cDNA probes
that is hybridized to two samples, one tagged with a red fluor and the
other with a green fluor. The composite color image is constructed by
placing each monochrome image into the appropriate color channel.
Microarray image analysis can be divided into the following tasks:
- Array target segmentation
- Background intensity extraction
- Target detection
- Target intensity extraction
- Normalization and ratio analysis
- Measurement quality assessment
- ArraySuite software package
There are many other
important statistical issues associated with microarray image analysis.
We encourage readers to consult the Selected
Publications for further studies.
1. Array Target Segmentation
Since each element of an array is printed automatically at a
predefined location, we can safely assume that the detectable signals
form a regular array that can be aligned automatically to a predefined
grid overlay. The initial position of the grid, as shown in the
picture, can be either 1) determined manually, if no orientation
markers are printed, or 2) determined automatically, if orientation
markers are present in the final image and the entire array has no
obvious defects. Because of the complexity of customized print
procedures and the variety of hybridization protocols, we assume the
initial grid overlay is determined manually. Automatic refinement
procedures may then be applied to further adjust the grid alignment.
Given the known print-mode of the microarray printer robot, each
target position can be deconvoluted back to its original microplate
position, so that the stored clone information, kept in a
gene-in-plate-order file (GIPO file for short), can be attached to
each cDNA target. Automatically decoding the print-modes of some
popular microarray printers is key to meeting demanding research
requirements. Some FileMaker tools for creating and managing GIPO
files are available in the Database section.
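As a rough illustration of what decoding a print-mode involves, the
sketch below maps a target position back to a microplate well.
Everything about the print-mode here is a hypothetical assumption
chosen for illustration (a 2 x 2 pen head, 96-well plates, six spots
per sub-array row, adjacent pens dipping into adjacent wells); real
printers differ, and the name target_to_well is not part of any tool
described on this page.

```python
from typing import NamedTuple


class Well(NamedTuple):
    plate: int  # 0-based microplate index, in print order
    row: int    # 0-based plate row (0..7 on a 96-well plate)
    col: int    # 0-based plate column (0..11 on a 96-well plate)


# Hypothetical print-mode (assumed, not taken from any real printer).
PEN_ROWS, PEN_COLS = 2, 2        # pen head layout
PLATE_ROWS, PLATE_COLS = 8, 12   # 96-well plate
SPOTS_PER_ROW = 6                # spots per row inside one pen's sub-array


def target_to_well(pen_row, pen_col, spot_row, spot_col):
    """Map a target (pen block plus position inside the block) to its well."""
    # Assumed print order: each pen fills its sub-array row by row,
    # one spot per dip into the plate.
    dip = spot_row * SPOTS_PER_ROW + spot_col
    dips_per_plate = (PLATE_ROWS // PEN_ROWS) * (PLATE_COLS // PEN_COLS)
    plate, dip_in_plate = divmod(dip, dips_per_plate)
    # Assumed dip order: the head walks across the plate row by row.
    dip_row, dip_col = divmod(dip_in_plate, PLATE_COLS // PEN_COLS)
    return Well(plate,
                dip_row * PEN_ROWS + pen_row,
                dip_col * PEN_COLS + pen_col)
```

The returned plate, row, and column could then serve as the lookup key
into a table ordered the way a GIPO file is, which is how clone
information would be attached to each target under these assumptions.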
2. Background Intensity Extraction
The background of a microarray image is not uniform over the entire
array, so it is necessary to extract the local background intensity.
Changes in fluorescent background across an array are usually gradual
and smooth, and may have many technical causes. Abrupt changes are
rare, and when they happen, the signal intensities of array elements
near these changes are not reliable. Conventionally, pixels near the
bounding box edge are taken to be the background pixels, and the
average gray-level of these pixels provides an estimate of the local
background intensity. We employ a different approach that is more
robust under many conditions. In this approach, the fluorescent
background is modeled by a Gaussian process perturbed by a signal
process. For example, if we choose a larger area (e.g. a 30x30 box
centered at a particular target), the gray-level histogram within the
box is usually unimodal, since the majority of the background pixel
gray-levels are narrowly distributed while the target pixel
gray-levels spread more uniformly up to a much higher gray-level. The
location of the mode of the histogram therefore provides an estimate
of the local background, and the left tail of the histogram provides
the spread (standard deviation) of the background intensity.
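The mode and left-tail estimate can be sketched as follows. This is a
minimal illustration rather than the ArraySuite implementation: it
assumes integer gray-levels passed in as a flat list, takes the raw
histogram mode without any smoothing, and estimates the spread by
mirroring the left tail about the mode. The function name
local_background is hypothetical.

```python
import math
from collections import Counter


def local_background(pixels):
    """Estimate (mode, spread) of the background from a box of gray-levels.

    pixels: flat list of integer gray-levels from a box around one target.
    """
    counts = Counter(pixels)
    # Histogram mode: the most frequent gray-level in the box.
    mode = max(counts, key=counts.get)
    # Pixels below the mode are assumed to be pure background. Mirroring
    # them about the mode gives a symmetric sample whose standard
    # deviation is taken as the background spread.
    below = [p for p in pixels if p < mode]
    n = 2 * len(below) + counts[mode]
    spread = math.sqrt(2 * sum((p - mode) ** 2 for p in below) / n)
    return mode, spread
```

On real scans the raw histogram is noisy, so some smoothing or binning
before taking the mode would probably be needed; that step is left out
here to keep the sketch short.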
3. Target Detection
One of the difficult image processing tasks is to identify the target
region within the bounding box. Each target is somewhat annular, a
result of both how the printer pen dispenses the cDNA onto the slide
and how the slide is pre-treated. We typically assume each pen
produces an array of cDNA targets with roughly the same morphology. A
mask that defines a possible target region for each pen can be
obtained by averaging some intensely expressed locations. This step
also eliminates some of the noise that may happen to be near the cDNA
targets. It is also important that the final signal intensity be
measured over the region corresponding to the probe-hybridized-to-target
area, i.e. the union of the detectable areas from the red and green
channels. The reason is simple: if we observe either one of the
fluorescent signals, the underlying region must belong to the region
where cDNA was deposited.
Conventionally, a fixed thresholding method is used in image analysis.
The threshold value T can be determined from the local background mean
intensity m and its standard deviation s (e.g. T = m + 3s). However,
the simple fixed thresholding method fails quite often because of
variability in the background and the signal, particularly when the
signal is weak (a frequent finding in cDNA array experiments). To
avoid these problems, a more sophisticated thresholding method may be
implemented. One method that we use is the Mann-Whitney method, which
takes sample pixels from the background and then performs a rank-sum
hypothesis test on the target pixels.
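Both ideas can be sketched in a few lines. The fixed threshold is the
T = m + 3s rule from the text. The rank-sum test below is a generic
one-sided Mann-Whitney test using the normal approximation with a tie
correction and no continuity correction; how the background and target
samples are chosen, and how the test is iterated over candidate pixels,
is not specified here, so this shows only the test itself. The function
names are hypothetical.

```python
import math


def fixed_threshold(m, s, k=3.0):
    """Fixed threshold T = m + k*s from background mean m and spread s."""
    return m + k * s


def mann_whitney_greater(target, background):
    """One-sided p-value that target pixels tend to exceed background pixels."""
    n1, n2 = len(target), len(background)
    pooled = sorted([(v, 0) for v in target] + [(v, 1) for v in background])
    rank_sum = 0.0   # sum of the ranks held by target pixels
    tie_term = 0.0   # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        group = j - i
        tie_term += group ** 3 - group
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n = n1 + n2
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
```

A small p-value suggests that the tested target pixels are brighter than
the background sample. With very small samples the normal approximation
is rough, and exact rank-sum tables would be more appropriate.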
4. Target Intensity Extraction
The sample intensity measurement is chosen to be the trimmed-average
gray-level within the target region. We discard the top 5% of pixels
to avoid saturation problems and the bottom 5% to reduce the effect of
possible target detection errors. Keeping in mind that the final
measurement is the ratio of two intensities (R/G), the averaging
provides, to some degree, a data smoothing effect. The local
background value is then subtracted from the reported sample
intensities of the red channel (R) and the green channel (G), and the
ratio (R/G) is calculated. The ratio measurement is thus the ratio of
two average intensity measurements. There are of course other choices
for the ratio measurement, including 1) the average of the ratios at
every pixel location, and 2) the slope of a linear regression of the R
versus G gray-values over every pixel location.
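A minimal sketch of this measurement, assuming the target pixels of
each channel are available as flat lists and the local background
values have already been estimated. The function names and the error
handling are illustrative choices, not taken from ArraySuite.

```python
def trimmed_mean(pixels, trim=0.05):
    """Average gray-level after dropping the top and bottom `trim` fraction."""
    ordered = sorted(pixels)
    k = int(len(ordered) * trim)  # number of pixels dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def expression_ratio(red_pixels, green_pixels, red_bg, green_bg):
    """Ratio R/G of background-subtracted trimmed-average intensities."""
    r = trimmed_mean(red_pixels) - red_bg
    g = trimmed_mean(green_pixels) - green_bg
    if g <= 0:
        raise ValueError("green signal does not exceed its background")
    return r / g
```

For example, expression_ratio(red, green, red_bg, green_bg) with the
pixel lists of one target returns a single R/G value for that target.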
5. Normalization and Ratio Analysis
We have used the expression ratio to determine whether the expression
of a gene differs significantly between the red and green channels.
Such an approach is intuitive because two similar samples lead to an
R/G ratio close to 1. Assuming that the signal intensities behind the
ratio (denoted t) extracted from the microarray image satisfy the
following conditions: 1) normality, 2) independence, 3) sufficient
positivity, and 4) a constant coefficient of variation, we can
approximate the ratio distribution by
$$ f(t) = \frac{1 + t}{c\,(1 + t^2)^{3/2}\sqrt{2\pi}} \exp\left[-\frac{(t - 1)^2}{2c^2(1 + t^2)}\right] $$
where t is the ratio for each gene and c is the coefficient of
variation (CV) of the signal intensity. The parameter c determines the
spread of the ratio distribution. The parameter of the distribution
can be estimated using a maximum-likelihood method. Before the ratio
distribution is used, a ratio calibration procedure (or intensity
normalization procedure) must be carried out in order to satisfy the
null hypothesis, under which the two signals are assumed to be
probabilistically the same. To satisfy this condition, a set of genes
has been chosen as internal control genes. We refer to this set as a
"house-keeping" gene set, indicating that the selection is based both
on biology and on experimental behavior (ratio close to 1.0). The
significance of basing our measurement on the analytical ratio
distribution is that we can associate a confidence interval with each
ratio measurement, so that a significant difference can be easily
detected. Equally important, this approach allows us to associate a
p-value with each ratio measurement.
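The quantities in this section can be sketched numerically. The code
assumes the ratio density f(t) = (1 + t) / (c (1 + t^2)^(3/2) sqrt(2 pi))
* exp(-(t - 1)^2 / (2 c^2 (1 + t^2))), which is the approximation that
follows from the four conditions above when both channels have the same
mean. The function names and the 99% default level are illustrative
assumptions, and the input ratios are assumed to be already calibrated.

```python
import math


def ratio_density(t, c):
    """Approximate null density of the ratio t for coefficient of variation c."""
    return ((1 + t) / (c * (1 + t * t) ** 1.5 * math.sqrt(2 * math.pi))
            * math.exp(-((t - 1) ** 2) / (2 * c * c * (1 + t * t))))


def estimate_cv(ratios):
    """Maximum-likelihood estimate of c from control-gene ratios.

    Setting the derivative of the log-likelihood to zero gives
    c^2 = mean((t - 1)^2 / (1 + t^2)).
    """
    return math.sqrt(sum((t - 1) ** 2 / (1 + t * t) for t in ratios)
                     / len(ratios))


def confidence_interval(c, z=2.576):
    """Ratio interval (lo, hi) for normal quantile z (2.576 is about 99%).

    With u = (t - 1) / sqrt(1 + t^2), the density above is a normal density
    in u with standard deviation c, so the limits are u = -z*c and u = +z*c
    mapped back to t. The two limits are reciprocals of each other.
    """
    u = z * c
    if u >= 1:
        raise ValueError("c is too large for this approximation")
    hi = (1 + u * math.sqrt(2 - u * u)) / (1 - u * u)
    return 1 / hi, hi


def p_value(t, c):
    """Two-sided p-value of ratio t under the null ratio distribution."""
    u = (t - 1) / math.sqrt(1 + t * t)
    return math.erfc(abs(u) / (c * math.sqrt(2)))
```

In use, estimate_cv would be applied to the ratios of the internal
control genes, and the resulting c passed to confidence_interval and
p_value for every other gene. A ratio outside the interval is then
flagged as significantly different at the chosen level.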
6. Measurement Quality Assessment
7. ArraySuite Software Package
A set of software tools is being developed at the Cancer Genetics
Branch, NHGRI, NIH. The tool suite is a collection of IPLab extensions
for Macintosh computers (IPLab is an image processing package by
Scanalytics). Some of the tools are: 1) LoadSKN, which loads images
from the NIH scanner or other scanning instruments; 2) AlignArray,
which aligns two images in case the images from the red and green
channels were scanned separately; 3) DeArray, the central processing
tool, which performs most of the image processing tasks, including
target segmentation, background intensity estimation, and probe
intensity extraction; and 4) TargetLocator, which reports target
information, refines statistics, and performs some image enhancement
tasks. The development of ArraySuite is a continuous process that
follows the refinement of microarray technology and the progress of
its various applications. To request further technical information,
go to Credits and Contact Information.