Automatic subgrid detection in microarray images

Automatic subarray detection in microarray images
A. Mastrogianni (1), G. Doukas (1), E. Dermatas (1) and A. Bezerianos (2)
(1) Dept. of Electrical Engineering and Computer Technology, University of Patras, HELLAS
(2) Dept. of Medical Physics, School of Medicine, University of Patras, HELLAS
Abstract. This work describes and evaluates a novel algorithm for automatic subarray
detection in microarray images, a step of great importance for accurate subsequent
microarray image analysis. Initially, in the detected microarray area, a novel
projection-profile-based method derives the subarray grid. During grid detection, spot
spacing is estimated and used for exact subarray location. The accuracy and efficiency
of the approach are validated on three different databases of real distorted
microarray images, giving 58.82%, 100% and 83.33% error-free detection of
subarray position.
Keywords: microarray image analysis, gridding, subarray detection.
1 Introduction
DNA microarray technology has empowered the scientific community to understand
the fundamental aspects underlying the growth and development of life, as well as to
explore the genetic causes of anomalies occurring in the functioning of the human
body, offering a powerful tool for measuring gene expression activity. In a typical
microarray setting, thousands of cDNA clones are robotically spotted onto coated
glass slides in a highly condensed array. The information extracted in a single
microarray experiment is derived almost exclusively from the spot intensities of a
digital image [1]. Due to the nature of the acquisition process, microarray images
contain noise, such as dust, fingerprints, small particles, distortions from the optical
components and electronic noise. Furthermore, rotations, misalignment and local
deformations of the ideally rectangular grid often occur and consequently affect the
accuracy of the subsequent data analysis.
The whole image is composed of a matrix of equally spaced blocks called
subarrays. Each subarray consists of a certain number of rows and columns, not
necessarily the same. As shown in Fig.1, a typical microarray image contains several
equal-size subarrays. The spots in a subarray are arranged with relatively uniform
spacing. They have a roughly circular shape, though some show
significant deviations from this shape due to the experimental variation of the spotting
procedure. In general, the shape and the size of the spots may fluctuate significantly
across the array. Typical values of the spot radius in a real microarray image are 2, 5,
9 and 12 pixels, while the spacing along the rows varies from 17 to 22 pixels and the
spacing along the columns takes values between 18 and 24 pixels [2].
The ideal microarray image (Fig.1) has the following properties [3]:
1. the size of the subarrays is identical,
2. the spacing between subarrays is regular,
3. the location of the spots is centered on the intersections of the lines of the
subarray,
4. the spots are circular and identical in size and shape,
5. the location of the grids is fixed in images for a given type of slides,
6. there is no dust or contamination on the slide and finally,
7. the background brightness is minimal and uniform across the image.
In typical microarray images, none of these properties is satisfied. The aim of the
image pre-processing methods is to restore the properties of the ideal microarray
image in distorted images.
Fig.1. Illustration of an ideal microarray image with constant shape. Labeled elements:
spot, horizontal spacing, vertical spacing, subarray spacing, subarray/subgrid.
Microarray image processing consists of five tasks that are carried out
sequentially: gridding or addressing, segmentation, quantification, normalization
and data mining [4-6]. The first critical stage in the image analysis process is
the identification of the spot centers in a microarray image, or more usually in a
subarray (subgrid) image, so as to facilitate the addressing procedure. In the last
decade, many approaches to the gridding problem have been proposed in the
field of bioinformatics, but the pre-processing step of subarray detection
is rarely taken into consideration [7-12].
In a considerable number of the proposed gridding methods,
it is arbitrarily assumed that the subarrays have already been identified, either
manually or automatically, and the whole approach is evaluated on a single subarray
image specified by the researcher. However, there has been a remarkable effort by
a few researchers [13-16] to overcome this assumption and to give a solution to the
problem of subarray detection. The disadvantage of those approaches lies in the
fact that the methods have not been tested on microarray images with a high level of
noise.
In the next section, a detailed outline of the subarray detection problem is
presented. The novel algorithm is applied to real microarray images, whether the level
of noise is high or low. In section 3, the experimental databases and a number of
representative examples are shown. Finally, a short discussion of the results is given.
2 Subarray detection
The proposed method is subdivided into five main steps, as shown in Fig.2, and requires
only four explicitly defined parameters, i.e. the numbers of microarray and subarray
grid rows and grid columns. Fig.3 lists the microarray image parameters and the symbols
used for them in the rest of this document. The subarray spot spacing,
although preferably known a priori, can be easily estimated and
evaluated during step (d), as shown in Fig.2. An early estimate of the spot
spacing can be obtained by dividing the width of the image in pixels by the expected
number of spot columns, i.e. the product of the number of grid columns and subarray
grid columns (Ncol * ncol). Another parameter that proves valuable in detecting the
microarray grid is the spacing of subarrays along both the x and y axes, as shown in Fig.1.
Generally, the space between subarrays is not constant, but its variance in a
typical microarray image is very small. Depending on the experimental image database, this
parameter may not be fixed, but there should be at least a bounded estimate, either
as an absolute pixel value or as a function of spot spacing. If this parameter is not known
a priori it can be estimated, when needed, as described in step (e).
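The early spot-spacing estimate and the derived subarray dimensions of Fig.3 can be illustrated with a short sketch. The function names and the numeric values in the usage note are hypothetical and are not taken from the tested databases.

```python
def initial_spot_spacing(image_width_px, n_grid_cols, n_sub_cols):
    """Early estimate of the spot spacing d (in pixels).

    Divides the image width by the expected number of spot columns,
    Ncol * ncol. Margins and gaps between subarrays are ignored, so the
    value is only a starting point to be refined in step (d).
    """
    expected_spot_columns = n_grid_cols * n_sub_cols  # Ncol * ncol
    return image_width_px / expected_spot_columns


def subarray_size(n_sub_cols, n_sub_rows, d):
    """Subarray width and height as defined in Fig.3: Wsa = ncol * d, Hsa = nrow * d."""
    return n_sub_cols * d, n_sub_rows * d
```

For a hypothetical image 1920 pixels wide with Ncol = 4 and ncol = 24, `initial_spot_spacing(1920, 4, 24)` gives 20.0 pixels, and `subarray_size(24, 24, 20.0)` gives a 480 x 480 pixel subarray.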
(a) Noise reduction
(b) Detection of microarray area
(c) Align the microarray image grid with the horizontal and vertical axes
(d) Locate spot centers and evaluate spot spacing
(e) Subarray and spot detection
Fig.2. Flowchart of the proposed subarray detection method.
In the first step the microarray image is converted into a gray-scale image and a
median filter is applied to reduce “salt&pepper”-like noise. Next, a binary version of
the image is constructed, utilizing a threshold level produced by Otsu's method [17].
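A minimal sketch of step (a) follows, assuming the image is already an 8-bit gray-scale list of rows. The 3x3 median window is an assumption, since the window size is not stated; the function names are illustrative, and a production implementation would normally rely on an image-processing library.

```python
from statistics import median


def median_filter3(img):
    """3x3 median filter (window size is an assumption); border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def otsu_threshold(img, levels=256):
    """Otsu's method: the gray level t that maximizes the between-class variance."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]           # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0         # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img, t):
    """Binary image: 1 (white) where the gray level exceeds t, else 0."""
    return [[1 if v > t else 0 for v in row] for row in img]
```

Calling `binarize(filtered, otsu_threshold(filtered))` on the output of `median_filter3` yields the binary image used by the following steps.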
Parameter                            Symbol
Number of microarray grid columns    Ncol
Number of microarray grid rows       Nrow
Number of subarray grid columns      ncol
Number of subarray grid rows         nrow
Spot spacing                         d
Subarray spacing in X axis           Dx
Subarray spacing in Y axis           Dy
Subarray width                       Wsa = ncol * d
Subarray height                      Hsa = nrow * d
Fig.3. Description of the set of image parameters used in the proposed method.
The second step determines the rectangular region that envelops the important
data of the microarray image. This is done by locating the first and the last row
and column in the binary image where the brightness sum exceeds a specified
threshold. The threshold may vary, depending on the geometry of the grid and the
noise of the image. In the microarray images of the three tested databases, a threshold
equal to the estimated spot spacing proved to be a sufficient choice. The actual
cropping is performed on a larger rectangular region than the one bounded by the located rows
and columns, to avoid the loss of important data. Expanding the original
rectangle in every direction by a value equal to the threshold derived previously
has proved to be sufficient.
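The detection of the microarray area can be sketched as follows, assuming a binary image stored as a list of rows of 0/1 values. The names `crop_bounds` and `crop` are illustrative; the threshold and margin are passed in by the caller.

```python
def crop_bounds(binary, threshold, margin):
    """Bounding rows/columns whose white-pixel sum exceeds `threshold`,
    expanded by `margin` pixels on every side and clipped to the image.

    Returns (top, bottom, left, right) as inclusive indices, or None if
    no row or column exceeds the threshold.
    """
    h, w = len(binary), len(binary[0])
    row_sums = [sum(row) for row in binary]
    col_sums = [sum(binary[y][x] for y in range(h)) for x in range(w)]
    rows = [i for i, s in enumerate(row_sums) if s > threshold]
    cols = [i for i, s in enumerate(col_sums) if s > threshold]
    if not rows or not cols:
        return None
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, h - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, w - 1)
    return top, bottom, left, right


def crop(binary, bounds):
    """Cut the rectangle described by inclusive `bounds` out of the image."""
    top, bottom, left, right = bounds
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

Isolated noise pixels outside the grid contribute row and column sums below the threshold, so they do not enlarge the detected rectangle.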
The next step determines the alignment angle of the grid with respect to the horizontal and
vertical axes. To find the required rotation, an alignment evaluation function is
computed as follows:
1. the sum of white pixels of each row r of the image: S(r),
2. the maximum sum M over all rows: M = max(S(r)),
3. the number of rows Chigh satisfying the condition S(r) > M - Th,
where Th is a threshold, estimated automatically as a function of spot
spacing,
4. the number of rows Clow satisfying the condition S(r) < Tl, where
the threshold Tl is also estimated automatically as a function of spot spacing,
5. the value of the alignment evaluation function is the sum of Chigh
and Clow.
An alternative implementation of the proposed evaluation function can be achieved by
deriving the corresponding parameters column-wise. In both implementations the
evaluation function counts the rows (columns) where very high or very low
accumulated brightness across rows (columns) is met.
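The row-wise evaluation function can be written directly from its five-part definition. This is a sketch that assumes a binary image stored as a list of rows of 0/1 values; the function name is illustrative.

```python
def alignment_score(binary, t_high, t_low):
    """Alignment evaluation function: Chigh + Clow.

    S(r)  : number of white pixels in row r
    M     : max over r of S(r)
    Chigh : number of rows with S(r) > M - t_high
    Clow  : number of rows with S(r) < t_low
    """
    row_sums = [sum(row) for row in binary]            # S(r)
    m = max(row_sums)                                  # M
    c_high = sum(1 for s in row_sums if s > m - t_high)
    c_low = sum(1 for s in row_sums if s < t_low)
    return c_high + c_low
```

The column-wise variant is obtained by transposing the image first, for example `alignment_score([list(c) for c in zip(*binary)], t_high, t_low)`. An image whose spot rows are either nearly full or nearly empty scores higher than one in which white pixels are spread evenly over the rows.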
The greater the value of the evaluation function, the better the grid alignment. The
evaluation function is applied to a set of images produced by rotating the cropped
binary image (created at step (b)) over a range of application-specific angles. The
rotated image with the greatest evaluation function value defines the desired alignment. In
practice, only relatively small angles (less than 5 degrees) need to be examined, and therefore
high accuracy and low computational complexity can be achieved in the third step.
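The angle search can be sketched as an exhaustive scan over candidate angles. The nearest-neighbour rotation, the candidate angle list and the tie-breaking rule (prefer the smallest absolute angle) are assumptions made for this illustration; the interpolation and angle step used in the experiments are not specified.

```python
import math


def alignment_score(binary, t_high, t_low):
    """Chigh + Clow computed on the row sums of a binary image."""
    row_sums = [sum(row) for row in binary]
    m = max(row_sums)
    return (sum(1 for s in row_sums if s > m - t_high)
            + sum(1 for s in row_sums if s < t_low))


def rotate_nn(binary, angle_deg):
    """Rotate about the image centre with nearest-neighbour sampling.

    The output has the same size as the input; pixels that map outside
    the source image are set to 0.
    """
    h, w = len(binary), len(binary[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: locate the source pixel for each output pixel.
            sx = int(round(cx + c * dx + s * dy))
            sy = int(round(cy - s * dx + c * dy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = binary[sy][sx]
    return out


def best_alignment_angle(binary, angles, t_high, t_low):
    """Candidate angle with the highest score; ties go to the smallest |angle|."""
    return max(
        angles,
        key=lambda a: (alignment_score(rotate_nn(binary, a), t_high, t_low), -abs(a)),
    )
```

A call such as `best_alignment_angle(cropped, [a * 0.5 for a in range(-10, 11)], 2 * d, 2 * d)` scans the range from -5 to 5 degrees in half-degree steps, where `cropped` and `d` stand for the cropped binary image and the spot spacing.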
During the fourth step, individual spots in the aligned binary image are identified
and labeled by detecting the isolated white areas [18]. The center of each spot is located
by computing the mean pixel coordinates of each isolated area. Finally, the spot spacing, if not known
a priori, can be estimated as the most frequent pixel distance between successive spot
centers.
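Step (d) can be sketched with a breadth-first labeling of 4-connected white regions. The connectivity, the row tolerance `row_tol` and the reading of "successive spot centers" as nearest right-hand neighbours on approximately the same row are assumptions made for this illustration.

```python
from collections import Counter, deque


def spot_centers(binary):
    """Label 4-connected white regions and return the mean (row, col) of each."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    centers = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                py, px = queue.popleft()
                pixels.append((py, px))
                for ny, nx in ((py - 1, px), (py + 1, px), (py, px - 1), (py, px + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            centers.append((sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n))
    return centers


def estimate_spacing(centers, row_tol=2.0):
    """Most frequent (rounded) distance from a centre to its nearest
    right-hand neighbour lying within `row_tol` pixels of the same row."""
    gaps = Counter()
    for y0, x0 in centers:
        right = [x1 - x0 for y1, x1 in centers if abs(y1 - y0) <= row_tol and x1 > x0]
        if right:
            gaps[round(min(right))] += 1
    return gaps.most_common(1)[0][0] if gaps else None
```

Taking the mode rather than the mean keeps the estimate stable when some spots are missing, because a missing spot only adds a few gaps at a multiple of the true spacing.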
In the last step, grid detection takes place using an artificial binary image
generated by drawing filled circles centered on the previously detected spot
centers, with radius equal to 70% of the spot spacing (derived or known a priori).
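A minimal sketch of the artificial image generation is given below, assuming spot centers are given as (row, column) pairs. The function name and argument order are illustrative.

```python
import math


def artificial_image(height, width, centers, spacing, radius_factor=0.7):
    """Binary image with a filled circle of radius radius_factor * spacing
    drawn around every detected spot centre (row, col)."""
    r = radius_factor * spacing
    img = [[0] * width for _ in range(height)]
    for cy, cx in centers:
        # Only visit the bounding box of the circle, clipped to the image.
        y_lo, y_hi = max(0, math.floor(cy - r)), min(height - 1, math.ceil(cy + r))
        x_lo, x_hi = max(0, math.floor(cx - r)), min(width - 1, math.ceil(cx + r))
        for y in range(y_lo, y_hi + 1):
            for x in range(x_lo, x_hi + 1):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                    img[y][x] = 1
    return img
```

Because the radius exceeds half the spot spacing, circles of neighbouring spots overlap, so row and column sums stay high inside a subarray and empty stretches remain only where spots are absent.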
In the artificial image, the subarray detection algorithm locates “empty” regions along
the x and y axes of the image, where the sum of columns or rows respectively is less
than a specific threshold, indicating the possible existence of a grid line. This
procedure is executed repeatedly, starting from a relatively small threshold and
increasing it until specific criteria are met for each axis separately. These criteria are:
1. The distance between successive grid lines must lie within specific limits.
2. The number of detected grid lines must be equal to the
expected number of grid columns or rows plus one.
The limits of the grid line distance of the first criterion depend on the geometrical
features of the input microarray image. Actually, the detected grid line distance may
vary from at least Wsa to a maximum of Wsa + Dx horizontally or from Hsa to Hsa + Dy
vertically. If the parameters Dx and Dy are not known a priori, they can be estimated
from the width of detected “empty” regions.
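The iterative search along one axis can be sketched on a 1-D profile of column (or row) sums. Placing each grid line at the centre of its empty region, the starting threshold of 1 and the parameter names are assumptions made for this illustration.

```python
def empty_regions(profile, threshold):
    """Maximal runs (start, end), inclusive, where profile < threshold."""
    regions, start = [], None
    for i, v in enumerate(profile):
        if v < threshold and start is None:
            start = i
        elif v >= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(profile) - 1))
    return regions


def find_grid_lines(profile, n_blocks, min_dist, max_dist, max_threshold, step=1):
    """Raise the threshold until both criteria hold:
    1. successive grid lines are min_dist..max_dist apart,
    2. exactly n_blocks + 1 grid lines are found.

    Returns (lines, threshold), or None if no threshold up to
    max_threshold satisfies the criteria.
    """
    threshold = 1
    while threshold <= max_threshold:
        # Assumption: one grid line at the centre of each empty region.
        lines = [(a + b) / 2.0 for a, b in empty_regions(profile, threshold)]
        gaps = [q - p for p, q in zip(lines, lines[1:])]
        if len(lines) == n_blocks + 1 and all(min_dist <= g <= max_dist for g in gaps):
            return lines, threshold
        threshold += step
    return None
```

For the horizontal axis, `profile` would hold the column sums of the artificial image, `n_blocks` would be Ncol, and the limits would be Wsa and Wsa + Dx. A stray noisy column inside a gap splits the empty region at a low threshold and is absorbed once the threshold rises above its sum.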
3 Experimental results
The proposed method was evaluated on three databases of GIF-formatted
image files. The “Human Sarcomas” database contains 34 microarray images of 32
subarrays, each one consisting of 6 x 8 spots [19]. The “Young_vs_Old_Transgenic”
database consists of 14 microarray images containing 48 subarrays of 29 x 30 spots
[20]. A total of 36 cDNA microarray images from the “Lymphoma/Leukemia
Molecular Profiling Project” (LLMPP) constitute the third database, in which each
image contains 16 subarrays of 24 x 24 spots [21].
The automatically defined thresholds are the same
for all databases: Th = 2 * d, Tl = 2 * d. A good choice for the crop threshold is twice
the estimated spot spacing. Only in the LLMPP database does the microarray alignment
process derive significant projection angles. Therefore, the search range of the
microarray alignment is extended to [-3, 3] degrees.
For the “Human Sarcomas” database, the proposed method detected
all subarrays in 58.82% of the available images, as shown in the example of Fig.4.
The algorithm proved ineffective for a number of images that were strongly distorted
during the microarray experiment. Fig.5 (a) depicts a portion of a microarray image (76168.gif)
in which a number of low-brightness spots can be observed, while strong “salt&pepper”
noise is also present. This results in the loss of valuable information
during binary image generation and consequently in the misestimation of spot
spacing and the misalignment of the detected grid. Fig.5 (b) illustrates another example of an
extremely distorted image (71824.gif), where, although the spot intensity
is adequate, the noise level is too high. Note that both images in Fig.5
were inverted and contrast-adjusted so that the reader can perceive the
severity of the problem. It should also be mentioned that in the “Human Sarcomas”
database the total number of spots per subarray is considerably small and the spacing
between adjacent spots is greater than the typical distance met in other databases. These
effects cause the proposed method, during step (d), to wrongly identify “salt&pepper”-like noise as spot centers.
Fig.4. Original microarray image from the “Human Sarcomas” database, the corresponding
artificially generated image and the detected subarrays.
Fig.5. (a) Portion of a microarray image from the “Human Sarcomas” database with
low brightness spots, (b) example of a heavily noisy image in the “Human Sarcomas” database.
For the “Young_vs_Old_Transgenic” database, 100% of the
subarrays were detected: the method works perfectly on all microarray images (Fig.6), in spite of the presence
of noise.
For the LLMPP database, the experimental results reached 83.33%
successful detection of subarrays (Fig.7). Although the microarray grid of the images
was tilted in most cases, the method managed to correct the alignment and detect the
subarrays properly. In contrast with the “Human Sarcomas” database, these images
have sufficient spot density to allow the algorithm to produce a more accurate
estimate of spot spacing. Grid detection failed only in very heavily distorted and
noisy images.
4 Conclusions
Image analysis is an essential aspect of microarray experiments. Until recently, the
pre-processing step of subarray detection required human intervention, with
consequences for the whole procedure, since manual detection is time-consuming
if good results are to be achieved. The proposed method detects, in
a precise manner, a respectable number of subarrays in three different databases of
scanned microarray images, which differ in the abundance of noise, the number of
subarrays and the number of rows and columns of each subarray.
Among the most important advantages of the proposed method is
the accurate detection of subarrays even in relatively noisy and misaligned images.
Fig.6. Original microarray image drawn from the “Young_vs_Old_Transgenic” database
and its corresponding artificially generated image.
Fig.7. (a) Original image, (b) gray-scale rotated version, (c) artificially generated image with
the detected subarrays, (d) detection of the subarrays in the original image.
Acknowledgments. This paper is part of the 03ED013 research project, implemented
within the framework of the “Reinforcement Programme of Human Research
Manpower” (PENED) and co-financed by National and Community Funds (20% from
the Greek Ministry of Development-General Secretariat of Research and Technology
and 80% from E.U.-European Social Fund).
References
1. Stefano Lonardi, Yu Luo, ‘Gridding and Compression of Microarray Images’,
Proc. of Computational Systems Bioinformatics, CSB- 2004.
2. Peter Bajcsy, http://algdocs.ncsa.uiuc.edu/PR-20050204-1.pdf
3. A. Kuklin. ‘Laboratory automation in microarray image processing’, American
Laboratory, pp. 64–67, May 2000.
4. Gerda Kamberova, Shishir Shah, ‘DNA Array, Image Analysis, Nuts & Bolts’,
DNA Press LLC, 2002.
5. S. Draghici, Data Analysis Tools for DNA Microarrays, CRC Mathematical
Biology and Medicine Series, Chapman & Hall, London, UK, 2003.
6. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan
Kaufmann, San Francisco, Calif, USA, 2001.
7. Yuan-Kai Wang, Cheng-Wei Huang, ‘DNA Microarray Image Analysis Using
Active Contour Model’, Proc. Computational Systems Bioinformatics, CSB-2004.
8. Peter Bajcsy, ‘Gridline: Automatic Grid Alignment in DNA Microarray Scans’,
IEEE Trans. on Image Processing, vol. 13, no. 1, Jan. 2004.
9. Christian Uehara, Ioannis Kakadiaris, ‘Towards Automatic Analysis of DNA
Microarrays’, sixth Workshop on Applications of Computer Vision, WACV-2002.
10. Jinn Ho, Wen-Liang Hwang, Henry Horn-Shing Lu, and D. T. Lee, ‘Gridding Spot
Centers of Smoothly Distorted Microarray Images’, IEEE Trans. on Image
Processing, vol. 15, no. 2, Feb. 2006.
11. A. Mastrogianni, E. Dermatas, A. Bezerianos, ‘Robust pre-processing and noise
reduction in microarray images’, Proc. of the 5th IASTED International
Conference Biomedical Engineering, 2007.
12. Daniel Morris, ‘Blind Microarray Gridding: A New Framework’, IEEE Trans. on
Systems, Man and Cybernetics- Part C: Applications and Reviews, vol. 38, no. 1,
Jan.2008.
13. Y. Wang, M. Ma, K. Zhang, and F. Shih, ‘A Hierarchical Refinement Algorithm
for Fully Automatic Gridding in Spotted DNA Microarray Image Processing’,
Information Sciences, 177(4):1123–1135, 2007.
14. Y. Wang, F. Shih, and M. Ma, ‘Precise Gridding of Microarray Images by
Detecting and Correcting Rotations in Subarrays’, In Proceedings of the 8th Joint
Conference on Information Sciences, pages 1195–1198, Salt Lake City, USA,
2005.
15. R. Fabbri, L. da F. Costa, J. Barrera, ‘Towards Non-Parametric Gridding of
Microarray Images’, 14th International Conference on Digital Signal Processing,
vol.2, 1-3 July 2002, pp.623-626.
16. Jinn Ho, Wen-Liang Hwang, Henry Horn-Shing Lu, D.T. Lee, ‘Gridding spot
centers of smoothly distorted microarray images’, IEEE Trans. on Image
Processing, vol. 15, no. 2, Feb. 2006.
17. N. Otsu, ‘A threshold selection method from gray-level histograms’, IEEE Trans.
Sys. Man., Cyber, vol.9, pp. 62-66, 1979.
18. Haralick, Robert M., and Linda G. Shapiro, 'Computer and Robot Vision', V. I,
Addison-Wesley, pp. 28-48., 1992.
19. Human Sarcomas database. Available at http://smd.stanford.edu/.
20. Young_vs_Old_Transgenic database. Available at http://smd.stanford.edu/.
21. The Lymphoma/Leukemia Molecular Profiling Project database. Available at
http://llmpp.nih.gov/.