Spot detection for a 2-DE gel image using a slice tree with confidence evaluation


Contents lists available at ScienceDirect

Mathematical and Computer Modelling

journal homepage: www.elsevier.com/locate/mcm


Yi-Sheng Liu (a), Shu-Yuan Chen (a,∗), Ru-Sheng Liu (a), Der-Jyh Duh (b), Ya-Ting Chao (a), Yuan-Ching Tsai (c), Jaw-Shu Hsieh (c)

(a) Department of Computer Science and Engineering, Yuan Ze University, Chung Li, Taiwan

(b) Department of Computer Science and Information Engineering, Ching Yun University, Chung Li, Taiwan

(c) Department of Agronomy, National Taiwan University, Taipei, Taiwan

Article info

Article history:
Received 3 January 2007
Received in revised form 5 November 2008
Accepted 12 November 2008

Keywords: 2-DE gel; Protein; Spot detection; Slice tree

Abstract

Spot detection is an essential step in 2-DE gel image analysis. The results of protein spot detection may substantially influence subsequent stages of analysis. This study presents a novel method for spot detection with the addition of confidence evaluation for each detected spot. The confidence of a spot provides useful hints for subsequent processing, such as landmark selection, spot quantification and gel image registration. The proposed method takes slices of a gel image in the gray level direction, and builds them into a slice tree, which in turn is adopted to perform spot detection and confidence evaluation. The spot detection software is implemented on Windows using the proposed slice tree. Building a slice tree for a gel image of resolution 1262 × 720 takes about 1.5 s on an Intel Pentium III 1.2 GHz machine with 512 MB of RAM. Spot detection takes about 43 ms after building the slice tree. The detected spots are shown in different colors based on their respective confidence values. Moreover, pointing a mouse over a detected spot shows detailed information about the spot, including the confidence value. Experimental results indicate that confidence values are close to a subjective judgment.

1. Introduction

Proteomics is the study of the proteome, especially how proteins function in and around cells. Since proteins are directly involved in the biochemical processes of cells, and are expressed differently in control and experimental cells, the understanding of diseases or biological properties can be improved by identifying the differential expression of the proteome between control and experimental samples.

Protein separation is one of the most important stages in a proteomic study. Among all separation techniques, two-dimensional electrophoresis (2-DE) [1–5] is the best method for separating complex protein mixtures based on their charge and size. Spots in the gel are proteins that have migrated to specific locations. The spots in the gel may disappear, appear or change in size and intensity according to the differential expression of protein mixtures from the control and experimental samples. Differential protein expressions between various samples are obtained by analyzing the spot appearance in a gel. Although a 2-DE gel is a powerful technique that can separate hundreds of proteins simultaneously, there are still challenges in the usage of 2-DE gel. Complexity in the sample preparation and running procedure causes different geometric distortions between samples. Spots may be overlapping; streaks may occur with certain proteins, and staining variation may cause inhomogeneous backgrounds, nonlinear intensity, saturated spots and faint spots, as indicated in Fig. 1.

This work was partially supported by the National Science Council of Taiwan, ROC, under Grants NSC-94-2745-E-155-008-URD.

∗ Corresponding author.

E-mail addresses: [email protected] (Y.-S. Liu), [email protected] (S.-Y. Chen), [email protected] (R.-S. Liu),

Fig. 2. Slices of a spot. (a) A synthetic spot. (b) 3-D view of the spot in (a). (c) Slices of a spot and corresponding central points.

Given the volume of data and the technical noise originating from the image acquisition process, manual analysis of a gel image is hard without the help of computer software. Analysis of a gel image by image processing software requires an image pipeline that may contain image correction, spot detection, spot quantification, spot registration, data presentation and interpretation. These computation techniques have been comprehensively reviewed [1,4].

The quality of spot detection is an important factor that affects the performance of the image pipeline. To enhance the performance of spot detection, input images are pre-processed in order to correct the spatial and intensity variation of the gel image originating from the image acquisition process. Gustafsson [6] presented a current leakage model to minimize the effect of geometric distortions of the protein pattern due to current leakage. Melanie [7] adopts a local kernel to reduce the high-frequency noise inherent in the acquired images, followed by histogram equalization and contrast enhancement to improve the difference between spots and background. Background subtraction is then adopted to eliminate meaningless changes in the intensity level of the gel background. Horizontal or vertical streaks can be removed by a morphological opening with a horizontal or vertical cylindrical structuring element.

Given the extreme variety of spots appearing in gel images, many spot detection methods have been introduced. The most common steps in spot detection are segmentation and spot modeling. Segmentation is the process of dividing the gel into regions, each of which may contain one spot. Watershed [8] is the most popular spot segmentation technique, due to its robustness to noise. However, over-segmentation is a well-known problem in Watershed. Spot modeling is often performed after Watershed to identify spots in the segmented image. Spot modeling can be adopted to filter out non-spot regions, combine fragmented regions or handle overlapping spots by statistical analysis. Srinark [9] adopted a Watershed algorithm to segment the input image into regions. The k-means clustering method is adopted to classify the pixels in each region into foreground (spot) and background (non-spot) pixels. After adopting a morphological closing to filter out noise pixels in the background area, a distance matrix is then adopted to estimate spot centroids. Region analysis is finally adopted for spot splitting and merging to solve the problem of over-segmentation inherent in Watershed. Boetticher [10] divided the input image into rectangular sub-images based on the local maxima of the image. A Support Vector Machine is then adopted to classify the sub-images into spot and non-spot pixels. A comprehensive review of published spot detection methods and commercial software for 2-D gel image analysis can be found in [1,4].

This study presents a novel method for spot detection that calculates the confidence of each detected spot. The confidence of a spot gives useful hints for subsequent processing, including landmark selection, spot matching and gel image registration. The proposed method takes slices of a gel image in the gray-level direction, and builds them into a slice tree, which in turn is adopted to perform spot detection and confidence calculation.

The rest of this paper is organized as follows. Section 2 presents the idea of detecting spots in gel images by a slice tree. Section 3 then describes the proposed algorithm in detail. Section 4 summarizes the experimental results. Conclusions are finally drawn in Section 5, along with recommendations for future research.

2. Approach

This section introduces the key concept of the proposed approach. Unlike other spot detection methods, our method slices the gel image, builds the slices into a slice tree, and then detects spots based on the slice tree.

The intensity is represented by the third dimension (Z axis). The intensity of a silver-stained spot is approximately Gaussian distributed with the lowest intensity at the center, as shown in Fig. 2(b). A series of slices of the spot can then be obtained in the intensity direction, as shown in Fig. 2(c). Each slice has its own features, such as size, shape, central point and boundary smoothness. If the central points of the slices are projected onto the X–Y plane, then the projected points belonging to the same spot will fall within a neighborhood. The distribution and number of projected points depend on the shape and appearance of the spots in the gel image, which can be used for spot detection.

Fig. 3. Binary images of a 2-DE gel image. The binary images shown in red color are superimposed on the original gel image. (a) Sample gel image including three spots with gray levels from 90 to 218. (b) $B_{218}$: binary image according to the maximum gray level. (c) $B_{187}$: three spots still overlapping. (d) $B_{162}$: one spot is split. (e) $B_{108}$: lower spots are split. (f) $B_{96}$: fewer binary pixels for lower intensity. (g) $B_{90}$: binary image according to the minimum gray level; only pixels with the lowest intensity are still present. (h) Detected spots are superimposed on the original gel image. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Border tracing to detect regions in binary images. (a)–(d) are the respective results of $B_{187}$, $B_{162}$, $B_{108}$ and $B_{96}$ in Fig. 3. A border is a contour labeled in red, and all the green pixels inside the border compose the corresponding region.

Spots may be distorted [2,11], overlapping [12] and suffer from noise. These factors can make spot detection difficult and unreliable. The relationship between the slices of the spots can be included in the slice tree to resolve these problems and then obtain a robust spot detector.

3. Methods

3.1. Gel image slicing

For a 2-DE gel image $I$, the binarized image $B_g$ related to gray level $g$ is defined by

$$B_g(x, y) = \begin{cases} 1 & \text{if } I(x, y) \le g, \\ 0 & \text{otherwise}, \end{cases} \tag{1}$$

where $I(x, y)$ denotes the intensity of the pixel at coordinates $(x, y)$, and $g$ denotes one of the gray levels between the maximum and minimum gray levels of $I$, denoted by $g_{\max}$ and $g_{\min}$, respectively. Fig. 3 shows a sample gel image, and some of its binary images related to specific gray levels.
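The slicing in (1) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the image is assumed to be a list of rows of integer gray levels, and the function names `binarize` and `slices` as well as the toy image are hypothetical.

```python
def binarize(image, g):
    """Return B_g: 1 where I(x, y) <= g, 0 otherwise (Eq. (1))."""
    return [[1 if pixel <= g else 0 for pixel in row] for row in image]


def slices(image):
    """Yield (g, B_g) for g = g_max, g_max - 1, ..., g_min."""
    g_max = max(max(row) for row in image)
    g_min = min(min(row) for row in image)
    for g in range(g_max, g_min - 1, -1):
        yield g, binarize(image, g)


# Hypothetical 3x3 toy image: one dark pixel (a "spot") on a light background.
toy = [
    [9, 9, 9],
    [9, 7, 9],
    [9, 9, 9],
]
stack = list(slices(toy))
```

Because dark pixels have low gray levels, the binary image for the highest gray level covers the whole image, and the set of 1-pixels can only shrink as $g$ decreases.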

Definition (Regions). Let r denote a subset of pixels in a binary image. If r is a connected set, then it is a region.

Regions can be identified as follows. Region borders in a binary image are first detected by border tracing. The set of pixels enclosed by a border is then denoted as the corresponding region. Fig. 4 shows some results of border tracing and the detected regions for Fig. 3. Fig. 4 indicates that a candidate spot with a minimum gray level $g_{s_{\min}}$ appears as a sequence of regions in binary images $B_{g_s}$ for $g_{\max} \ge g_s \ge g_{s_{\min}}$. Intuitively, the sequence of binary images from $B_{g_{\max}}$ to $B_{g_{\min}}$ can be considered as computerized tomography (CT) images of all the spots in the gray level direction, i.e. the Z axis.

Definition (Region Set). All regions in a binary image are called a region set of the binary image.
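As an illustration of how a region set could be extracted, the sketch below labels connected sets of 1-pixels with a breadth-first flood fill. Note that this swaps in connected-component labelling for the border tracing described above, and that the 8-connectivity default is an assumption, since the text does not state which connectivity is used. The name `find_regions` and the example image are hypothetical.

```python
from collections import deque


def find_regions(binary, connectivity=8):
    """Return the region set of a binary image as a list of frozensets of (x, y)."""
    height, width = len(binary), len(binary[0])
    if connectivity == 8:
        steps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if (dx, dy) != (0, 0)]
    else:
        steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    seen = set()
    regions = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or (x, y) in seen:
                continue
            # Breadth-first flood fill collects one connected set of 1-pixels.
            seen.add((x, y))
            queue = deque([(x, y)])
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dx, dy in steps:
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] == 1 and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            regions.append(frozenset(pixels))
    return regions


# Hypothetical binary image containing two separate regions.
example = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
region_set = find_regions(example)
```

Each region is stored as the set of its pixel coordinates, which makes the subset and overlap tests used later in this section direct set operations.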

The gel image $I$ has $N_b = g_{\max} - g_{\min} + 1$ binary images. Binary images $B_g$ are sorted in descending order of $g$, and $R_1, R_2, \ldots, R_{N_b}$ denote the region sets related to the binary images $B_{g_{\max}}, B_{g_{\max}-1}, \ldots, B_{g_{\min}}$, respectively:

$$R_s = \{\, r_{s,i} \mid i = 1, 2, \ldots, n_s \,\}, \quad s = 1, 2, \ldots, N_b, \tag{2}$$

where $r_{s,i}$ denote the regions in the binary image $B_{g_{\max}+1-s}$, and $n_s$ denotes the number of regions in the binary image $B_{g_{\max}+1-s}$. Notably, $s$ can be considered as the layer index of the region sets. Fig. 5 shows a synthetic image to illustrate the relationship among binary images, region sets and the slice tree.


Fig. 5. A synthetic image to show the relationship among binary images, region sets and slice tree. (a) Synthetic image with four spots. (b) Slice tree viewed in three dimensions; layers of binary images and region sets of the synthetic image are shown in a 3D view. The root of the slice tree is shown by a red circular point, other central points of regions are shown by black circular points, and parent and child regions are connected by green links. Projections of region centers are also shown in $B_{g_{\min}}$. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.2. Properties of regions

This section describes some properties of regions.

Definition (Binary Image Projection). For a binary image $B_g$, the projection of $B_g$, $\Psi(B_g)$, is defined as the set of coordinates whose corresponding pixel values are 1:

$$\Psi(B_g) = \{\, (x, y) \mid B_g(x, y) = 1 \,\}. \tag{3}$$

Since a region is a subset of a binary image, the operation $\Psi$ can also be applied to a region:

$$\Psi(r_{s,i}) = \{\, (x, y) \mid r_{s,i}(x, y) = 1 \,\}. \tag{4}$$

Definition (Ancestor Region and Descendant Region). For two regions $r_{s_1,i}$ and $r_{s_2,j}$, with $s_1 < s_2$, if $\Psi(r_{s_1,i}) \supseteq \Psi(r_{s_2,j})$, then $r_{s_1,i}$ is an ancestor region of $r_{s_2,j}$, and $r_{s_2,j}$ is a descendant region of $r_{s_1,i}$, denoted by

$$r_{s_1,i} \Supset r_{s_2,j}. \tag{5}$$

Definition (Child Region and Parent Region). For two regions $r_{s_1,i}$ and $r_{s_2,j}$, if $s_1 = s_2 - 1$ and $r_{s_1,i} \Supset r_{s_2,j}$, then $r_{s_2,j}$ is a child region of $r_{s_1,i}$, and $r_{s_1,i}$ is the parent region of $r_{s_2,j}$.
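The two definitions translate directly into set comparisons when a projection is stored as a set of coordinates. The sketch below is illustrative only; the function names and the three nested example regions are hypothetical.

```python
def is_ancestor(s1, proj1, s2, proj2):
    """True if the region on layer s1 is an ancestor of the region on layer s2."""
    return s1 < s2 and proj1 >= proj2      # `>=` on sets tests "is a superset of"


def is_parent(s1, proj1, s2, proj2):
    """True if the ancestor relation holds between two successive layers."""
    return s1 == s2 - 1 and is_ancestor(s1, proj1, s2, proj2)


# Hypothetical projections on layers 1, 2 and 3 (nested, shrinking regions).
outer = frozenset({(0, 0), (1, 0), (2, 0), (1, 1)})
middle = frozenset({(1, 0), (2, 0)})
inner = frozenset({(1, 0)})

grandparent_ok = is_ancestor(1, outer, 3, inner)   # ancestor, but not parent
parent_ok = is_parent(1, outer, 2, middle)
not_parent = is_parent(1, outer, 3, inner)         # layers are not successive
```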

Property 3.1. All regions in a binary image are mutually exclusive, i.e.

$$\Psi(r_{s,i}) \cap \Psi(r_{s,j}) = \emptyset \quad \text{if } i \neq j. \tag{6}$$

Proof. For two distinct points $(x, y)$ and $(x', y')$, where $(x, y) \in \Psi(r_{s,i})$ and $(x', y') \in \Psi(r_{s,j})$, if $\Psi(r_{s,i}) \cap \Psi(r_{s,j}) \neq \emptyset$, then $(x, y)$ and $(x', y')$ belong to the same connected set, that is, $i = j$.

Property 3.2. Every region in $R_2, R_3, \ldots, R_{N_b}$ has exactly one parent region, i.e., for $s = 2, 3, \ldots, N_b$,

$$\forall\, r_{s,i} \in R_s,\ \exists!\, r_{s-1,k} \in R_{s-1},\ \text{s.t. } \Psi(r_{s,i}) \subseteq \Psi(r_{s-1,k}). \tag{7}$$

Proof. If $(x, y) \in \Psi(r_{s,i})$, then $I(x, y) \le g_{\max} + 1 - s$, and hence $I(x, y) \le g_{\max} + 2 - s$, thus $(x, y) \in \Psi(B_{g_{\max}+2-s})$. Restated, we get $\Psi(r_{s,i}) \subseteq \Psi(B_{g_{\max}+2-s})$. Since $r_{s,i}$ is a connected set, it lies entirely within one region of $B_{g_{\max}+2-s}$. Thus, $\exists\, r_{s-1,k} \in R_{s-1}$ such that $\Psi(r_{s,i}) \subseteq \Psi(r_{s-1,k})$. Together with Property 3.1, we can conclude that $r_{s-1,k}$ is the only region that satisfies the criterion.

Property 3.3. For two regions $r_{s,i}$ and $r_{s-1,k}$, $r_{s,i}$ is a child region of $r_{s-1,k}$ if and only if their projections are overlapping, i.e.

$$\Psi(r_{s,i}) \subseteq \Psi(r_{s-1,k}) \iff \Psi(r_{s,i}) \cap \Psi(r_{s-1,k}) \neq \emptyset. \tag{8}$$

Proof. Only the $\Leftarrow$ part of (8) needs proof.

From Property 3.2, a parent region $r_{s-1,k'}$ of $r_{s,i}$ exists such that $\Psi(r_{s-1,k'}) \cap \Psi(r_{s,i}) = \Psi(r_{s,i})$. Assuming that $k' \neq k$, then from Property 3.1, $\Psi(r_{s-1,k'}) \cap \Psi(r_{s-1,k}) = \emptyset$. From (8), since $\Psi(r_{s,i}) \cap \Psi(r_{s-1,k}) \neq \emptyset$, there exists $(x, y)$ such that $(x, y) \in \Psi(r_{s,i}) \cap \Psi(r_{s-1,k})$. Because $\Psi(r_{s,i}) \subseteq \Psi(r_{s-1,k'})$, this point also belongs to $\Psi(r_{s-1,k'})$. This leads to $\Psi(r_{s-1,k'}) \cap \Psi(r_{s-1,k}) \neq \emptyset$, which contradicts our initial assumption. Therefore, $k' = k$.

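Property 3.3 simplifies the procedure of finding child regions. For a region $r_{s,i} \in R_s$, the child regions of $r_{s,i}$ can be found in the area $\Psi(r_{s,i})$ of binary image $B_{g_{\max}-s}$. Let the set of all child regions of $r_{s,i}$ be denoted by $R_{s,i}$, then obviously $R_{s+1} = \bigcup_{i=1}^{n_s} R_{s,i}$. By Property 3.3, $R_{s,i}$ can be built as

$$R_{s,i} = \{\, r_{s+1,j} \in R_{s+1} \mid \Psi(r_{s+1,j}) \cap \Psi(r_{s,i}) \neq \emptyset \,\}. \tag{9}$$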
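The overlap test of Property 3.3 gives a simple way to collect child regions. The following sketch assumes regions are stored as sets of pixel coordinates; the function name `child_regions` and the two example layers are hypothetical and serve only to illustrate the idea.

```python
def child_regions(parent, next_layer):
    """Regions of the next layer whose projections overlap the parent's projection."""
    return [r for r in next_layer if not parent.isdisjoint(r)]


# Hypothetical layer s with two regions and layer s + 1 with three regions.
layer_s = [frozenset({(0, 0), (1, 0), (2, 0)}), frozenset({(5, 5), (6, 5)})]
layer_s1 = [frozenset({(0, 0)}), frozenset({(2, 0)}), frozenset({(5, 5)})]

children = [child_regions(p, layer_s1) for p in layer_s]
# The child sets of all parents together make up the whole next layer.
covered = sorted(sorted(r) for group in children for r in group)
```

A single shared pixel is enough to establish the parent-child relation, so a full subset comparison is never needed.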


3.3. Slice tree

Regions in binary images are the basic units for spot detection and confidence calculation in our method. To increase the robustness of spot detection, the relationship between regions in successive binary images related to the same spot is organized in a slice tree.

Definition (Slice Tree). A slice tree for gel image $I$ is defined as $T = (V, E)$, where $V$ denotes a set of nodes, and $E$ denotes a set of links between the nodes.

Each node in the slice tree corresponds to a region. Hence, the node related to the region $r_{s,i}$ is denoted as $V(r_{s,i})$. According to the layer structure of region sets in (2), $V$ can be further divided into $N_b$ exclusive subsets, that is

$$V = \bigcup_{s=1}^{N_b} V_s. \tag{10}$$

Nodes in $V_s$ correspond to regions in $R_s$, i.e.,

$$V_s = \{\, V(r_{s,i}) \mid i = 1, 2, \ldots, n_s \,\}. \tag{11}$$

Notably, nodes in $V_s$ have depth $s - 1$ in the slice tree.

Each link in $E$ is an ordered pair of nodes $(V(r_{s-1,k}), V(r_{s,i}))$, where $r_{s-1,k}$ denotes a parent region, and $r_{s,i}$ denotes a child region, i.e.,

$$E = \{\, (V(r_{s-1,k}), V(r_{s,i})) \mid r_{s-1,k} \Supset r_{s,i},\ s = 2, \ldots, N_b \,\}. \tag{12}$$

The slice tree contains much information about the gel image. Nodes in the slice tree have their own features about the corresponding regions. The links between nodes imply further information about the relations between nodes, and can be adopted to facilitate a variety of processing, including spot detection, spot quantification and gel image registration. Fig. 5(b) shows an example of a slice tree for the synthetic image in Fig. 5(a).

3.4. Slice tree construction

A slice tree for gel image $I$ is built in the sequence $B_{g_{\max}}, \ldots, B_{g_{\min}}$, accompanied by the establishment of relations between every pair of successive region sets $R_s$ and $R_{s+1}$, for $s = 1, 2, \ldots, N_b - 1$. More specifically, the slice tree is built by recursively performing the procedure ProcessChildSlice($r_{s,i}$) with the region $r_{s,i}$ in $R_s$ as the parameter. The pseudocode of the procedure is outlined as follows.

Procedure ProcessChildSlice($r_{s,i}$)
1. Get child region set $R_{s,i}$ of $r_{s,i}$ using (9).
2. If $R_{s,i} = \emptyset$ then return,
   else for all child regions $r_{s+1,j} \in R_{s,i}$, do
   2.1 Create a tree node $V(r_{s+1,j})$.
   2.2 Set parent-child link between $V(r_{s,i})$ and $V(r_{s+1,j})$.
   2.3 If $s < N_b - 1$ then call ProcessChildSlice($r_{s+1,j}$).

Building the slice tree involves first creating a root node $V(r_{1,1})$, then calling ProcessChildSlice($r_{1,1}$). From (1), $r_{1,1} = B_{g_{\max}}$ covers the whole gel image, and is the only region in $R_1$. Fig. 6(b) shows the slice tree of the gel image in Fig. 3(a).
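The construction procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: regions are found by 8-connected flood fill instead of border tracing, every node scans the whole next layer for overlapping regions (simple but not fast), and the names `Node`, `find_regions` and `build_slice_tree` as well as the toy image are hypothetical.

```python
from collections import deque


class Node:
    """One slice-tree node V(r_{s,i}); `pixels` is the projection of the region."""
    def __init__(self, layer, pixels):
        self.layer = layer          # layer index s (the root has s = 1)
        self.pixels = pixels        # frozenset of (x, y)
        self.children = []


def find_regions(image, g):
    """8-connected regions of B_g, i.e. of the pixels with I(x, y) <= g."""
    h, w = len(image), len(image[0])
    seen, regions = set(), []
    for y in range(h):
        for x in range(w):
            if image[y][x] > g or (x, y) in seen:
                continue
            seen.add((x, y))
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx in (cx - 1, cx, cx + 1):
                    for ny in (cy - 1, cy, cy + 1):
                        if (0 <= nx < w and 0 <= ny < h and image[ny][nx] <= g
                                and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            regions.append(frozenset(pixels))
    return regions


def build_slice_tree(image):
    g_max = max(map(max, image))
    g_min = min(map(min, image))
    n_b = g_max - g_min + 1
    # Region sets R_1 .. R_{N_b}; layers[s - 1] holds R_s.
    layers = [find_regions(image, g_max + 1 - s) for s in range(1, n_b + 1)]

    def process_child_slice(node):
        s = node.layer
        if s >= n_b:                                  # no deeper layer exists
            return
        # Step 1: children are the regions of R_{s+1} overlapping this region.
        for region in layers[s]:
            if not node.pixels.isdisjoint(region):
                child = Node(s + 1, region)           # step 2.1
                node.children.append(child)           # step 2.2
                process_child_slice(child)            # step 2.3

    root = Node(1, layers[0][0])    # r_{1,1} covers the whole image
    process_child_slice(root)
    return root


# Hypothetical toy image with two dark pixels of different depth.
toy = [
    [5, 5, 5, 5, 5],
    [5, 3, 5, 4, 5],
    [5, 5, 5, 5, 5],
]
root = build_slice_tree(toy)
```

In this toy image the root has two children on layer 2, one per dark pixel, and only the darker pixel survives to layer 3.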

3.5. Slice tree terminology

If $N(R_{s,i})$ denotes the number of child regions for $r_{s,i}$, then node $V(r_{s,i})$ has $N(R_{s,i})$ children in the slice tree. The nodes in the slice tree can be divided into three categories based on the number of children:
1. Leaf nodes: $N(R_{s,i}) = 0$
2. Solitary nodes: $N(R_{s,i}) = 1$
3. Manifold nodes: $N(R_{s,i}) > 1$.

Definition (Branches). If all links between manifold nodes and their child nodes are removed, then the slice tree is divided into sub-groups, called branches.

Notably, all nodes in a branch have no more than one child link. Clearly, each spot in the gel image has a corresponding branch in the slice tree. Fig. 7 shows a more complex slice tree.
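The node categories and the split into branches can be illustrated with a small sketch. The `Node` class, the function names and the four-node example tree are hypothetical; the code only assumes that each node knows its children.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def category(node):
    """Classify a node by its number of children."""
    n = len(node.children)
    return "leaf" if n == 0 else "solitary" if n == 1 else "manifold"


def branches(root):
    """Split the tree into branches by cutting every manifold-to-child link."""
    result, pending = [], [root]
    while pending:
        node = pending.pop()
        branch = [node]
        while category(node) == "solitary":     # follow the single child link
            node = node.children[0]
            branch.append(node)
        if category(node) == "manifold":
            pending.extend(node.children)        # each child starts a new branch
        result.append([n.name for n in branch])
    return result


# Hypothetical tree: A is manifold with children B and C; B has one child D.
d = Node("D")
b = Node("B", [d])
c = Node("C")
a = Node("A", [b, c])
found = sorted(branches(a))
```

Here the first node of each returned list is the branch root, and a branch ends either at a leaf node or at a manifold node.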


Fig. 6. Slice tree of a real gel image. (a) The same gel image as Fig. 3(a). (b) The corresponding slice tree of (a). The denotations are defined the same as in Fig. 5(b). In addition, two nodes in slice $B_{108}$ are shown by triangular points.

Fig. 7. More complicated example of slice tree. (a) Original gel image. (b) Corresponding slice tree of (a).

Definition (Branch Root, Leaf Branch, Internal Branch, Sibling Branches, Parent Branch and Child Branches). The node with minimum depth in a branch is called the branch root. A branch is called a leaf branch if it contains a leaf node, and otherwise is called an internal branch. Branches whose branch roots have the same parent node in the original slice tree are called sibling branches, and are also called child branches of the branch where the parent node resides, which in turn is called the parent branch.

Definition (Branch Length). For a node $V(r_{s,i})$, the branch length of $V(r_{s,i})$ is defined as

$$L(r_{s,i}) = 1 - s + \mathop{\arg\min}_{d=s}^{N_b} \bigl[ N(R_{d,i_d}) \neq 1 \bigr]. \tag{13}$$

The value of $i_d$ is simply $i$ when $d = s$. For the other cases of $d$, $i_d$ is an index such that $r_{d-1,i_{d-1}} \Supset r_{d,i_d}$. Obviously, the branch length of a branch root equals the number of nodes in the branch, and the branch length of each leaf node is 1.

Definition (Extended Branch Length). The extended branch length of node $V(r_{s,i})$ is defined as

$$L_e(r_{s,i}) = 1 - s + \mathop{\arg\max}_{d=s}^{N_b} \bigl[ N(R_{d,i_d}) = 0 \bigr]. \tag{14}$$

The values of $i_d$ are defined as those in (13). Obviously, $L(r) = L_e(r)$ if $V(r)$ is a node in a leaf branch. The branch length of a branch is defined as the branch length of its branch root.
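One reading of (13) and (14) is that the branch length counts the nodes from a node down to the first non-solitary node, and the extended branch length counts the nodes on the longest path from a node down to a leaf. The sketch below follows that reading; the `Node` class, the function names and the example tree are hypothetical.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def branch_length(node):
    """Nodes from `node` down to the first non-solitary node, inclusive."""
    length = 1
    while len(node.children) == 1:
        node = node.children[0]
        length += 1
    return length


def extended_branch_length(node):
    """Nodes on the longest downward path from `node` to a leaf."""
    if not node.children:
        return 1
    return 1 + max(extended_branch_length(c) for c in node.children)


# Hypothetical tree: A is manifold with children B and C; B has one child D.
d = Node("D")
b = Node("B", [d])
c = Node("C")
a = Node("A", [b, c])

lengths = {n.name: (branch_length(n), extended_branch_length(n))
           for n in (a, b, c, d)}
```

For the nodes B, C and D, which lie in leaf branches, the two lengths coincide, in line with the remark above; they differ only for A, whose branch ends at a manifold node.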

3.6. Spot detection

This section introduces how to use a slice tree for spot detection. Humans recognize spots of a gel image by the size, shape and intensity variation of the spots. Region size and branch length are adopted to utilize a slice tree for spot detection. The region size is expressed as the number of pixels in the region. If a region belongs to a spot, then it should have a reasonable region size. Thus, the region size should be restricted to reduce the amount of image noise in spot detection. The branch length of each node in the slice tree corresponds to the intensity gradient of the spot in a gel image. A confident spot should have a larger branch length, but a faint spot has a smaller branch length in the slice tree.

More specifically, spot detection by a slice tree is performed by two recursive procedures FindSpotInTree($r_{s_r,i}$) and ProcessBranch($r_{s,j}$), where $V(r_{s_r,i})$ denotes a branch root, while $r_{s,j}$ is either $r_{s_r,i}$ or a descendant region of


Fig. 8. Results of spot detection using slice tree. (a) The detected spots are marked with red crosses. (b) Confidence values of detected spots are shown in various colors. (c) The mapping between confidence values and colors. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

to be recognized as a spot, and $w_t$ and $h_t$ denote the minimum width and height of a region that can be processed for spot detection. These parameters can be set by users to control the sensitivity of the proposed method. If the parameter values are large, then only confident spots are detected. Conversely, even small and faint spots can be detected if the parameter values are small. However, if these parameters are set too small, then noise may be detected as spots. Notably, each parameter is an integer with a small range, so its optimal value can be found empirically. Moreover, varying the parameters only influences the detection of small and faint spots, and does not affect confident spots. Thus, the detection results of the proposed method are not sensitive to the parameters. Experimental results indicate that $w_t$, $h_t$ and $d$ are best set to 2, 2, and 3, respectively. The pseudocode for spot detection is outlined as follows.

Procedure FindSpotInTree($r_{s_r,i}$)

1. Call ProcessBranch($r_{s_r,i}$).
2. If no spots are found in all child branches of $V(r_{s_r,i})$ and $L_e(r_{s_r,i})$ is greater than or equal to $d$, then a spot is found at $\Psi(r_{s_r,i})$.

Procedure ProcessBranch($r_{s,j}$)
1. If width of $r_{s,j} \ge w_t$ and height of $r_{s,j} \ge h_t$
   then initiate $L(r_{s,j}) = 1$,
   else initiate $L(r_{s,j}) = 0$.
2. If $N(R_{s,j}) = 1$ then do
   2.1 For $r_{s+1,k} \in R_{s,j}$, call ProcessBranch($r_{s+1,k}$).
   2.2 Set $L(r_{s,j}) = L(r_{s,j}) + L(r_{s+1,k})$.
   2.3 Go to step 4.
3. If $N(R_{s,j}) > 1$ then
   for all $r_{s+1,k} \in R_{s,j}$, call FindSpotInTree($r_{s+1,k}$).
4. $L_e(r_{s,j}) = L(r_{s,j}) + \max_{r \in R_{s,j}} (L_e(r))$.

The parameters $r_{s_r,i}$ passed to FindSpotInTree($r_{s_r,i}$) are regions corresponding to branch roots. FindSpotInTree($r_{s_r,i}$) calls ProcessBranch() to calculate the branch length of $V(r_{s_r,i})$ and check the spot criteria for the node. If $V(r_{s_r,i})$ belongs to a leaf branch, and its branch length is greater than or equal to $d$, then a spot is found at $\Psi(r_{s_r,i})$. If no branch roots of sibling branches satisfy the criteria, then shorter branches are pruned, and the longest branch is merged with the parent branch, which in turn is adopted for criteria testing. The pruning and merging procedure is repeated until a merged branch satisfies the criteria, or the root node is reached.

ProcessBranch($r_{s,j}$) checks the region size of $r_{s,j}$, and calculates the branch length for node $V(r_{s,j})$ by recursively calling itself with the child region as a parameter, until a non-solitary node is encountered. Regions smaller than the specified size are not counted when the branch length is calculated. If a manifold node is encountered, then FindSpotInTree() is called to check the spot criteria for child branches originating from the manifold node. Clearly, the branch roots of child branches have higher priority than the branch root of their parent branch for being recognized as spots. Fig. 8(a) shows the results of spot detection for the gel image in Fig. 7(a).
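The two procedures can be sketched in Python as below. This is an interpretation, not the authors' code, and several points are assumptions: the extended length is accumulated as one count per size-qualified node along the longest path (so that no node is counted twice), "no spots found in child branches" is taken to include all deeper branches, the width and height of a region are taken from its bounding box, and detection starts by calling `find_spot_in_tree` on the tree root. The `Node` class, the helper `block` and the toy tree are hypothetical.

```python
class Node:
    """Slice-tree node; `pixels` is the set of (x, y) in the region's projection."""
    def __init__(self, pixels, children=()):
        self.pixels = frozenset(pixels)
        self.children = list(children)


def block(x0, y0, w, h):
    """Hypothetical helper: a w-by-h rectangular region with top-left (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}


def find_spots(root, w_t=2, h_t=2, d=3):
    """Return the branch-root nodes accepted as spots."""
    spots = []

    def size_ok(node):
        xs = [x for x, _ in node.pixels]
        ys = [y for _, y in node.pixels]
        return max(xs) - min(xs) + 1 >= w_t and max(ys) - min(ys) + 1 >= h_t

    def process_branch(node):
        """Return (L_e, found): extended length and whether a spot lies below."""
        own = 1 if size_ok(node) else 0            # step 1: size check
        if len(node.children) == 1:                # step 2: solitary node
            ext, found = process_branch(node.children[0])
            return own + ext, found
        if len(node.children) > 1:                 # step 3: manifold node
            results = [find_spot_in_tree(c) for c in node.children]
            return own + max(e for e, _ in results), any(f for _, f in results)
        return own, False                          # leaf node

    def find_spot_in_tree(branch_root):
        ext, found = process_branch(branch_root)
        if not found and ext >= d:                 # spot criterion
            spots.append(branch_root)
            found = True
        return ext, found

    find_spot_in_tree(root)
    return spots


# Hypothetical tree: a deep branch (a1 -> a2 -> a3) and a shallow one (b1 -> b2).
a3 = Node(block(0, 0, 2, 2))
a2 = Node(block(0, 0, 2, 2), [a3])
a1 = Node(block(0, 0, 3, 3), [a2])
b2 = Node(block(4, 4, 1, 1))                       # too small to be counted
b1 = Node(block(4, 4, 2, 2), [b2])
root = Node(block(0, 0, 6, 6), [a1, b1])
spots = find_spots(root)
```

With the default parameters only the deep branch is reported; lowering `d` makes the method more sensitive, as described in the text.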

3.7. Confidence evaluation for spots

Since spots in the gel image have specific characteristics in the slice tree, their confidence can be calculated from the features of the corresponding regions. More specifically, the confidence values of spots are calculated from the slice tree by the following equation.

$$C_f = \sqrt{(\alpha l)^2 + (\beta s)^2 + (\gamma c)^2}, \tag{15}$$

where $l$, $s$ and $c$ denote the metrics for branch length, smoothness and compactness related to the spots, respectively, and $\alpha$, $\beta$ and $\gamma$ denote their respective weighting factors. If spots are identified by the regions where the spots have been detected, then the metrics are defined as follows:

$$l = \min\left(1.0,\ \frac{l_e}{\sqrt{n_p}}\right), \tag{16}$$

$$s = \max\left(0.0,\ 1.0 - \delta \times \frac{n_r}{n_b}\right), \tag{17}$$

$$c = \min\left(1.0,\ \frac{2\sqrt{\pi n_p}}{n_b}\right), \tag{18}$$

where $l_e$ denotes the extended branch length related to the spot; $n_p$ denotes the number of region pixels; $n_b$ denotes the number of border pixels of the region; $n_r$ denotes the number of one-pixel-width knobs extended from the region, and $\delta$ denotes a constant factor. The metrics are normalized to the range from 0 to 1. Larger metrics lead to more confident spots being obtained.

Fig. 9. Gel images from http://www.deltastat.org/ used for experiments. (a) 031403-ctrl2.tiff (1262 × 724). (b) 031403-ctrl3.tiff (1262 × 720). (c) 031403-ctrl4.tiff (1262 × 700).

The branch length metric is calculated as the ratio of branch length to region radius, which is estimated as the square root of region size, which in turn is defined as the number of region pixels. The smoothness metric is calculated as $n_r$ normalized by $n_b$. The counter $n_r$ can be determined during border tracing, since each one-pixel-width knob causes a 180° direction change. Obviously, a larger $n_r$ leads to a less smooth region border.

The compactness is a metric to measure the roundness of regions, provided that circles have the highest compactness metric of 1. The compactness of a region is calculated from the ratio of ideal border length and actual border length, as specified in (18). Since the area of a circle is calculated as $A = \pi r^2$, and the circumference of a circle is expressed as $L = 2\pi r = 2\sqrt{\pi A}$, the ideal border length is calculated as $L = 2\sqrt{\pi A}$, where the region area $A$ is approximated by the number of region pixels, while the actual border length is approximated by the number of border pixels. Fig. 8(b) shows the detected spots for a gel image, with different colors based on their confidence values. Fig. 8(c) shows the mapping between confidence values and colors.
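The confidence computation of this section can be sketched as a single function. The values of the weighting factors and of $\delta$ are not given in the text, so the ones used below are purely hypothetical: equal weights of $1/\sqrt{3}$ are chosen only so that the result stays within the range from 0 to 1. The function name and the two example spots are likewise illustrative.

```python
import math


def confidence(l_e, n_p, n_b, n_r, alpha, beta, gamma, delta):
    """Confidence of a spot from its region features.

    l_e: extended branch length, n_p: region pixels,
    n_b: border pixels, n_r: one-pixel-width knobs.
    """
    l = min(1.0, l_e / math.sqrt(n_p))                    # branch length metric
    s = max(0.0, 1.0 - delta * n_r / n_b)                 # smoothness metric
    c = min(1.0, 2.0 * math.sqrt(math.pi * n_p) / n_b)    # compactness metric
    return math.sqrt((alpha * l) ** 2 + (beta * s) ** 2 + (gamma * c) ** 2)


# Hypothetical weights and delta; the text does not state the values used.
w = 1.0 / math.sqrt(3.0)

# A deep, smooth, round region: all three metrics reach their maximum.
strong = confidence(l_e=10, n_p=100, n_b=30, n_r=0,
                    alpha=w, beta=w, gamma=w, delta=1.0)

# A shallow region with a ragged, elongated border.
faint = confidence(l_e=2, n_p=100, n_b=60, n_r=3,
                   alpha=w, beta=w, gamma=w, delta=5.0)
```

With these weights a spot whose three metrics all equal 1 receives the maximum confidence, and any shortfall in depth, smoothness or roundness lowers the value.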

4. Experimental results and discussion

This section presents the experimental results using both synthetic and real gel images for qualitative and quantitative performance evaluation.

4.1. Experimental results using real gel images

Fig. 9 shows the real gel images [13] adopted in the experiments. To indicate the advantages of spot detection using a slice tree, the results of the proposed spot detection were compared to those of four commercial software packages, namely Delta2D 3.2, Progenesis Discovery v.2005, Proteomweaver 3.0.1.11 and ImageMaster Platinum 5.0. Most of the existing spot detection methods, including these four methods, adopt the Watershed [8] algorithm for spot segmentation. Spot models are adopted to eliminate segments not being fitted by the model after a gel image is segmented. Watershed is the most popular spot segmentation technique, but has the well-known problem of over-segmentation. Thus, the effectiveness of a spot model is essential to the results of spot detection based on Watershed. The first four rows of Fig. 10 show the detection results of the four software packages on sub-blocks of 429 × 279 (the rectangles near the center of the image) from the real images of Fig. 9.

Unlike the four packages, the proposed spot detection using a slice tree does not rely on spot models. Instead, the branch length of each leaf branch corresponding to intensity difference between spots and background is adopted as the criterion of spot confidence. The fifth row of Fig. 10 shows the results of the proposed method. In our results, spot centers are marked with red crosses, and the boundaries of spots are shown in different colors according to their confidence values. Clearly, the boundaries of spots in the proposed method are compact and completely represent the real spot shapes. Additionally, only the proposed method provides a confidence value, giving useful information for subsequent matching.

Fig. 10. Comparison of spot detection. Rows for (1) Delta2D, (2) Progenesis, (3) Proteomweaver, (4) ImageMaster 2D Platinum and (5) our method. (The mapping between confidence values and colors is as specified in Fig. 8(c).) Columns for (a) 031403-ctrl2.tiff, (b) 031403-ctrl3.tiff and (c) 031403-ctrl4.tiff.

More specifically, a block from Fig. 9(a) (the small rectangle at the left side of the image) is enlarged in Fig. 11 to indicate the difference between the detection results of the four commercial software packages and the proposed system. The comparison focuses on the two spots spreading horizontally over the image block. The gray levels of block images related to different methods appear to be different, since different contrast enhancement may be adopted by the respective methods. According to Fig. 11(a), Delta2D detected the two spots correctly. However, the boundaries of the detected spots enclose extra areas. Progenesis detected each spot as 3–5 fragments, and the boundaries also enclosed extra areas, as indicated in Fig. 11(b). According to Fig. 11(c), although Proteomweaver detected the two spots correctly, the main parts of the spots were not enclosed, leading to invalid spot centroid positions. ImageMaster detected the boundaries of the spots correctly only if the fragments were merged together, as indicated in Fig. 11(d). This is a well-known limitation, which results from the over-segmentation characteristic of Watershed. In contrast, the proposed method detected the two spots correctly, as indicated in Fig. 11(e). Although the boundaries detected by our method did not cover the whole spots, their centroid positions were valid. The spot boundary was easily fitted by extracting the boundary of the parent region nearest to the background level.

Fig. 12. Analysis of spot and slice tree. (a) A saturated spot with its boundary and spot center shown. (b) 3D view and slice tree of (a). Region centers are shown as circular points; the spot center is shown as a triangular mark. Projections of region centers and spot center are also shown in $B_{48}$.

The appearance of spots in a gel image depends on the quantity of the corresponding proteins, staining techniques and other neighboring proteins. Most spots have specific sizes, shapes and recognizable intensity differences from the background. The slice tree around a spot can be roughly divided into three zones in the layer direction, namely background, steep and plateau, as indicated in Fig. 12.

Neighboring pixels in the background area of the gel image have similar gray levels, and the local minima are randomly distributed. Many regions appear in the binary images related to the background zone, and region centers are widely spread. Thus, the slice tree related to the background zone contains many horizontal links, but no dominant branch is present.

In the area of a spot, the gray level of the neighboring pixels were approximately Gaussian distributed with a local minimum occurring at the spot center. The 3D view of the spot contained a noticeable gradient. The slices of the spot are the main regions of the binary images based on the steep zone, and the projections of region centers fall within a neighborhood. Thus, the slice tree related to the steep zone contains a dominant branch with few links existing, almost all of which are in the vertical direction.

A saturated gray level at the spot centers are common phenomena for large spots. Since neighboring pixels in a saturated area have a similar gray level, the saturated area has similar characteristics to the background, i.e. plentiful regions, horizontal links and no dominant branch.
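To make the contrast between the zones concrete, the following minimal sketch counts the connected regions in each binary slice of a toy image. It is only an illustration under stated assumptions, not the implementation used in this study: a slice at gray level g is assumed to be the set of pixels whose gray value does not exceed g (spots are darker than the background), regions are assumed to be 4-connected, and the names count_regions, TOY_SPOT and TOY_BACKGROUND are hypothetical.

```python
def count_regions(image, level):
    """Count 4-connected regions of pixels with gray value <= level.

    Assumption: a 'slice' at a gray level is the binary image of all
    pixels at or below that level (dark spots on a bright background).
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] > level or seen[y][x]:
                continue
            # New region found: flood-fill it with an explicit stack.
            regions += 1
            seen[y][x] = True
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and image[ny][nx] <= level):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return regions


# Hypothetical toy spot: gray levels fall steeply towards the center.
TOY_SPOT = [
    [200, 201, 199, 200, 202],
    [201, 150, 120, 150, 199],
    [199, 120,  60, 120, 201],
    [200, 150, 120, 150, 200],
    [202, 199, 201, 200, 199],
]

# Hypothetical flat background: neighboring pixels differ by one gray level.
TOY_BACKGROUND = [[200 + (x + y) % 2 for x in range(4)] for y in range(4)]

spot_counts = {g: count_regions(TOY_SPOT, g) for g in (60, 120, 150)}
background_count = count_regions(TOY_BACKGROUND, 200)
```

On the toy spot, the slices at levels 60, 120 and 150 each contain a single region, which corresponds to a dominant branch. The slice of the flat background at level 200 breaks into eight isolated regions, which corresponds to many regions without a dominant branch.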

Since spot detection by a slice tree is based on the branch length of the slice tree, the proposed method did not require background subtraction, and background inhomogeneity had no effect on the results of spot detection, as indicated in Fig. 14. Thus, spot detection by a slice tree is simple and robust. Figs. 13 and 14 show some complex cases that illustrate the capability of spot detection by a slice tree.

4.2. Experimental results using synthetic images

In the experiments, synthetic images were generated to evaluate the quantitative performance of spot detection. Source gel images of size 512 × 512 were generated first. Each image contained 100 spots in a variety of sizes, intensities and locations. Although each image contained some overlapping spots, these were limited because excessive overlapping is meaningless in gel analysis. Various degrees of noise were then added to the source gel images. A source gel image and its corresponding noisy gel images are called a test set. In the first experiment, distortions modeled by a Thin-Plate Spline (TPS) were added to each source gel image to simulate an inhomogeneous background. In the second experiment, salt and pepper noise was added to each source gel image to evaluate the noise immunity of spot detection. Since the spots in each synthetic image are known, the precision and recall rates can be calculated. Let TP denote the number of spots detected correctly, FN the number of spots missed, and FP the number of false alarms. The precision rate (PR) is defined by TP/(TP + FP), and the recall rate (RR) is defined by TP/(TP + FN). In our experiments, each gel image had 100 spots, thus TP + FN = 100. Detection results of the proposed method were compared to those of ImageMaster 2D Platinum 5.0 Trial version, since it was the only suitable package available.
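The two rates can be computed directly from the counts. The short sketch below is illustrative only; the function name precision_recall is hypothetical, and the rates are expressed in percent to match the tables. The sample counts are the averages reported in Table 1 for the proposed method at distortion level 4.

```python
def precision_recall(tp, fp, fn):
    """Return (PR, RR) in percent from detection counts.

    PR = TP / (TP + FP), RR = TP / (TP + FN), as defined in the text.
    """
    pr = 100.0 * tp / (tp + fp)
    rr = 100.0 * tp / (tp + fn)
    return pr, rr


# Average counts from Table 1, proposed method, distortion level 4.
pr_level4, rr_level4 = precision_recall(tp=99.60, fp=0.20, fn=0.40)
```

With these counts the sketch gives a precision rate of about 99.80% and a recall rate of about 99.60%, in agreement with the corresponding row of Table 1.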


Fig. 13. Results to show the performance of spot detection using a slice tree. (a) Faint spot. (d) Spot with overexposed background. (g) Spot with streak.

4.2.1. Synthetic gel image with inhomogeneous background

In this experiment, five test sets were generated to evaluate the quantitative performance of spot detection. Distortions modeled by a Thin-Plate Spline (TPS) were added to each source gel image to simulate an inhomogeneous background. The level of distortion was controlled by the number of control points adopted in TPS modeling. For distortion level n, 50 × n control points were randomly selected so as to spread uniformly over the source image. The gray value of each control point was then reduced by 5–10. Additionally, a TPS regularization parameter of 0.0001 was adopted to produce a gradually changing background. This study adopted levels 1–5 in testing. Thus, the total number of distorted images was 5 (source images) × 5 (distortion levels) = 25. Fig. 15 shows an example of synthetic images with an inhomogeneous background. Fig. 15(a) shows the source image, and Figs. 15(b) and (c) show two distorted images. Figs. 15(d) and (e) show the results of histogram equalization of the distorted images for easy observation.
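The distortion procedure can be sketched as follows. This is a rough illustration under several assumptions and not the generator used in this study: the TPS is fitted to the gray-level reductions at the control points and the fitted surface is added to the image; the regularization parameter is added to the diagonal of the kernel matrix; the reductions are drawn uniformly from 5 to 10; images are 8-bit lists of rows. All function names (tps_kernel, solve, fit_tps, eval_tps, distort) are hypothetical.

```python
import math
import random


def tps_kernel(r2):
    """TPS radial basis U(r) = r^2 log r, written in terms of r^2; U(0) = 0."""
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_tps(points, values, lam):
    """Fit TPS weights; returns n kernel weights followed by 3 affine terms."""
    n = len(points)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        a[i][i] += lam  # assumed form of the regularization
        a[i][n], a[i][n + 1], a[i][n + 2] = 1.0, xi, yi
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi
    return solve(a, list(values) + [0.0, 0.0, 0.0])


def eval_tps(points, coeffs, x, y):
    """Evaluate the fitted surface at (x, y)."""
    n = len(points)
    s = coeffs[n] + coeffs[n + 1] * x + coeffs[n + 2] * y
    for (px, py), w in zip(points, coeffs):
        s += w * tps_kernel((x - px) ** 2 + (y - py) ** 2)
    return s


def distort(image, level, rng, lam=0.0001):
    """Darken the background using 50 * level random control points."""
    h, w = len(image), len(image[0])
    cells = rng.sample(range(h * w), 50 * level)  # distinct pixel positions
    points = [(float(c % w), float(c // w)) for c in cells]
    drops = [-rng.uniform(5.0, 10.0) for _ in points]  # reduce gray by 5-10
    coeffs = fit_tps(points, drops, lam)
    return [[min(255, max(0, round(image[y][x] + eval_tps(points, coeffs, x, y))))
             for x in range(w)] for y in range(h)]
```

Evaluating the surface at every pixel of a 512 × 512 image is slow in pure Python, so the sketch is intended for small images; a numerical library would be the natural choice for full-size experiments.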

The proposed method and ImageMaster 2D were applied to these synthetic images, i.e. 5 source gel images and 25 distorted gel images. The parameters d = 4 and w_t = h_t = 3 were adopted for the proposed method. The default parameters of ImageMaster 2D were adopted: smooth = 2, MinArea = 5, and Saliency = 1.0. Table 1 shows the detection results of the proposed method and ImageMaster 2D. The first part of the table shows the detection results of the proposed method. Each row shows the average values of the related items for the images within the same category. The first row corresponds to the category of the five source gel images. The other rows show the average values for the categories with different distortion levels. The results in Table 1 indicate that the proposed method had precision and recall rates over 99%, and was little influenced by distortion. The results in the second part of Table 1 show that ImageMaster achieved average recall rates of 100% for all categories. However, its number of false alarms rose with the degree of distortion, which in turn decreased the precision rate. The proposed method has a slightly lower recall rate and markedly fewer false alarms than ImageMaster. Thus, the proposed method has greater immunity to the distortion of gel images than ImageMaster.

4.2.2. Synthetic gel images with salt and pepper noise

Fig. 14. Multi-spots. (a) Horizontal streak on three spots. (d) Vertical streak on two spots. (g) Eight spots with different intensity levels.

Table 1
Results of spot detection using synthetic gel images with inhomogeneous background.

Categories            Detected   TP       FP     FN     PR (%)   RR (%)
Proposed method
Source gel images      99.60      99.60   0.00   0.40   100.00    99.60
Distortion level 1     99.60      99.60   0.00   0.40   100.00    99.60
Distortion level 2     99.60      99.60   0.00   0.40   100.00    99.60
Distortion level 3     99.60      99.60   0.00   0.40   100.00    99.60
Distortion level 4     99.80      99.60   0.20   0.40    99.80    99.60
Distortion level 5     99.60      99.60   0.00   0.40   100.00    99.60
ImageMaster 2D Platinum
Source gel images     100.00     100.00   0.00   0.00   100.00   100.00
Distortion level 1    100.80     100.00   0.80   0.00    99.21   100.00

The second type of noise added to the gel images was salt and pepper noise. Five source synthetic gel images of size 512 × 512 were generated first. The degree of noise added to the source gel images was controlled by the density of noise pixels ρ. For each source gel image, five degrees of noise with ρ = 5%, 10%, ..., 25% were generated. The probabilities of salt and pepper pixels are 50% and 50%, respectively. Fig. 16 shows an example of synthetic images with salt and pepper noise. Fig. 16(a) shows the source image. Figs. 16(b) and (c) show two noisy images with ρ = 10% and ρ = 25%, respectively.
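The salt and pepper corruption described in this subsection can be sketched as follows. The sketch is illustrative and rests on assumptions not stated in the text: the density ρ is taken as the fraction of pixels that are replaced, salt pixels are set to 255 and pepper pixels to 0 (8-bit images), and the function name add_salt_and_pepper is hypothetical.

```python
import random


def add_salt_and_pepper(image, density, rng, salt=255, pepper=0):
    """Return a copy of image with a fraction `density` of pixels replaced.

    Each replaced pixel becomes salt or pepper with probability 50% each.
    Assumption: `density` is the fraction of noise pixels in the image.
    """
    h, w = len(image), len(image[0])
    noisy = [row[:] for row in image]  # leave the source image untouched
    n_noise = round(density * h * w)
    for p in rng.sample(range(h * w), n_noise):  # distinct pixel positions
        noisy[p // w][p % w] = salt if rng.random() < 0.5 else pepper
    return noisy


# Example: the five noise levels used in the experiment, on a small flat image.
rng = random.Random(1)
source = [[128] * 20 for _ in range(20)]
noisy_images = {rho: add_salt_and_pepper(source, rho / 100.0, rng)
                for rho in (5, 10, 15, 20, 25)}
```

Sampling distinct positions keeps the number of noise pixels exactly equal to the requested density, which makes the noise levels directly comparable across source images.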

Fig. 15. Example of synthetic gel images with an inhomogeneous background. (a) Source synthetic gel image, (b) Synthetic gel image of distortion level 1, (c) Synthetic gel image of distortion level 3. (d) and (e) are results of histogram equalization of (b) and (c), respectively.

Fig. 16. Example of synthetic gel images with salt and pepper noise. (a) Source synthetic gel image, (b) Synthetic gel image with 10% of salt and pepper noise pixels, (c) Synthetic gel image with 25% of salt and pepper noise pixels.

The proposed method and ImageMaster were applied to these synthetic images with the same parameters as in the previous experiment. Table 2 shows the precision and recall rates of this experiment. The first part of the table shows the detection results of the proposed method without image preprocessing. Each row shows the average values of the related items for the images at the same noise level. The first row corresponds to the detection results of the five source gel images. The other rows show the average values for the gel images at different noise levels. The results in the table indicate that the proposed method had recall rates of 100% and precision rates from 66.9% to 100.0%. The second part of the table shows the detection results of ImageMaster 2D. ImageMaster achieved average recall rates from 97.4% to 100% and precision rates from 17.99% to 100%, showing that it is sensitive to salt and pepper noise. Its number of false alarms rose with the noise density, which in turn decreased the precision rate. Thus, the proposed method has greater immunity to salt and pepper noise in gel images than ImageMaster.

5. Conclusion

A slice tree is effective for representing a gel image in a systematic organization. Nodes in the slice tree contain refined features about the spots and links between nodes contain corresponding characteristic expressions of the gel image. Thus, gel image analysis can be performed by analyzing the slice tree based on systematic organization. This study describes how to detect spots by a slice tree. Future work will adopt a slice tree with confidence evaluation to provide information for other applications such as spot quantification and gel image registration by tree matching.

20    115.20   100.00    15.20   0.00    86.81   100.00
25    149.40   100.00    49.40   0.00    66.93   100.00
ImageMaster 2D Platinum
0     100.00   100.00     0.00   0.00   100.00   100.00
5     165.20   100.00    65.20   0.00    60.53   100.00
10    273.00   100.00   173.00   0.00    36.63   100.00
15    364.20    99.60   264.60   0.40    27.35    99.60
20    452.60    99.60   353.00   0.40    22.01    99.60
25    541.40    97.40   444.00   2.60    17.99    97.40

Acknowledgements

The authors would like to thank the National Science Council of the Republic of China, Taiwan for financially supporting this research under Contract No. NSC-94-2745-E-155-008-URD. The anonymous reviewers are commended for the valuable comments on the earlier version of this manuscript. Ted Knoy is appreciated for his editorial assistance.

References

[1] M. Berth, F.M. Moser, M. Kolbe, J. Bernhardt, The state of the art in the analysis of two-dimensional gel electrophoresis images, Appl. Microbiol. Biotechnol. 76 (Oct.) (2007) 1223–1243.

[2] T. Aittokallio, J. Salmi, T.A. Nyman, O.S. Nevalainen, Geometrical distortions in two-dimensional gels: Applicable correction methods, Journal of Chromatography B 815 (Feb.) (2005) 25–37.

[3] J. Salmi, T. Aittokallio, T.A. Nyman, O.S. Nevalainen, Correcting distortions in 2D-gels — A survey, Tech. Rep. 653, Turku Centre for Computer Science, 2004.

[4] A.W. Dowsey, M.J. Dunn, G.Z. Yang, The role of bioinformatics in two-dimensional gel electrophoresis, Proteomics 3 (Aug.) (2003) 1567–1596. [5] M. Quadroni, P. James, Proteomics and automation, Electrophoresis 20 (Apr.) (1999) 664–677.

[6] J.S. Gustafsson, A. Blomberg, M. Rudemo, Warping two-dimensional electrophoresis gel images to correct for geometric distortions of the spot pattern, Electrophoresis 23 (June) (2002) 1731–1744.

[7] R.D. Appel, J.R. Vargas, P.M. Palagi, D. Walther, D. Hochstrasser, Melanie II–a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms, Electrophoresis 18 (Dec.) (1997) 2735–2748.

[8] L. Vincent, P. Soille, Watersheds in digital spaces: An efficient algorithm based on immersion simulations, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (June) (1991) 583–598.

[9] T. Srinark, C. Kambhamettu, An image analysis suite for spot detection and spot matching in two-dimensional electrophoresis gels, Electrophoresis 29 (Jan.) (2008) 706–715.

[10] G.D. Boetticher, H. Al-Mubaid, K. Frasier-Scott, A recursive application of a support vector machine for proteion spot detection in 2-dimensional gel electrophoresis, Journal of Computer Science 1 (3) (2005) 355–362.

[11] K. Kriegel, I. Seefeldt, F. Hoffmann, C. Schultz, C. Wenk, V. Regitz-Zagrosek, H. Oswald, E. Fleck, An alternative approach to deal with geometric uncertainties in computer analysis of two-dimensional electrophoresis gels, Electrophoresis 21 (July) (2000) 2637–2640.

[12] M.C. Pietrogrande, N. Marchetti, F. Dondi, P.G. Righetti, Spot overlapping in two-dimensional polyacrylamide gel electrophoresis separations: A statistical study of complex protein maps, Electrophoresis 23 (Feb.) (2002) 283–291.
