Image Analysis System for Protein Two Dimensional Gel Electrophoresis


Juin-Lin Kuo

Department of Computer Science and Information Engineering Chung-Hua University

30 Tung Shiang, Hsin Chu, Taiwan 30067 Email: [email protected]

July 30, 2003


List of Figures

1.1 2DGE image - first dimensional separation by isoelectric focusing (IEF).

1.2 2DGE image - second dimensional separation by SDS-PAGE.

1.3 An example of 2DGE image (original size: 1498x1544).

1.4 System framework.

2.1 Piecewise bilinear mapping [10].

2.2 Algorithm flow of image registration [10].

2.3 Final warped source image.

2.4 The image produced by the software.

2.5 Shortest path and geodesic distance.

2.6 SKIZ of a set Y in C.

2.7 Pixel distribution on gel image.

2.8 Relation of neighbouring pixel.

2.9 Finding watershed by directed form pixel difference.

2.10 Over-segmentation by Watershed Transformation.

2.11 Result of the opening algorithm, which removed most of the noise from Fig 2.10.

2.12 Illustration of region growing.

2.13 Result of spot detection in partial gel image from Fig 2.11.

3.1 Illustration of gel images combination [20]: (a) IPG 4-7 gel image; (b) IPG 4-5 gel image; (c) IPG 5-6 gel image; (d) IPG 5.5-6.7 gel image.

3.2 Illustration of graph theory: (a) Directed graph; (b) Undirected graph; (c) Straight-lined planar graph; (d) Non straight-lined planar graph.

3.3 Illustration of approximate graph: (a) MST graph; (b) RNG graph; (c) GG graph; (d) DT graph.

3.4 Illustration of Gabriel graph.

3.5 Illustration of relative neighborhood graph.

3.6 Illustration of major spots and satellite spots.

3.7 Illustration of maximum relation spanning tree matching pairs.

3.8 MRST's node data structure.

3.9 MRST's structure.

3.10 Illustration of adaptive membership function.

3.11 Phenomenon of different membership functions.

3.12 Partial matching result.

3.13 Matching pair relation.

4.1 Result of directly matching processing.

4.2 Partial matching: using partial image to find the original image information.

4.3 Partial matching and relocation processing.

4.4 Format of gel image's attribute in database.

4.5 Information of gel images in 2DGE image analysis database system.

5.1 Overlap spot region. Protein spot No. 4 may contain several different kinds of protein.


List of Tables

4.1 The result of partial matching in different image size without adaptive parameters.

4.2 The result of partial matching in different image size with adaptive parameters.

4.3 The result of relocation in different situation without adaptive parameters.


Abstract

In bioinformatics, proteomic technology plays one of the most important roles in studying and analyzing protein characteristics. Electrophoresis, the most popular analysis method, can currently display different kinds of proteins clearly. In this thesis, we propose a novel method to analyze electrophoresis gel images more efficiently. The proposed methodology is divided into two parts. First, we process the original gel images using the watershed transformation and detect every spot in the gel images. Second, we compare and relocate the spots between different gel images. To make the system more reliable, we have developed a fuzzy inference system to verify the results of spot detection and spot matching. Together, the integrated procedures form an intelligent electrophoresis gel image analysis system that supports the future implementation of a database.

Keywords : 2D gel electrophoresis, Image processing, Proteomics, Geometric graphs, Fuzzy inference system, Maximum relation spanning tree.


Chapter 1 Introduction and Literature Survey

1.1 Background

The structure of the cells of all living organisms is fundamentally determined by genetic information. DNA is the medium that carries this information, which is realized through the expression of proteins. To gain more information from biological experiments, we need to obtain particular proteins and analyze their functions.

Biologists have measured protein function with different techniques in many studies [1]. From a biological viewpoint, proteins have several main functions:

• Controlling the reaction time in an organism,

• Resisting the microorganisms coming from external living body, and

• Modulating gene expression, etc.


Proteins have many other functions that can provide valuable technologies for life science research. Protein technology is becoming more important not only in biological science but also in medical science.

Protein two-dimensional gel electrophoresis (2DGE) is based on sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), which is a useful method for quantifying, comparing and characterizing proteins [2]. A strip of gel or a tube gel from isoelectric focusing (the first dimension, which separates proteins by isoelectric point, pI) is fitted over an SDS-polyacrylamide gel, and the proteins are separated according to molecular weight by electrophoresis. Pre-equilibration of the isoelectric focusing gel in SDS is necessary prior to running the second dimension. Therefore, the method can be divided into two steps:

• Isoelectric focusing (IEF), and

• SDS-polyacrylamide gel electrophoresis (SDS-PAGE).

In the first step, the proteins are focused according to their isoelectric points (pI), as shown in Fig 1.1. After the pattern has stabilized, the second step separates the proteins according to their molecular weights (see Fig 1.2). Finally, different staining methods are applied to visualize the proteins at different levels and in different ways, as shown in Fig 1.3.


Figure 1.1: 2DGE image - ﬁrst dimensional separation by isoelectric focusing (IEF).

Figure 1.2: 2DGE image - second dimensional separation by SDS-PAGE.


Figure 1.3: An example of 2DGE image (original size:1498x1544).

1.2 Motivation

2DGE is an important tool for investigating different patterns of qualitative and quantitative protein expression. Researchers in the biological field need automatic data analysis techniques that can detect and recognize differences across thousands of 2DGE images simultaneously. The key to comparing 2D protein profiles lies in fast and robust protein spot detection, spot matching, and a database system. Therefore, in this thesis we provide an analytic framework to deal with these processes.

We also provide a gel image relocation algorithm to handle reversed, rotated and shifted images as well as the partial mapping problem. Its added value is that it lays the foundation for comparing small gel images with large-format gel images and for searching a huge 2DGE image database.

1.3 Literature Survey: Two Dimensional Gel Electrophoresis Image Analysis Software Package and Related References

A few commercial and non-profit software packages have been developed for 2DGE image analysis. They offer various technologies and user interfaces for biologists. We introduce these software packages from the following viewpoints:

• Registration vs. non-registration: Z3 [3] vs. CAROL [4].

• Database included vs. database not included: Melanie [5] vs. ImageMaster 2D Elite [6].

• Web-based vs. non web-based: SWISS 2D PAGE [7] vs. DBGel [8].

In the gel comparison process, there are two different approaches, one focusing on image variation and the other on spot distribution. Image registration, a novel technology first applied in the commercial software Z3 [3], compensates for the differences between gel images by using image correlation. On the other hand, graph theory has been used to compare disordered protein spots, as in CAROL [4]. Each method has its own advantages and disadvantages. With the image registration method, we can find the differences between two gel images by overlaying them in two different colors, but one of the images is altered in the process. The non-registration method avoids this drawback but may fail when comparing protein spots that lack structure.

After the comparison step, a database is a necessary tool for storing gel information. Most software packages, such as ImageMaster 2D Elite [6], do not provide a database system. Others, e.g. Melanie [5], include a database in order to form a more powerful analysis system.

Although these packages have different applications, each handles 2DGE images well from its own standpoint. With the aim of integrating the strengths of these software packages, we have developed a 2DGE image analysis system in a novel way.

Issues in 2DGE image analysis have been discussed in the literature. The problems can be grouped into the following topics:

• Image registration,

• Spot detection,

• Spot matching,

• Spot analysis, and

• Database processing.

Image registration is widely used in biomedical imaging, which includes methods developed for automated image labelling and pathology detection in individuals and groups [9]. Image registration is equally important to biological systems; for example, in proteomic research, 2DGE images are an important tool for investigating differential patterns of qualitative protein expression [10], as discussed in Chapter 2. Spot detection is a basic procedure in 2DGE image analysis: we must locate the coordinates of the protein spots and then record or compare their attributes. There are many methods for detecting protein spots: Gaussian fitting [11], Laplacian fitting [3], histogram [12] and watershed transformation [13, 14]. What these algorithms have in common is that they are based on spatial-domain signal processing [15]. The aim of the segmentation process is to define the location, true boundary and intensity of each spot.

Spot matching is a difficult problem in gel comparison. We must find the spots that differ between the standard gel and the relative gel. Several algorithms have been proposed to solve this problem, for example Restriction Landmark Genomic Scanning (RLGS) [11, 16, 17] and Fuzzy Cluster [18].

RLGS compares all the proteins by constructing computer graphs and landmarks, and uses the landmarks to initialize the comparison process. The other method, Fuzzy Cluster, uses the relation between two protein spots to calculate their similarity: the more similar a pair of spots, the closer they are. This method is used to compare the similarity of two sets of protein spots.

We have also included some of these methods in the proposed system. For spot detection, we chose the Watershed Algorithm to segment the protein spots. We chose it because we need to segment areas with low intensity, to process images quickly, and to adapt to local intensity. Other algorithms did not meet our requirements; for example, Histogram Equalization [12] may have problems segmenting protein spots in areas of different intensity. On the other hand, although the RLGS method is intuitive, it has the drawback that users need to place the landmarks manually. The Fuzzy Cluster method can calculate similarity quickly, but if there are hundreds or thousands of protein spots in two gel images, it will fail unless many features are used. Therefore, we provide a novel way to overcome this problem: the Maximum Relation Spanning Tree (MRST). The algorithm can handle not only rotation, shift and reversal, but also the partial mapping problem. In the matching process, we apply fuzzy inference to correct the deviation generated by rotation, shift and reversal. In addition, this method does not need landmarks allocated a priori by the user. Finally, the spot analysis results and the protein spot information are recorded in a database for further research.

1.4 Methodology

The framework of the proposed 2DGE image analysis system is shown schematically in Fig 1.4. First, the system performs image registration and decides whether small deformations between two gels need to be corrected. Next, spot detection and protein spot labelling are carried out. Third, the results of spot detection are used for spot matching, which finds the relationship between spots in different gels. In this thesis, we provide a novel content-based method, the Maximum Relation Spanning Tree (MRST), to compare two gels. This method can not only compare two gels but also relocate the spots from a partial image to the full image. Finally, the analytic results are recorded and fed into the database for further research.

Figure 1.4: System framework.


Chapter 2 Image Registration and Spots Detection

Image registration and spot detection are important procedures in the overall 2D gel image analysis workflow. Image registration is a common issue that has been studied thoroughly in the past. In this chapter, we therefore focus on the problem of protein spot detection and simply adopt the image registration procedure provided by Stefan Veeser [10] as a pre-processing stage.

2.1 Multiresolution Image Registration

This section presents an improved image registration technique based on image warping theory [19]. The method adopts a coarse-to-fine feature-matching paradigm, which avoids the combinatorial complexity of the possible matches between the large number of local patterns in each gel.


Figure 2.1: Piecewise bilinear mapping [10].

2.1.1 Methodology

Suppose we want to compare two images S and T_n, where S is a standard gel image and T_n is the n-th target image. Registration consists of finding a transformation f_tran such that the similarity function Sim between S and f_tran(T_n) is optimized. The similarity function is defined as the correlation:

\[
\mathrm{Sim} = \mathrm{corr}(S, f_{tran}(T_n)). \tag{2.1}
\]

We have chosen transformations defined by Piecewise Bilinear Maps (PBM) to represent the associated deformation. A PBM consists of a lattice of maps, as illustrated in Fig 2.1. To build this lattice, the target image is first partitioned into 2^l × 2^l regular squares, where l is the level of detail of the corresponding transformation. For a given level and a given lattice index (i, j), the mapped points s_{i,j}, s_{i+1,j}, s_{i,j+1}, s_{i+1,j+1} of the source image and the control points t_{i,j}, t_{i+1,j}, t_{i,j+1}, t_{i+1,j+1} of the target image define the mapping function m. A point c that lies in the square t_{i,j}, t_{i+1,j}, t_{i,j+1}, t_{i+1,j+1} of the target image is mapped to the source point m(c), given by the following weighted sum of the corresponding mapped points:

\[
m(c) = w_{i,j}(a, b)\, s_{i,j} + w_{i+1,j}(a, b)\, s_{i+1,j} + w_{i,j+1}(a, b)\, s_{i,j+1} + w_{i+1,j+1}(a, b)\, s_{i+1,j+1} \tag{2.2}
\]

where w_{i,j}(a, b) = (1 − a)(1 − b), w_{i+1,j}(a, b) = a(1 − b), w_{i,j+1}(a, b) = (1 − a)b and w_{i+1,j+1}(a, b) = ab. The values a and b are the horizontal and vertical ratios of point c, as indicated in Fig 2.1.
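The weighted sum in Equation (2.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are plain (x, y) tuples, the function name `bilinear_map` and the sample control points are hypothetical, and the ratios a and b are assumed to be already computed for the cell that contains c.

```python
def bilinear_map(a, b, s00, s10, s01, s11):
    """Evaluate m(c) from Equation (2.2) for cell-relative ratios (a, b).

    s00, s10, s01, s11 stand for s_{i,j}, s_{i+1,j}, s_{i,j+1}, s_{i+1,j+1},
    each given as an (x, y) tuple in the source image.
    """
    weights = (
        (1 - a) * (1 - b),  # w_{i,j}
        a * (1 - b),        # w_{i+1,j}
        (1 - a) * b,        # w_{i,j+1}
        a * b,              # w_{i+1,j+1}
    )
    corners = (s00, s10, s01, s11)
    x = sum(w * p[0] for w, p in zip(weights, corners))
    y = sum(w * p[1] for w, p in zip(weights, corners))
    return (x, y)


# Hypothetical mapped points of one lattice cell in the source image.
cell = ((0.0, 0.0), (10.0, 1.0), (1.0, 10.0), (12.0, 12.0))
center = bilinear_map(0.5, 0.5, *cell)  # image of the cell's midpoint
```

At the four corners of the cell (a and b equal to 0 or 1) exactly one weight is 1 and the others vanish, so the map reproduces the mapped points themselves, which is what keeps neighbouring cells of the lattice continuous along shared edges.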

2.1.2 Similarity Measure

Two-dimensional gel images prepared from similar samples under two different conditions might vary in contrast as well as in brightness, which is equivalent to changes in mean intensity and variance. A sensible similarity measure should be invariant to these variations. The cross-correlation between two intensity distributions is defined as the covariance cov(S, T) normalized by the standard deviations σ(S) and σ(T):

\[
\mathrm{corr}(S, T) = \frac{\mathrm{cov}(S, T)}{\sigma(S)\,\sigma(T)} \tag{2.3}
\]

\[
\mathrm{cov}(S, T) = \frac{1}{|D|} \int_D (S(x) - \bar{S})(T(x) - \bar{T})\,dx \tag{2.4}
\]

where corr() is the correlation function, which measures the similarity between two images, and cov() is the covariance of the two images. The mean pixel value of an image I, written \bar{I}, is

\[
\bar{I} = \frac{1}{|D|} \int_D I(x)\,dx \tag{2.5}
\]

and the variance of its pixel values, σ²(I), is

\[
\sigma^2(I) = \frac{1}{|D|} \int_D (I(x) - \bar{I})^2\,dx. \tag{2.6}
\]

In the above equations, D ⊂ R² is the domain of points considered in the registration process, and each image I (S or T) is represented as a function I : D → {0, ..., N} that gives an intensity value I(p) for each point p in D. R² denotes the two-dimensional Euclidean space.
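As a minimal sketch of Equations (2.3) to (2.6), the integrals over D can be replaced by sums over pixels. The assumptions here are that each image is a flat list of intensities over the same domain and that the helper names (`mean`, `cov`, `corr`) are illustrative rather than taken from the registration software.

```python
import math


def mean(values):
    """Discrete form of Equation (2.5)."""
    return sum(values) / len(values)


def cov(s, t):
    """Discrete form of Equation (2.4); cov(s, s) is the variance (2.6)."""
    s_bar, t_bar = mean(s), mean(t)
    return sum((a - s_bar) * (b - t_bar) for a, b in zip(s, t)) / len(s)


def corr(s, t):
    """Equation (2.3): covariance normalized by both standard deviations."""
    denom = math.sqrt(cov(s, s) * cov(t, t))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov(s, t) / denom


s = [10, 20, 30, 40]
t = [2 * v + 5 for v in s]  # same pattern with higher contrast and brightness
similarity = corr(s, t)
```

Because t is an increasing linear function of s, the two lists differ only in contrast and brightness, and the correlation stays at its maximum; this is the invariance the text asks of a sensible similarity measure.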

2.1.3 Algorithm Flow Path

The flow chart of the algorithm is shown in Fig 2.2. It starts at a coarse level of detail and finds the most similar block among the grid cells of the target image. While the correlation function (Equation (2.3)) is being optimized, the level of detail is increased as needed in the target image. After five iterations of the algorithm, we obtain the result shown in Fig 2.3, in which the target image is shown in purple and the source image in green.

Figure 2.2: Algorithm ﬂow of image registration [10].

Figure 2.3: Final warped source image.


Figure 2.4: The image produced by the software.

2.2 Spot Mapping Using the Result

After the registration process, the source image has been warped and has the same size as the target image, as shown in Fig 2.3, where the white spots are the areas of overlap between S and T. We can then perform spot detection and direct matching to compare the protein spots of the two gel images. This comparison process is discussed in Chapter 3.

2.3 Mathematical Morphology for Spots Detection

Mathematical morphology is a very useful technique for image segmentation. Considering the characteristics of gel images, we use a watershed transformation algorithm, which is a mathematical morphology method [15], as the core of the proposed spot detection method. The watershed transformation is commonly used. For example, in gray-level images, one of the most important attributes is the morphological gradient, defined as:

\[
g(f) = (f \oplus B) - (f \ominus B) \tag{2.7}
\]

where f ⊕ B and f ⊖ B denote the elementary dilation and erosion, respectively. We describe the watershed transformation algorithm in the following sections.
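A minimal sketch of the morphological gradient of Equation (2.7) follows. It assumes a flat 3×3 structuring element B (the text does not state the shape of B), an image stored as a list of rows, and windows clipped at the image border; the function names are illustrative.

```python
def _window(img, r, c):
    """Values of img inside the 3x3 window centred on (r, c), clipped at the border."""
    rows, cols = len(img), len(img[0])
    return [
        img[rr][cc]
        for rr in range(max(0, r - 1), min(rows, r + 2))
        for cc in range(max(0, c - 1), min(cols, c + 2))
    ]


def morphological_gradient(img):
    """g(f) = (f dilated by B) - (f eroded by B) for a flat 3x3 B.

    For a flat structuring element, gray-level dilation is the local maximum
    and erosion is the local minimum over the window.
    """
    return [
        [max(_window(img, r, c)) - min(_window(img, r, c))
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
step = [[0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 9, 9]]
edges = morphological_gradient(step)
```

The gradient is zero in uniform regions and large along intensity transitions, so in the step image only the two columns adjacent to the edge respond.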

2.3.1 Watershed Transformation

We can treat a gray-level image as rugged topography. This topographic map contains many contour lines, and we can find the pixels with the minimum and the maximum gray levels in it. Within one contour cycle C, as shown on the right-hand side of Fig 2.5, two points x and y are at a finite distance from each other if both lie within C. On the other hand, if two points are located in different contour cycles, such as x and z, the distance between them is defined as infinite. The distance between two points can thus be expressed as

\[
d_C(x, y) = l(C_{xy}), \qquad d_C(x, z) = +\infty \tag{2.8}
\]

where C_{xy} is the shortest path joining x and y within C and l(·) denotes its length.

Figure 2.5: Shortest path and geodesic distance.

Hence, the topographic map can be transformed into a hierarchical gray-level image. Consider point sets in a set of contour cycles, as shown in Fig 2.6, where three different regions lie in different contours. We define the point set

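\[
R_C(Y) = \{x \in C : \exists\, y \in Y,\ d_C(x, y) \text{ is finite}\} \tag{2.9}
\]

R_C(Y) is called the C-reconstructed set by the marker set Y. Let Y_i denote the components of Y. The zone of influence of Y_i is the set of points of C that are at a finite distance from Y_i and closer to Y_i than to any other component Y_j. We define this area as:

\begin{align}
Z_C(Y_i) = \{x \in C :\ & d_C(x, Y_i) \text{ is finite and } \forall j \neq i, \tag{2.10} \\
& d_C(x, Y_i) < d_C(x, Y_j)\} \tag{2.11}
\end{align}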

After defining these areas in C, we obtain many zones Z_C(Y_i). From this definition, we can find the skeleton by zones of influence (SKIZ) of Y in C. We write:

\[
IZ_C(Y) = \bigcup_i Z_C(Y_i), \text{ and} \tag{2.12}
\]

\[
SKIZ_C(Y) = C / IZ_C(Y), \tag{2.13}
\]

where / stands for the set difference.

Figure 2.6: SKIZ of a set Y in C.

From equations (2.12) and (2.13), we obtain the point sets that correspond to local minima or maxima. Thus, we can use the following definitions to deal with the full gel image:

\[
\bigcup_j IZ_C(Y_j) = \bigcup_j \bigcup_i Z_C(Y_i), \text{ and} \tag{2.14}
\]

\[
SKIZ_C(Y_j) = C / IZ_C(Y_j). \tag{2.15}
\]
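The zones of influence and the SKIZ can be illustrated on a small binary mask. The sketch below makes several assumptions that are not in the text: C is a binary mask stored as a list of rows, each marker Y_i is a list of (row, column) pixels inside the mask, the geodesic distance is measured in 4-connected steps by breadth-first search, and the function names are hypothetical.

```python
from collections import deque


def geodesic_distance(mask, sources):
    """Breadth-first geodesic distance inside mask (4-connectivity)."""
    rows, cols = len(mask), len(mask[0])
    dist = {p: 0 for p in sources}
    queue = deque(sources)
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= rr < rows and 0 <= cc < cols
                    and mask[rr][cc] and (rr, cc) not in dist):
                dist[(rr, cc)] = dist[(r, c)] + 1
                queue.append((rr, cc))
    return dist  # pixels missing from dist are at infinite distance


def zones_and_skiz(mask, markers):
    """Return ({i: Z_C(Y_i)}, SKIZ_C(Y)) following Equations (2.10)-(2.13)."""
    dists = [geodesic_distance(mask, m) for m in markers]
    zones = {i: set() for i in range(len(markers))}
    skiz = set()
    inf = float("inf")
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if not inside:
                continue
            d = [dist.get((r, c), inf) for dist in dists]
            best = min(d)
            if best != inf and d.count(best) == 1:
                zones[d.index(best)].add((r, c))  # strictly closest marker
            else:
                skiz.add((r, c))  # tie or unreachable: C minus IZ_C(Y)
    return zones, skiz


mask = [[1, 1, 1, 1, 1]]
zones, skiz = zones_and_skiz(mask, [[(0, 0)], [(0, 4)]])
```

In this one-row example the middle pixel is equidistant from both markers, so it belongs to neither zone and forms the SKIZ, which is the discrete counterpart of the watershed line separating two catchment basins.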

2.3.2 Improvement of Watershed Transformation

The traditional Watershed Transformations can be divided into two categories [14]:

• Simulating the flooding process, and

• Procedures aiming at the direct detection of the watershed.

The method described in Section 2.3.1 belongs to the first category. In this section, we combine it with the second group of algorithms to improve the Watershed Transformation.

First, we examine the pixel distribution of the gel image. The situation is illustrated in Fig 2.7. Pixels with lower intensity are labelled with smaller numbers, and larger numbers denote pixels with higher intensity.

Figure 2.7: Pixel distribution on gel image.

Next, we compare the intensity of each pixel with that of its neighbouring pixels. If a pixel's intensity is lower than its neighbour's, the arrow is assigned to the neighbour; otherwise, the arrow is assigned to the pixel itself. No arrow is assigned between two pixels with the same intensity. Fig 2.8 shows the arrow assignment obtained from Fig 2.7.


Figure 2.8: Relation of neighbouring pixel.

Third, according to the result of the second step (shown in Fig 2.8), we can find two independent areas in the image. Therefore, we set a level as a threshold and extend the areas until a demarcation line is found; this line is the watershed. The watershed area is plotted in gray in Fig 2.9.

2.3.3 Over-segmentation Problem

The Watershed Transformation can segment objects from a complex background; however, its most serious problem is over-segmentation, as shown in Fig 2.10, where indistinct areas are also segmented. To overcome this drawback, the proposed system uses the 'opening' algorithm, which combines erosion and dilation. The algorithm is described below:

Figure 2.9: Finding watershed by directed form pixel diﬀerence.

Figure 2.10: Over-segmentation by Watershed Transformation.


Dilation

Let A and B be sets in Z². The dilation of A by B, denoted A ⊕ B, is defined as

\[
A \oplus B = \{z \mid (\hat{B})_z \cap A \neq \emptyset\}, \tag{2.16}
\]

where the translation of a set A by x and the reflection of B are

\[
(A)_x = \{c \mid c = a + x,\ a \in A\}, \tag{2.17}
\]

\[
\hat{B} = \{x \mid x = -b,\ b \in B\}. \tag{2.18}
\]

Equation (2.16) is based on obtaining the reflection of B about its origin and shifting this reflection by z. The dilation of A by B is then the set of all displacements z such that the reflected B and A overlap by at least one element.

Erosion

For sets A and B in Z², the erosion of A by B, denoted A ⊖ B, is defined as

\[
A \ominus B = \{z \mid (B)_z \subseteq A\}. \tag{2.19}
\]

In other words, the erosion of A by B is the set of all points z such that B, translated by z, is contained in A.

Opening

Opening generally smooths the contour of an object, breaks narrow isthmuses, and eliminates thin protrusions. The opening of set A by structuring element B, denoted by A ∘ B, is defined as

\[
A \circ B = (A \ominus B) \oplus B. \tag{2.20}
\]
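The three set operations can be sketched directly from their definitions. The assumptions are that A and B are finite, non-empty sets of integer (x, y) tuples and that the function names are illustrative; the example structuring element and the noise pixel are hypothetical.

```python
def translate(a, z):
    """(A)_z: shift every point of A by z, as in Equation (2.17)."""
    return {(x + z[0], y + z[1]) for x, y in a}


def dilate(a, b):
    """Dilation (2.16): the reflected B shifted by z meets A exactly when
    z = a + b for some a in A and b in B, so collect all such sums."""
    return {(ax + bx, ay + by) for ax, ay in a for bx, by in b}


def erode(a, b):
    """Erosion (2.19): all z such that B translated by z lies inside A."""
    candidates = {(ax - bx, ay - by) for ax, ay in a for bx, by in b}
    return {z for z in candidates if translate(b, z) <= a}


def opening(a, b):
    """Opening (2.20): erosion followed by dilation with the same B."""
    return dilate(erode(a, b), b)


block = {(x, y) for x in range(3) for y in range(3)}  # a 3x3 spot
noisy = block | {(6, 6)}                               # plus one isolated pixel
square = {(0, 0), (1, 0), (0, 1), (1, 1)}              # 2x2 structuring element
cleaned = opening(noisy, square)
```

The isolated pixel cannot contain a translated copy of the 2×2 element, so erosion removes it and the subsequent dilation restores only the 3×3 block. A larger structuring element would remove larger regions, which corresponds to choosing a coarser 'opening level'.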


Figure 2.11: Result of the opening algorithm, which removed most of the noise from Fig 2.10.

After the opening algorithm is applied, the over-segmented image is improved. One of the results is shown in Fig 2.11. Most importantly, we can select the 'opening level', which determines how small the removed areas are, and thereby control the number of spots.

2.4 Protein Spots Labelling

To record the attributes of the protein spots, we need to label different protein spots with different numbers. We use the 'region growing' algorithm to label the protein spots. Region growing is a recursive algorithm that finds the connected pixels belonging to the same area, which also lets us measure the size of each spot area. The algorithm is described in the following sections:

2.4.1 Region Growing

Many protein spots are detected in one 2DGE image, and every spot has a different shape. Therefore, we need a robust labelling algorithm that labels every protein spot and identifies which pixels belong to the same area. Region growing is a powerful algorithm for labelling shapes of various kinds. We apply the algorithm recursively until all pixels in the same area have been labelled. The algorithm is as follows:

Region_growing(pixel) {
    if (pixel does not satisfy the same condition or is already labelled)
        return;
    label the pixel;
    for each neighbouring pixel q
        Region_growing(q);
}

The region growing algorithm is illustrated schematically in Fig 2.12. The scan visits every pixel in the gel image and groups the pixels that belong to the same area.

Figure 2.12: Illustration of region growing.

After the region growing process, we can record feature information for the protein spots, such as:

• Number of spots,

• Size of spot area, and

• Spot coordinates.

We use these features in spot matching and later store them in the database.
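The labelling step and the three recorded features can be sketched as follows. This is an illustration, not the thesis implementation: it assumes a binary mask of spot pixels stored as a list of rows, uses 4-connectivity, replaces the recursion with an explicit stack (to stay clear of Python's recursion limit on large spots), and reports the spot coordinates as the (row, column) centroid. The name `label_spots` and the record fields are hypothetical.

```python
def label_spots(mask):
    """Label 4-connected spot regions and return one record per spot."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    spots = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or labels[r0][c0]:
                continue  # background, or already part of a labelled spot
            label = len(spots) + 1
            labels[r0][c0] = label
            stack, pixels = [(r0, c0)], []
            while stack:  # grow the region from the seed pixel
                r, c = stack.pop()
                pixels.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr][cc] and not labels[rr][cc]):
                        labels[rr][cc] = label
                        stack.append((rr, cc))
            area = len(pixels)
            centre = (sum(p[0] for p in pixels) / area,
                      sum(p[1] for p in pixels) / area)
            spots.append({"label": label, "area": area, "centre": centre})
    return spots


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
spots = label_spots(mask)
```

The number of spots is `len(spots)`, and each record carries the size of the spot area and its coordinates, matching the three features listed above.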

2.4.2 Ellipse Fitting and Exclusion

To understand the composition of the protein spots, biologists need to excise the spots from the gel and identify them by proteomic approaches such as mass spectrometry. We therefore need to provide a tool for positioning the manipulator, which makes ellipse fitting necessary. From the procedure introduced in the previous section, we have obtained the spot coordinates. We then define the ellipse filter as:

\[
R(x, y) = \{(u, v) \mid (u - x)^2 + (v - y)^2/\alpha^2 < r^2\}, \tag{2.21}
\]

where R represents the ellipse region, u and v denote the coordinates of a pixel, (x, y) is the center of the ellipse, r is the radius, and α denotes the ratio between the major axis and the minor axis of the ellipse. The ellipse should cover the spot area and define the center of the spot. Fig 2.13 shows the spot detection result: the gel image contains 48 protein spots, enclosed by 48 ellipses labelled with the corresponding numbers.

Figure 2.13: Result of spot detection in partial gel image from Fig 2.11.
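The ellipse filter of Equation (2.21) reduces to a single inequality per pixel. The sketch below assumes integer pixel coordinates and hypothetical function names; under this form of the equation the semi-axis along u is r and the semi-axis along v is αr.

```python
def in_ellipse(u, v, x, y, r, alpha):
    """True if pixel (u, v) lies inside the region R(x, y) of Equation (2.21)."""
    return (u - x) ** 2 + (v - y) ** 2 / alpha ** 2 < r ** 2


def ellipse_pixels(x, y, r, alpha):
    """Integer pixels inside the ellipse centred on (x, y)."""
    reach_u = int(r) + 1          # bounding box half-width along u
    reach_v = int(r * alpha) + 1  # bounding box half-width along v
    return [
        (u, v)
        for u in range(int(x) - reach_u, int(x) + reach_u + 1)
        for v in range(int(y) - reach_v, int(y) + reach_v + 1)
        if in_ellipse(u, v, x, y, r, alpha)
    ]


covered = ellipse_pixels(0, 0, 2, 2)  # r = 2, axis ratio 2
```

Because the inequality is strict, pixels exactly on the boundary are excluded, so the region is slightly smaller than the nominal ellipse.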

Although there are thousands of spots in a 2DGE image, biologists are interested in different levels of detail of the protein spots. Therefore, we can set a threshold to exclude unnecessary spots and keep the spots of interest to biologists.

Chapter 3 Spot Matching

Although the previous procedures find the protein spots efficiently, the comparison process is still a challenging task. We have implemented two different methods, the Gabriel graph and the relative neighborhood graph, which are discussed separately in the following sections.

3.1 Direct Matching

In Chapter 2, we have integrated the registration method proposed by Stefan Veeser [10] to process the gel image registration. The result will be further used to compare gel images in this section.

When the gel image registration process has finished, the most similar protein spots are located at overlapping coordinates in the standard image. Therefore, when a protein spot in the standard gel image may be matched by protein spots in the relative gel image, we use the Euclidean distance to find the most similar one. Let P_s be one of the protein spots in the standard gel image, m(P_s) the matching function, given by the Euclidean distance, and P_r the matched protein spot in the relative gel image. They are judged by the following expression:

\[
P_s \text{ maps to } P_r \iff m(P_s) = \lVert P_s - P_r \rVert \le \lVert P_s - P \rVert \ \text{for every protein spot } P \text{ in the relative gel image.} \tag{3.1}
\]

In this way, the most similar protein spot pairs between two diﬀerent gel images will be matched to each other.
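Direct matching after registration amounts to a nearest-neighbour search. The following sketch assumes that spots are given as (x, y) centre coordinates in lists; the function name `direct_match` and the sample coordinates are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def direct_match(standard_spots, relative_spots):
    """Pair each standard spot with its nearest relative spot (Equation (3.1)).

    Returns {index in standard_spots: (index in relative_spots, distance)}.
    """
    pairs = {}
    for i, ps in enumerate(standard_spots):
        j = min(range(len(relative_spots)),
                key=lambda k: math.dist(ps, relative_spots[k]))
        pairs[i] = (j, math.dist(ps, relative_spots[j]))
    return pairs


standard = [(10, 10), (50, 40)]
relative = [(52, 41), (11, 9), (90, 90)]
pairs = direct_match(standard, relative)
```

This search is one-directional: two standard spots may select the same relative spot, and an unmatched relative spot (such as the third one here) is simply left out. A practical system would likely add a maximum distance or a uniqueness check, which is outside what the expression above states.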

3.2 Partial Matching and Relocation

In this section, we focus on partial matching and relocation in large associated gel images, which is a new aspect of proteomic study (proteomics). In early experiments with a long-range pH gradient, the protein spots were usually too close to each other or overlapped. This is a major problem for purifying proteins. Therefore, biologists prefer to run experiments with narrow-range pH gradients over the same gel areas separately and then compose several gel sub-images to analyze the proteins as a whole. As we can see in Fig 3.1, the top image is composed of the three partially overlapping images shown below it. We can use the combined gel image to find more useful protein spots than the overlapped and blurred protein spots in the wide-range IPG gel image. To address this, we provide a novel comparative method for matching a small part of an image to its original gel image. Thus, we can use this method to find the original gel image from a small, partial image and to relocate the partial image within it.

Figure 3.1: Illustration of gel images combination [20]: (a) IPG 4-7 gel image; (b) IPG 4-5 gel image; (c) IPG 5-6 gel image; (d) IPG 5.5-6.7 gel image.

Generally speaking, protein spots have neither a fixed pattern nor a fixed size and can be distributed anywhere in the gel image. We cannot compare gels only by protein spots or by image intensity. Because gel images lack distinctive features, we have to construct unique features for the protein spots. In this thesis, we apply graph theory to construct and extract features for gel image comparison [21, 22, 23, 24, 25]. We discuss these issues in the following sections.

3.2.1 Gabriel Graph and Relative Neighborhood Graph

A graph is composed of points and edges. If two connected points are far from each other, the edge between them is long; otherwise, the edge is short. Depending on how the points are connected, we can define many different kinds of graphs. First, we can distinguish directed graphs from undirected graphs by the directions of the edges, as shown in Fig 3.2(a) and Fig 3.2(b), respectively. We can also distinguish straight-lined planar graphs from non-straight-lined planar graphs by the way the edges are drawn; examples can be seen in Fig 3.2(c) and Fig 3.2(d), respectively. We can build proximity graphs by utilizing these concepts.

Several kinds of graphs have been studied by researchers, such as the minimum spanning tree (MST) [26], the relative neighborhood graph (RNG) [27, 28], the Gabriel graph (GG) [29] and the Delaunay triangulation (DT) [30]; examples are shown in Fig 3.3 [31].

Figure 3.2: Illustration of graph theory: (a) Directed graph; (b) Undirected graph; (c) Straight-lined planar graph; (d) Non straight-lined planar graph.

These graphs are simple, undirected, straight-lined, connected and planar. Furthermore, their features remain unchanged under rotation, shift and reversal. Therefore, every point in the graph has a set of unique features from the graph viewpoint.

We have selected the Gabriel graph and the relative neighborhood graph as the models for constructing features because the variation of a point's features is more obvious with these models than with the others. The four graph models satisfy the following relation:

\[
MST \subseteq RNG \subseteq GG \subseteq DT \tag{3.2}
\]

We use the two graph models GG and RNG, which construct the same kinds of features, to compare gel images. GG provides the major connection relation and RNG provides the minor one. We define some graph-theoretical and geometric terminology in the next paragraph.

Figure 3.3: Illustration of approximate graph: (a) MST graph; (b) RNG graph; (c) GG graph; (d) DT graph.

A graph G = (V, E) consists of a finite non-empty set V(G) of vertices and a set E(G) of unordered pairs of vertices known as edges. An edge e ∈ E(G) consisting of vertices u and v is denoted by e = uv; u and v are called the endpoints of e and are said to be adjacent vertices or neighbors. The degree of a vertex v ∈ V(G), denoted by deg_G(v), is the number of edges of E(G) which have v as an endpoint. A path in a graph G is a finite non-null sequence P = v₁v₂...v_k where the vertices v₁, v₂, ..., v_k are distinct and v_i v_{i+1} is an edge for each i = 1, ..., k − 1. The vertices v₁ and v_k are known as the endpoints of the path. A cycle is a path whose endpoints are the same. A graph is connected if, for each pair of vertices u, v ∈ V, there is a path from u to v.
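As a minimal sketch of these definitions, an undirected graph can be stored as a vertex set plus a set of unordered pairs. The helper names `degree` and `is_connected` are illustrative assumptions and do not come from the thesis implementation.

```python
from collections import deque


def degree(edges, v):
    """Number of edges that have v as an endpoint."""
    return sum(1 for e in edges if v in e)


def is_connected(vertices, edges):
    """True if every pair of vertices is joined by a path (breadth-first search)."""
    vertices = set(vertices)
    if not vertices:
        return True
    neighbors = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        neighbors[u].add(w)
        neighbors[w].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in neighbors[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == vertices


# Edges are unordered pairs, so frozenset({u, v}) plays the role of e = uv.
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
```

Using `frozenset` for an edge makes uv and vu the same object, which matches the "unordered pair" wording of the definition.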

With these definitions, we can define the Gabriel graph and the relative neighborhood graph.

Gabriel Graph

Given a set P of points in R², the Gabriel graph of P, denoted by GG(P), has as its region of influence the closed disk having segment uv as diameter: u and v are joined by an edge when this disk contains no other point of P. That is, two vertices u, v ∈ P are adjacent if and only if

D²(u, v) < D²(u, w) + D²(v, w), for all w ∈ P, w ≠ u, v. (3.3)

An illustration of the Gabriel graph is shown in Fig 3.4.
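The adjacency condition (3.3) can be checked directly for every pair of points. The following is a brute-force sketch under that reading; the function names and the four example coordinates are hypothetical, and the O(n³) loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def dist2(a, b):
    """Squared Euclidean distance D^2(a, b) between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def gabriel_graph(points):
    """Return the Gabriel edges as index pairs (i, j) with i < j.

    An edge is kept when condition (3.3) holds for every other point w.
    """
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        u, v = points[i], points[j]
        if all(dist2(u, v) < dist2(u, w) + dist2(v, w)
               for k, w in enumerate(points) if k not in (i, j)):
            edges.add((i, j))
    return edges


# Hypothetical spot centres; spot 2 lies inside the disk spanned by spots 0 and 1.
spots = [(0, 0), (4, 0), (2, 1), (2, 5)]
gg_edges = gabriel_graph(spots)
```

In this example the pair (0, 1) is rejected because spot 2 lies inside the disk whose diameter is the segment between spots 0 and 1, so only the edges that touch spot 2 remain.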

Relative Neighborhood Graph

Given a set P of points in R², the relative neighborhood graph of P, denoted by RNG(P), has a segment between points u and v in P if the intersection of the open disks of radius D(u, v) centered at u and v contains no other point of P. This region of influence is referred to as the lune of u and v. Equivalently, u, v ∈ P are adjacent if and only if

D(u, v) ≤ max[D(u, w), D(v, w)], for all w ∈ P, w ≠ u, v. (3.4)

An illustration of the relative neighborhood graph is shown in Fig 3.5.
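Condition (3.4) can be sketched in the same brute-force way. Squared distances are compared so that integer coordinates stay exact; the function name and the three example points are assumptions made for illustration.

```python
from itertools import combinations


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def relative_neighborhood_graph(points):
    """Return the RNG edges as index pairs (i, j) with i < j.

    D(u, v) <= max(D(u, w), D(v, w)) is tested on squared distances,
    which is equivalent because distances are non-negative.
    """
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        u, v = points[i], points[j]
        if all(dist2(u, v) <= max(dist2(u, w), dist2(v, w))
               for k, w in enumerate(points) if k not in (i, j)):
            edges.add((i, j))
    return edges


# Hypothetical isosceles triangle: the base (0, 1) is its longest side.
spots = [(0, 0), (4, 0), (2, 3)]
rng_edges = relative_neighborhood_graph(spots)
```

For this triangle the base is dropped, since its squared length 16 exceeds the squared length 13 of the other two sides. The Gabriel condition (3.3) would keep the base, because 16 < 13 + 13, which is one small instance of RNG ⊆ GG in relation (3.2).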


Figure 3.4: Illustration of Gabriel graph.

Figure 3.5: Illustration of relative neighborhood graph.


3.2.2 Feature Extraction

After constructing the proximity graphs, we extract features for the spots on the protein gel images. From these proximity graphs, we can obtain the following features.

• Degree of protein spots,

• Angle of connected edges, and

• Distance between protein spots.

These features are unchanged under shift, rotation and reversal thanks to the properties of the graphs. We calculate the features of every protein spot from this information and use them as the basis of comparison in the next section.
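The three features can be computed per spot from the point coordinates and the edge list of a proximity graph. The sketch below assumes that "angle of connected edges" means the angular gaps between consecutive edges around a spot, which is one rotation-invariant reading; the function name and the dictionary layout are hypothetical.

```python
from math import atan2, hypot, pi


def spot_features(points, edges):
    """Per-spot features: degree, sorted neighbor distances, sorted angular gaps."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    features = {}
    for i, nbrs in neighbors.items():
        x, y = points[i]
        dists = sorted(hypot(points[j][0] - x, points[j][1] - y) for j in nbrs)
        # Directions of the incident edges, sorted counter-clockwise.
        dirs = sorted(atan2(points[j][1] - y, points[j][0] - x) for j in nbrs)
        n = len(dirs)
        # Gaps between consecutive edges; a spot needs two edges to have an angle.
        gaps = sorted((dirs[(k + 1) % n] - dirs[k]) % (2 * pi)
                      for k in range(n)) if n > 1 else []
        features[i] = {"degree": n, "distances": dists, "angles": gaps}
    return features


# Hypothetical spots and Gabriel edges between them.
spots = [(0, 0), (4, 0), (2, 1), (2, 5)]
edges = {(0, 2), (1, 2), (2, 3)}
features = spot_features(spots, edges)
```

Because only relative quantities are stored, rotating or shifting all spots leaves the feature values unchanged up to floating-point rounding.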

3.3 Maximum Relation Spanning Tree

In order to compare the similarity between two gel images, we have developed a comparative framework for this task. The maximum relation spanning tree (MRST) is derived from the minimum spanning tree algorithm by replacing the minimum distance with the maximum relation between two spots. We calculate the relation between spots from their features and take the protein spot pair with the maximum relation as the basic information for image matching. If we cannot find any usable pair in the spot pair sets, the algorithm terminates. The MRST algorithm is illustrated in Fig 3.6 and Fig 3.7 and is discussed as follows.

Figure 3.6: Illustration of major spots and satellite spots.

The comparison method of the MRST algorithm is based on the Gabriel graph. We use the connection condition of the Gabriel graph to decide whether a spot can join the comparison, as shown in Fig 3.6. (a) and (b) denote the two protein spot sets. (c) and (d) denote the initial comparative pair (plotted as red dots) found by global matching. (e) and (f) denote the satellite spots (plotted as blue dots) found by the Gabriel graph connection condition. We call the red spot the major spot and the blue spots satellite spots. Starting from a pair of major spots, one in the standard gel image and one in the relative gel image, we travel through all of the satellite spots in the standard gel image and calculate their relation with all of the satellite spots in the relative gel image.

As illustrated in Fig 3.7 with connected edges, we find the maximum relation for every satellite spot pair recursively. (a) and (b) show the maximum relation matching pair found between the two images. (c) and (d) show the satellite spots being used as the center points to find the next pair. (e) and (f) show the recursive processing that finds the following pair. From these pairs, we choose the maximum relation pair to be the major spot pair for later comparison and store the remaining high relation pairs in the temporary stack. If we cannot find a further major spot pair, we choose the pair with the largest relation in the temporary stack and continue processing. When all of the spots in the standard gel image have been visited, the computation terminates and the matching results are obtained.

Figure 3.7: Illustration of maximum relation spanning tree matching pairs.

In the following sections we explain the details of the proposed MRST's data structure and its pseudocode.


Figure 3.8: MRST’s node data structure.

3.3.1 Data Structure

The maximum relation spanning tree is composed of many nodes. The nodes are labelled with their corresponding protein spot numbers. As listed in Fig 3.8, we record the information of a node in the following order: mother node (MN), node number (NAME), matched spot in the relative gel image (target node, TN), spot numbers linked by the Gabriel graph (LN), linking spot status (STATUS) and linked spot information (LN_n), which serves as the temporary stack. With these definitions, we can conveniently look up every spot, its matching spot and its neighbor spots. The information includes spot number, location and matching relation.

Using the above definitions, we can build a unique tree with the proposed comparison method. Fig 3.9 shows a maximum relation spanning tree in which the nodes refer to their mother nodes and children. Therefore, we can quickly find the information of protein spots on this tree, especially that of their neighbor spots.
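The node record described above could be sketched as a small class. The field types, the status values and the `add_child` helper are assumptions made for illustration; only the field meanings (MN, NAME, TN, LN, STATUS, LN_n) come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MRSTNode:
    name: int                                    # NAME: spot number in the standard gel image
    # MN: mother node, None for the root (excluded from repr/eq to avoid recursion)
    mother: Optional["MRSTNode"] = field(default=None, repr=False, compare=False)
    target: Optional[int] = None                 # TN: matched spot in the relative gel image
    linked: List[int] = field(default_factory=list)   # LN: spots linked by the Gabriel graph
    status: str = "unvisited"                    # STATUS: linking status (values assumed)
    # LN_n: temporary stack of (relation, standard spot, relative spot) candidates
    candidates: List[Tuple[float, int, int]] = field(default_factory=list)
    children: List["MRSTNode"] = field(default_factory=list)

    def add_child(self, child: "MRSTNode") -> "MRSTNode":
        """Attach child below this node and set its mother link."""
        child.mother = self
        self.children.append(child)
        return child


# Hypothetical spot numbers: spot 12 matched to 40, its satellite 3 matched to 18.
root = MRSTNode(name=12, target=40, linked=[3, 7])
leaf = root.add_child(MRSTNode(name=3, target=18))
```

Keeping both the mother link and the children list lets a lookup move from a spot to its neighbors in either direction along the tree.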


Figure 3.9: MRST’s structure.

3.3.2 MRST Algorithm

When we implement this algorithm, we separate the process into two parts: global matching and local matching.

Global matching

In this step, we need to find the initial comparative pair. We compare all possible pairs to find the one with the maximum relation. We compute the Euclidean distance over three features (spot degree, edge distance and angle) as the similarity measure between spot pairs, sort the pairs by their relation values, and then record the spot pairs in the temporary stack.
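A minimal sketch of this ranking step follows. It assumes that each spot has already been summarized as a three-component vector (degree, edge distance, angle); that summary, the function name and the sample values are all hypothetical, and feature scaling is left out.

```python
from math import dist


def global_matching(std_features, rel_features):
    """Rank all (standard, relative) spot pairs by feature distance, most similar first."""
    pairs = [(dist(fs, fr), s, r)
             for s, fs in std_features.items()
             for r, fr in rel_features.items()]
    pairs.sort()
    return pairs


# Hypothetical feature vectors: (degree, mean edge distance, mean angle in radians).
std = {1: (3, 4.0, 2.1), 2: (2, 6.5, 3.1)}
rel = {10: (2, 6.4, 3.0), 11: (3, 4.1, 2.1)}
stack = global_matching(std, rel)
best = stack[0]   # smallest distance, taken as the initial comparative pair
```

Here a small Euclidean distance stands for a large relation, so the first entry of the sorted list serves as the initial comparative pair and the rest form the temporary stack.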


Local matching

After the search for similar pairs is complete, we start to build the maximum relation spanning tree. The algorithm is as follows:

MRST() {
    if node tree is null
        Insert a new comparative pair.
        MRST()
    else if comparative pairs is not empty
        Find the next comparative pair among the satellite spots.
        MRST()
    else if temporary stack is empty
        Terminate the function.
}
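The local matching loop can be sketched iteratively. This is a simplified reading of the procedure rather than the thesis implementation: every candidate satellite pair is pushed onto the temporary stack and the pair with the largest relation is always taken next, in the manner of a Prim-style spanning tree with "maximum relation" in place of "minimum distance". The function name, the `threshold` parameter and the `relation` callback are assumptions.

```python
def mrst_match(std_adj, rel_adj, relation, seed, threshold=0.5):
    """Grow matches outward from the seed pair, always taking the best free pair.

    std_adj / rel_adj map each spot to its Gabriel-graph neighbors,
    relation(s, r) returns a similarity in [0, 1].
    """
    matches = {seed[0]: seed[1]}
    used = {seed[1]}
    stack = []                      # temporary stack of (relation, std spot, rel spot)
    current = seed
    while current is not None:
        s_major, r_major = current
        # Score every still-unmatched satellite pair around the current major pair.
        for s in std_adj[s_major]:
            if s in matches:
                continue
            for r in rel_adj[r_major]:
                if r not in used:
                    score = relation(s, r)
                    if score >= threshold:
                        stack.append((score, s, r))
        # The highest-relation pair that is still free becomes the next major pair.
        stack.sort()
        current = None
        while stack:
            score, s, r = stack.pop()
            if s not in matches and r not in used:
                matches[s] = r
                used.add(r)
                current = (s, r)
                break
    return matches


# Hypothetical triangle of spots in each image and a lookup table of relations.
std_adj = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
rel_adj = {10: [20, 30], 20: [10, 30], 30: [10, 20]}
table = {(2, 20): 0.9, (3, 30): 0.8, (2, 30): 0.3, (3, 20): 0.2}
matches = mrst_match(std_adj, rel_adj, lambda s, r: table.get((s, r), 0.0), seed=(1, 10))
```

The loop stops when the stack holds no usable pair, which corresponds to the termination branch of the pseudocode.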

3.4 Fuzzy Inference System

When calculating the relations of local spot sets, we must consider three different features and combine them. We use the resulting relation to compare the satellite spots and sort them. In addition, we adapt the relation function to the properties of the local area, because it is difficult to design fixed thresholds for the features.


Figure 3.10: Illustration of adaptive membership function.

For these reasons, we apply fuzzy inference to develop our comparative framework. The inference framework replaces the crisp relation with a fuzzy relation. Comparing satellite spots with a fuzzy relation is more suitable than with a crisp relation.

Because every local area is different, we use a stylized membership function. We define the feature of a spot in the standard gel image as f^S and the feature of a spot in the relative gel image as f^R. Let the function be:

R(f_n^{S_a \to R_b}) = \exp\left( -\left[ \frac{f_n^{S_a} - f_n^{R_b}}{2\sigma^2} \right]^2 \right) (3.5)

Here R denotes the membership function and σ denotes the variance of the feature f_n between spots. The function is illustrated in Fig 3.10 for two values, σ₁ and σ₂, which result from different local intensities. In this figure, we can see that different sets of spots have different membership functions, constructed with different σ. Under different feature conditions, we also have different membership functions. This phenomenon is illustrated in Fig 3.11, in which major points with different feature values have different relation functions. The comparative algorithm MRST uses this fuzzy inference system to compute the relation of every spot that is a neighbor of the center spot. Because we have three different features, we calculate three relations, one per feature, and take their average value [32]. The total relation R_total is defined as:

R_{total} = \frac{\omega_{f_1} \cdot R_{f_1} + \omega_{f_2} \cdot R_{f_2} + \omega_{f_3} \cdot R_{f_3}}{3} (3.6)

Figure 3.11: Phenomenon of different membership functions.

Here ω_{f_n} is the weight of feature f_n. Finally, we choose the pair with the maximum relation among the spot pairs and proceed to the next comparison state.
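Equations (3.5) and (3.6) translate into a few lines of code. The sketch below reads (3.5) as exp(−[(f^S − f^R) / (2σ²)]²) and (3.6) as the weighted sum of the three feature relations divided by 3; the function names, the default weights of 1 and the sample numbers are assumptions.

```python
from math import exp


def membership(f_std, f_rel, sigma):
    """Relation of one feature, eq. (3.5): exp(-[(f_std - f_rel) / (2 sigma^2)]^2)."""
    return exp(-((f_std - f_rel) / (2.0 * sigma ** 2)) ** 2)


def total_relation(std_feats, rel_feats, sigmas, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three feature relations, eq. (3.6)."""
    relations = [membership(fs, fr, s)
                 for fs, fr, s in zip(std_feats, rel_feats, sigmas)]
    return sum(w * r for w, r in zip(weights, relations)) / 3.0


# Hypothetical features (degree, edge distance, angle) of one candidate spot pair.
r_total = total_relation((3, 4.0, 2.1), (3, 4.4, 2.0), sigmas=(1.0, 1.5, 0.8))
```

A larger σ flattens the curve, so the same feature difference yields a relation closer to 1; this is the adaptive behavior that a locally chosen σ is meant to provide.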

This algorithm proceeds recursively until all of the spot pairs produced by Gabriel matching have been processed. Through this process, we find all similar spot pairs between two gel images. We also label the matched spots with their relative spots and their matching area, as shown in Fig 3.12. The left window displays the source image to be matched, and the square on the right-hand side indicates the matched area in a large scale gel image. If a complete partial match fails, the algorithm can still present the possible matching point pairs with corresponding matching labels, as shown in Fig 3.13.


Figure 3.12: Partial matching result.

As we can observe in Fig 3.7, each spot in the rectangular area has two labels: the original label of the target gel image (left) and the label of its matched spot in the source image (right, following the arrow).


Figure 3.13: Matching pair relation.


Chapter 4 Simulation Results and Database System

We have implemented the proposed two dimensional electrophoresis protein gel image analysis system, and we demonstrate the results of gel matching (global matching and partial matching) in this chapter. We obtained 15 gel images from the Animal Technology Institute Taiwan (ATIT) and from web sites as test images, and used them to construct our data set. The data set contains 240 gel images in total, as follows:

• 15 original gel images.

• 75 rotated images:

– 45 gel images obtained from the original gel images by rotating them by 90°, 180° and 270°, respectively.

– 30 gel images obtained from the original gel images by flipping them horizontally and vertically, respectively.


• 150 partial gel images.

– Ten different sizes of partial images (1000x1000, 900x900, 800x800, 700x700, 600x600, 500x500, 400x400, 300x300, 200x200 and 100x100) cropped at random positions from each of the 15 original gel images.
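The construction of the data set can be sketched on a plain row-major image (a list of pixel rows). The helper names and the toy image are assumptions; a real gel image would be loaded with an imaging library, and the crop sizes would be the ten sizes listed above.

```python
import random


def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def random_crop(img, size, rng):
    """Cut a size x size window at a random position inside the image."""
    height, width = len(img), len(img[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]


def build_variants(img, crop_sizes, rng):
    """Three rotations, two flips and one random crop per requested size."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return {
        "rotated": [r90, r180, r270],
        "flipped": [flip_horizontal(img), flip_vertical(img)],
        "partial": [random_crop(img, s, rng) for s in crop_sizes],
    }


# Toy 5 x 6 image standing in for a gel image; crop sizes scaled down accordingly.
toy = [[r * 10 + c for c in range(6)] for r in range(5)]
variants = build_variants(toy, crop_sizes=[4, 3, 2], rng=random.Random(0))
```

With 15 originals, each contributing 3 rotations, 2 flips and 10 crops, the count is 15 × (1 + 3 + 2 + 10) = 240 images, which agrees with the total stated above.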

We have processed them by the following conditions:

• Directed matching: using the original gel image pairs (we have seven different pairs from the 15 gel images).

• Partial matching: using the small samples (150 pieces) to match the original gel images (15 pieces).

• Relocation: using the rotated samples (75 pieces) to match the original gel images (15 pieces).

Through extensive computer simulation, we have obtained the results presented in the following sections:

4.1 Results of Spot Matching

As discussed in the previous section, spot matching has three different matching types: directed matching, partial matching and relocation. In directed matching, we use Euclidean distance for comparison, and it takes less than 1 second for hundreds of protein spots. With the MRST algorithm in partial matching and relocation, we also obtain good performance, with a processing time of less than 2 seconds. In the following sections, we discuss these spot matching processes and their performance separately.

4.1.1 Results of Directed Matching

We used directed matching to deal with similar gel image pairs. As described in Chapter 2, we applied the image registration algorithm to correct the differences between two similar gel images. We then used the results for spot detection in the gel images, and applied Euclidean distance to find the most similar spot pairs. In Fig 4.1, in which the left image is the standard gel image and the right image is the relative gel image, we label the similar protein spots in both images with numbers. Once the similar spots have been labelled, biologists have a clear reference for carrying out chemical examinations to retrieve the protein information.
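After registration, directed matching reduces to a nearest-neighbor search on spot coordinates. The following sketch assumes that spots are given as dictionaries from spot number to (x, y) centre and adds a hypothetical `max_dist` cut-off so that spots without a nearby partner stay unmatched; neither detail is specified in the text.

```python
from math import dist


def direct_match(std_spots, rel_spots, max_dist=10.0):
    """Pair each standard spot with its nearest relative spot within max_dist pixels."""
    matches = {}
    for s_id, s_xy in std_spots.items():
        r_id = min(rel_spots, key=lambda r: dist(s_xy, rel_spots[r]), default=None)
        if r_id is not None and dist(s_xy, rel_spots[r_id]) <= max_dist:
            matches[s_id] = r_id
    return matches


# Hypothetical spot centres in two registered gel images.
standard = {1: (10, 10), 2: (50, 40), 3: (500, 500)}
relative = {7: (11, 9), 8: (52, 41), 9: (200, 200)}
pairs = direct_match(standard, relative)
```

This simple version does not enforce a one-to-one assignment; two standard spots may select the same relative spot if they are close together.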

4.1.2 Results of Partial Matching and Relocation

In this thesis, we provide a novel method to match a small gel image to a large gel image, as shown in Fig 4.2. We segmented each original image into ten different sizes and used those small gel images as the standard gel images to match against the original gel image. In Table 4.1, we show the results of partial matching with fixed parameter values. The ratio of correct matching is about 67.3%. In order to obtain better performance, we adjusted two parameters: the level of 'opening' and the protein size threshold.

The 'opening' parameter ranges from 1 to 30, and the protein size threshold is between 0 and 1000. If the protein spots in the gel image are blurred, we must set the parameters low enough to retain the small protein spots for matching. On the other hand, if the image background is over-stained, we need to set the parameters as high as possible in order to remove the noise. The matching performance rises to 85.5% after the adjustment. We show the result in Table 4.2; in both tables, 165 gel images (the 15 originals and the 150 partial images) are used for partial matching.

Figure 4.1: Result of direct matching processing.

Figure 4.2: Partial matching: using a partial image to find the original image information.
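The two overall ratios can be checked from the totals reported in Tables 4.1 and 4.2 (111 and 141 correct matches). The short calculation below is only an arithmetic check; the helper name is an assumption.

```python
def ratio_percent(correct, total):
    """Share of correct matches in percent, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


images_per_row = 15          # one image per original gel in every table row
rows = 11                    # the original image plus ten partial sizes
total = images_per_row * rows

fixed = ratio_percent(111, total)      # total correct in Table 4.1
adaptive = ratio_percent(141, total)   # total correct in Table 4.2
```

The total of 165 images is 15 × 11, and the two ratios evaluate to 67.3 and 85.5, in agreement with the percentages quoted in the text.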

Table 4.1: The result of partial matching for different image sizes without adaptive parameters.

Gel Image        Correct Number   False Number   Correct Ratio (%)   False Ratio (%)
Original Image   15               0              100                 0
1000x1000        15               0              100                 0
900x900          15               0              100                 0
800x800          14               1              93.3                6.7
700x700          12               3              80                  20
600x600          11               4              73.3                26.7
500x500          12               3              80                  20
400x400          9                6              60                  40
300x300          7                8              46.7                53.3
200x200          1                14             6.7                 93.3
100x100          0                15             0                   100
Total            111              54             67.3                32.7

In order to simulate rotation, reversal and shift, we have 75 different modified gel images covering 5 situations. We show the results of relocation in Table 4.3.

Table 4.2: The result of partial matching for different image sizes with adaptive parameters.

Gel Image        Correct Number   False Number   Correct Ratio (%)   False Ratio (%)
Original Image   15               0              100                 0
1000x1000        15               0              100                 0
900x900          15               0              100                 0
800x800          15               0              100                 0
700x700          15               0              100                 0
600x600          15               0              100                 0
500x500          15               0              100                 0
400x400          15               0              100                 0
300x300          13               2              86.7                13.3
200x200          8                7              53.3                46.7
100x100          0                15             0                   100
Total            141              24             85.5                14.5

In the analysis system, we also used the small images to test the rotation and reversal situations, as shown in Fig 4.3, in which one standard gel image is matched against five differently transformed 200x200 images and their original gel image. The standard gel image is the upper-left one of the six small images. The other small images are the transformed versions (rotated by 90°, 180° and 270°, horizontally reversed and vertically reversed). The comparison process also handles these conditions well. In this way, we can apply this method to a database system in the future.

Figure 4.3: Partial matching and relocation processing.

Table 4.3: The result of relocation in different situations without adaptive parameters.

Situation             Correct Number   False Number   Correct Ratio (%)   False Ratio (%)
Rotated 90°           12               3              80                  20
Rotated 180°          12               3              80                  20
Rotated 270°          12               3              80                  20
Horizontal Reversal   13               2              86.7                13.3
Vertical Reversal     11               4              73.3                26.7
Total                 60               15             80                  20

4.2 Database System

Judging from most of the commercial software packages, a database system is a necessary part of an analysis system. In order to manage gel images and protein information well, building a powerful database system together with the analysis tool is very important. In this thesis, the database is built with MySQL 3.23.26 [33], which is a relational database system [34]. We record the results of spot detection and spot matching from the previous sections in the database. The results can be separated into two categories: gel attributes, which describe the two dimensional electrophoresis gel, and image information such as spot information and matching relations. We record this information following the format defined by SWISS-2DPAGE [7]. In the following figures, we show the interface and results of our database system. In Fig 4.4, we present the gel attributes and the stored values in our database. The meaning of these attributes is as follows:

• gel name: Identity of the 2DGE gel.

• gel location: The library that produced the gel image.

• size x: The width of the gel image.

• size y: The height of the gel image.

• sample source: The source species and tissue.

• sample amount: The amount of sample loaded.

• stain method: The method used for gel staining.

• 1d ph low: The lower limit of the pH value in the first dimension.

• 1d ph high: The upper limit of the pH value in the first dimension.

• 2d ma low: The lower limit of the electric current in the second dimension.

• 2d ma high: The upper limit of the electric current in the second dimension.

• mw low: The lower limit of the molecular weight.

• mw high: The upper limit of the molecular weight.

• date: The date of the analysis.

• notation: Notation.

Figure 4.4: Format of gel image's attribute in database.

Figure 4.5: Information of gel images in 2DGE image analysis database system.

After recording the gel attributes in the database, we go on to store the gel's spot information. We construct the attributes for the protein spots in the following format:

• spot nb: The number of the protein spot.

• coordinate x: The coordinate of the protein spot in x axis.

• coordinate y: The coordinate of the protein spot in y axis.

• spot volume: The volume of protein spot.
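The two attribute lists map naturally onto two relational tables. The sketch below uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a server; the table names, the column names with underscores (including the renamed pH and current columns), the column types and the inserted values are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gel (
    gel_name      TEXT PRIMARY KEY,   -- identity of the 2DGE gel
    gel_location  TEXT,
    size_x        INTEGER,            -- image width in pixels
    size_y        INTEGER,            -- image height in pixels
    sample_source TEXT,
    sample_amount TEXT,
    stain_method  TEXT,
    ph_low_1d     REAL,               -- '1d ph low' in the attribute list
    ph_high_1d    REAL,
    ma_low_2d     REAL,               -- '2d ma low' in the attribute list
    ma_high_2d    REAL,
    mw_low        REAL,
    mw_high       REAL,
    date          TEXT,
    notation      TEXT
);
CREATE TABLE spot (
    gel_name     TEXT REFERENCES gel(gel_name),
    spot_nb      INTEGER,
    coordinate_x INTEGER,
    coordinate_y INTEGER,
    spot_volume  REAL,
    PRIMARY KEY (gel_name, spot_nb)
);
""")

# Hypothetical demo rows.
conn.execute("INSERT INTO gel (gel_name, size_x, size_y) VALUES (?, ?, ?)",
             ("demo_gel", 1498, 1544))
conn.executemany("INSERT INTO spot VALUES (?, ?, ?, ?, ?)",
                 [("demo_gel", 1, 120, 340, 512.0),
                  ("demo_gel", 2, 400, 90, 87.5)])

# Spots of one gel, largest volume first.
rows = conn.execute(
    "SELECT spot_nb, spot_volume FROM spot WHERE gel_name = ? "
    "ORDER BY spot_volume DESC", ("demo_gel",)).fetchall()
```

Keying the spot table on (gel_name, spot_nb) lets the same spot numbers be reused in every gel while still linking each spot to its gel record.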


Chapter 5 Conclusion and Future Work

In the proposed system, we provide an all-in-one service for the analysis of gel images. We have combined registration, spot detection, spot matching and a database system for gel images in one framework. In gel registration, we used piecewise bilinear mapping to correct the differences between two gel images. We then applied the result to detect and match the protein spots. In the spot detection process, we provided an improved watershed algorithm with an adaptive level of detail to label the protein spots and collect the spot information for further processing. In the spot matching process, we focused on the issues of partial matching and relocation. We have developed a fast, accurate and content-based image matching method, the maximum relation spanning tree (MRST). With it, we can easily detect the protein spots in partial images or in modified gel images and match them to the original gel images. Finally, we store the gel images and protein spot information in a database for further investigation.

Although our system provides a good foundation for gel image analysis, several deficiencies remain to be addressed. First, spot matching fails for small gel images that contain few protein spots, since there are not enough spots to make the features discriminative. An image correlation method should solve this problem, but it could take more time. Secondly, more gel images are needed to train the system and to see whether it can be improved further.

Besides, there are still several issues that can be pursued in the future:

• Protein spot overlapping problem [35],

• Database reference [36],

• Intelligent framework in the sense of automatic parameter adjustment, data mining and data classification, and

• Image processing tools.

Over-saturation often occurs in some large regions. Such a region may be formed by many different protein spots, as shown in Fig 5.1. We cannot segment these protein spots well with our segmentation algorithm. Therefore, we need to develop other methods to estimate how many protein spots are in these regions and to segment them. Referring to other databases is a very important topic now that we have constructed the database system. Biologists can share their research results with each other and discover more information and knowledge by using a single database. We can develop an intelligent framework to deal with such huge amounts of data. In Chapter 2, we used two parameters to adapt our spot detection result; it is necessary to construct an intelligent way to adapt these parameters automatically. Furthermore, the intelligent framework can also provide us with various tools such as data mining, data clustering, database classification and so on. Finally, we will provide some image processing tools to handle image merging, spot volume statistics, etc. These tools will make the analysis of 2DGE images more convenient and complete.

Figure 5.1: Overlapped spot region. Protein spot No. 4 may contain several different kinds of protein.


Bibliography

[1] W. F. Patton. "Biologist's perspective on analytical imaging systems as applied to protein gel electrophoresis". Journal of Chromatography A, 698, 55-87, 1995.

[2] D. M. Bollag, M. D. Rozycki and S. J. Edelstein. "Protein Methods, Second Edition". Wiley-Liss, 1994.

[3] "Z3". http://www.compugen.co.il/.

[4] "Carol". http://www.inf.fu-berlin.de/.

[5] "Melanie". http://www.genebio.com/.

[6] "ImageMaster 2D Elite Software". http://www.imsupport.com/.

[7] "SWISS-2DPAGE Two-dimensional polyacrylamide gel electrophoresis database". http://tw.expasy.org/ch2d/.

[8] "DbGel". http://www.geocities.com/SiliconValley/Ridge/7603/.

[9] X. Y. Wang, D. D. Feng and H. Hong. "Novel Elastic Registration for 2-D Medical and Gel Protein Images". In Proc. First Asia-Pacific Bioinformatics Conference (APBC2003), Adelaide, Australia. Conferences in Research and Practice in Information Technology, 19. Chen, Y.-P. P., Ed. ACS. 223-226.

[10] S. Veeser, M. J. Dunn and G. Z. Yang. "Multiresolution image registration for two-dimensional gel electrophoresis". Proteomics 2001, 1, 856-870.

[11] K. Takahashi, M. Nakazawa, Y. Watanabe and A. Konagaya. "Fully-Automated Spot Recognition and Matching Algorithms for 2-D Gel Electrophoretogram of Genomic DNA". In Proc. of Genome Informatics Workshop, pp. 161-172, Dec. 1998.

[12] P. Cutler, G. Heald, I. R. White and J. Ruan. "A novel approach to spot detection for two-dimensional gel electrophoresis images using pixel value collection". Proteomics 2003, 3, 392-401.

[13] L. Vincent and P. Soille. "Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations". IEEE Transactions on Pattern Analysis and Machine Intelligence, June 1991, Vol. 13, No. 6, pp. 583-598.

[14] S. Beucher. "The Watershed Transformation Applied to Image Segmentation". Cambridge, UK, Scanning Microscopy International, suppl. 6, 1992, pp. 299-314.

[15] R. C. Gonzalez and R. E. Woods. "Digital Image Processing". Prentice Hall International Editions, 2002.

[16] K. Takahashi, M. Nakazawa, Y. Watanabe and A. Konagaya. "Automated Processing of 2-D Gel Electrophoretograms of Genomic DNA for Hunting Pathogenic DNA Molecular Changes". In Proc. of Genome Informatics Workshop 1999, pp. 121-132.

[17] T. Matsuyama, T. Abe, C. H. Bae, Y. Takahashi, R. Kiuchi, T. Nakano, T. Asami and S. Yoshida. "Adaptation of Restriction Landmark Genomic Scanning (RLGS) to Plant Genome Analysis". Plant Molecular Biology Reporter 18, 2000, 331-338.

[18] X. Ye, C. Y. Suen, M. Cheriet and E. Wang. "A Recent Development in Image Analysis of Electrophoresis Gels". Vision Interface (VI'99), Trois-Rivieres, CA, 19-21 May 1999, pp. 432-438.

[19] G. Wolberg. "Digital Image Warping". IEEE Computer Society Press, 1994.

[20] H. J. Issaq, T. P. Conrads, G. M. Janini and T. D. Veenstra. "Methods for fractionation, separation and profiling of proteins and peptides". Electrophoresis 2002, 23, 3048-3061.

[21] A. Efrat, F. Hoffmann, K. Kriegel and C. Schultz. "Geometric Algorithms for the Analysis of 2D-Electrophoresis Gels". Journal of Computational Biology, 2002, 9(2): 299-315.

[22] F. Hoffmann, K. Kriegel and C. Wenk. "Matching 2D Patterns of Protein Spots". Symposium on Computational Geometry 1998: 231-239.

[23] F. Hoffmann, K. Kriegel and C. Wenk. "An applied point pattern matching problem: comparing 2D patterns of protein spots". Discrete Applied Mathematics 93 (1999), 75-88.

[24] Y. Watanabe, K. Takahashi and M. Nakazawa. "Automated Detection and Matching of Spots in Autoradiogram Images of Two-Dimensional Electrophoresis for High-speed Genome Scanning". ICIP (3) 1997: 496-499.

[25] S. H. Chang, F. H. Cheng, W. H. Hsu and G. Z. Wu. "Fast Algorithm for Point Pattern Matching: Invariant to Translations, Rotations and Scale Changes". Pattern Recognition, Pergamon, 1997, Vol. 30, No. 2, pp. 311-320.

[26] D. R. Karger, P. N. Klein and R. E. Tarjan. "A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees". Journal of the ACM, March 1995, 42(2): 321-328.

[27] G. T. Toussaint. "The Relative Neighbourhood Graph of a Finite Planar Set". Pattern Recognition, Vol. 12, 1980, pp. 261-268.

[28] J. W. Jaromczyk and G. T. Toussaint. "Relative Neighborhood Graphs and Their Relatives". Proc. IEEE (1992), 80(9): 1502-1517.

[29] P. Bose, L. Devroye, W. Evans and D. Kirkpatrick. "On the Spanning Ratio of Gabriel Graphs and Beta-Skeletons". Proc. Latin Am. Theoretical Informatics (LATIN), 2002.
