PROPOSED GRANULAR COMPUTING MODELS - Theory and Applications of Granular Computing for Handling Imbalanced Data

In this chapter, we propose two GrC models to tackle class imbalance problems: the “Knowledge Acquisition via Information Granulation” (KAIG) model and the IG based method. The KAIG model is suitable for discrete data, and the IG based method is designed for continuous data. Both approaches can improve classification performance by controlling the reduction of unnecessary details.

In both proposed models, a Fuzzy ART (adaptive resonance theory) neural network is utilized to construct IGs. Two indexes, the homogeneity index (H-index) and the undistinguishable ratio (U-ratio), are developed to determine a suitable level of granularity. In the KAIG model, the concept of “sub-attributes” is presented to describe IGs and to tackle the overlapping among granules. In the IG based method, we propose three strategies that utilize different data characteristics and their combinations to represent IGs.

3.1 Construction of Information Granules

In this study, Fuzzy ART is utilized to construct IGs. ART is a well-established neural network theory developed by Carpenter et al. (1991), and the ART network is also a well-known clustering method. Instead of clustering into a given number of clusters, it assigns patterns to the same cluster by comparing their similarity. The detailed Fuzzy ART algorithm can be found in Serrano-Gotarredona et al. (1998).

The major difference between ART and other unsupervised neural networks is the so-called vigilance parameter (ρ), which can be viewed as a granularity and adjusted by the user to control how similar patterns must be to be placed in the same cluster. In an ART network, a degree of similarity between a new pattern and a stored pattern is defined. This similarity is compared with ρ to decide whether the new pattern is properly classified. Unsupervised learning neural networks that do not implement vigilance may force a significantly different input pattern into an inappropriate cluster. In contrast, an ART network will not force an input vector into a cluster if it is not sufficiently similar. This is the reason why the ART network is employed in this study to construct the IGs.
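The role of the vigilance test can be illustrated with a minimal Fuzzy ART sketch. It follows the commonly published Fuzzy ART formulation (complement coding, a choice function, a vigilance match, and a weight update) and is not taken from this chapter: the function names, the choice parameter `alpha`, and the learning rate `beta` are assumptions for illustration, and inputs are assumed to be scaled to [0, 1] beforehand.

```python
from typing import List, Tuple


def complement_code(x: List[float]) -> List[float]:
    """Complement coding [x, 1 - x]; inputs are assumed to lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(a, b)]


def fuzzy_art(patterns: List[List[float]], rho: float,
              alpha: float = 0.001, beta: float = 1.0
              ) -> Tuple[List[int], List[List[float]]]:
    """Assign each pattern to a category (granule) in a single pass.

    rho is the vigilance; alpha and beta are illustrative defaults.
    Returns the category label of each pattern and the category weights.
    """
    weights: List[List[float]] = []
    labels: List[int] = []
    for x in patterns:
        i_vec = complement_code(x)
        norm_i = sum(i_vec)  # equals the input dimension after complement coding
        # Rank existing categories by the choice function T_j.
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(i_vec, weights[j])) / (alpha + sum(weights[j])),
            reverse=True,
        )
        chosen = None
        for j in order:
            match = sum(fuzzy_and(i_vec, weights[j])) / norm_i
            if match >= rho:  # vigilance test passed
                chosen = j
                break
        if chosen is None:
            # No stored category is similar enough: create a new one.
            weights.append(i_vec)
            chosen = len(weights) - 1
        else:
            w = weights[chosen]
            weights[chosen] = [beta * m + (1.0 - beta) * wv
                               for m, wv in zip(fuzzy_and(i_vec, w), w)]
        labels.append(chosen)
    return labels, weights


data = [[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]]
labels, weights = fuzzy_art(data, rho=0.8)
print(labels)  # [0, 0, 1]
```

With a high vigilance the third pattern fails the match test against the first category and opens a new one; lowering `rho` far enough (for example to 0.1) places all three patterns in a single category, which is the sense in which ρ acts as a granularity.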

There are three similar ART architectures, namely ART 1, ART 2, and Fuzzy ART. ART 1 is designed for binary-valued input patterns, and ART 2 is for continuous-valued patterns. Fuzzy ART is the most recent adaptive resonance framework that provides a unified architecture for both binary and continuous valued inputs. There are several factors that motivated us to use Fuzzy ART, and they are as follows (Burke and Kamal, 1995):

(1) Unlike ART1, Fuzzy ART does not require a completely binary representation of the parts to be grouped. In addition, Fuzzy ART possesses the same desirable stability properties as ART1 and a simpler architecture than that of ART2.

(2) ART2 can experience difficulty in achieving good categorizations if the input patterns are not all normalized to a constant length. However, such normalization can possibly destroy valuable information. In addition, the classification results of ART1 depend heavily on the sequence in which the inputs are presented.

As a result, the Fuzzy ART network is employed to construct IGs in this study.

3.2 Selection of Granularity

Selecting an appropriate size of IGs is a difficult task. Sufficient background knowledge is required to determine how similar objects should be in order to be gathered into one IG. An objective index is therefore needed to select the appropriate similarity of granules. We propose the H-index and the U-ratio to solve this problem.

The basic assumption of the H-index is that objects should belong to the same class if their attribute values are sufficiently similar. This implies that we always make the same decision under similar conditions. Because we form granules by the similarity of objects, the objects in the same granule should have the same class. The H-index is used to measure the consistency of the classes of the objects in one IG. The H-index is defined as

$$\text{H-index} = \frac{\sum (i/n)}{m} \tag{3.1}$$

where the sum runs over all IGs, n represents the number of all objects in one granule, m is the number of all IGs, and i is the number of objects possessing the majority class.

For example, Table 3.1 shows one IG containing five objects (n=5) from the iris data, which has four condition attributes (namely A, B, C and D). The decision attribute (class) of the first four objects is “versicolor”, but the last one has a different decision attribute, “setosa”. In this example, “versicolor” is the majority class and i=4. The H-index of this IG is 4/5 = 0.8.

Table 3.1 The information granule: iris example
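The calculation in this example can be sketched in a few lines of Python. This is only an illustration under the assumption that the H-index averages the majority-class fraction i/n over all m granules; the function names and the list-of-class-labels representation of a granule are hypothetical.

```python
from collections import Counter
from typing import List


def majority_fraction(classes: List[str]) -> float:
    """i / n for one granule: share of objects holding the majority class."""
    counts = Counter(classes)
    return max(counts.values()) / len(classes)


def h_index(granules: List[List[str]]) -> float:
    """Average of i / n over all m granules (assumed reading of the H-index)."""
    return sum(majority_fraction(g) for g in granules) / len(granules)


# The granule of the example: four "versicolor" objects and one "setosa".
granule = ["versicolor"] * 4 + ["setosa"]
print(h_index([granule]))  # 0.8
```

A granule whose objects all share one class contributes 1 to the average, so values close to 1 indicate homogeneous granules.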

Another index for selecting similarity is the U-ratio. In the preceding example, “versicolor” is the majority of the classes. Therefore, it is assigned to be the class of this IG. If there is another granule described as Table 3.2, and we are unable to distinguish the class of the IG, then we call that granule an “undistinguishable granule.” The U-ratio is defined as

$$\text{U-ratio} = \frac{u}{m} \tag{3.2}$$

where u represents the number of undistinguishable granules and m represents the quantity of all granules.

This index gives the proportion of undistinguishable granules among all granules. If there are ten granules and two of them are undistinguishable, then u is equal to 2, m is equal to 10, and the U-ratio is equal to 0.2.
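A short sketch of this ratio follows. The text does not spell out the exact test for an undistinguishable granule, so the code assumes that a granule is undistinguishable when its two most frequent classes are tied; that rule and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import List


def is_undistinguishable(classes: List[str]) -> bool:
    """Assumed rule: no single majority class (the top two classes tie)."""
    top = Counter(classes).most_common(2)
    return len(top) > 1 and top[0][1] == top[1][1]


def u_ratio(granules: List[List[str]]) -> float:
    """u / m: undistinguishable granules divided by all granules."""
    u = sum(1 for g in granules if is_undistinguishable(g))
    return u / len(granules)


# Ten granules, two of which contain one object of each class.
granules = [["versicolor", "versicolor", "setosa"]] * 8 \
    + [["versicolor", "virginica"]] * 2
print(u_ratio(granules))  # 0.2
```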

Given these two indexes, we also need a “granularity selection criterion” to determine the similarity of the IGs. In the present study, the larger the H-index the better, because it means that more objects in one IG possess the same class. There is no need to fix the index at a particular value; the acceptable value depends on the domain knowledge or on how large an error can be tolerated. The U-ratio is the opposite: the smaller the better. Undistinguishable granules are difficult to process, so they need to be examined carefully. We therefore try to avoid this situation by keeping the U-ratio as small as possible. In other words, if we select a specific similarity where the H-index is larger and the U-ratio is smaller, then this similarity is the best solution.

Table 3.2 The undistinguishable information granule

Condition attributes          Decision attribute
A      B      C      D
5.4    2.2    3.9    1.2      versicolor
6.8    3.4    5.6    2.4      virginica

3.3 Representation of Information Granules

3.3.1 The Concept of “Sub-attributes”

In the KAIG model, we propose the concept of “sub-attributes” to represent IGs. First, we utilize hyperboxes to represent IGs (Pedrycz and Bargiela, 2002). For example, a hyperbox [b] defined in Rⁿ is fully described by its lower corner (b⁻) and upper corner (b⁺), where b⁻ and b⁺ are vectors in Rⁿ. An important and frequently used universal set is the set of all points in the n-dimensional space. This set is denoted as Rⁿ. Using b⁻ and b⁺ we can express the hyperbox as [b] = [b⁻, b⁺]. Consider two IGs (hyperboxes) A=[a] and B=[b] defined in R². More explicitly, we follow a full notation [a]=[a⁻,a⁺] and [b]=[b⁻,b⁺]. These two granules are described as Table 3.3.

Table 3.3 Two IGs represented by hyperbox form

IGs    Attributes
       X₁              X₂
A      {a₁⁻, a₁⁺}      {a₂⁻, a₂⁺}
B      {b₁⁻, b₁⁺}      {b₂⁻, b₂⁺}

As Figure 3.1 shows, the two granules A and B overlap. This is difficult for knowledge acquisition tools to handle, because most knowledge acquisition algorithms are not designed to deal with IGs, especially when granules overlap. Unfortunately, overlapping is very common in the real world. In this study, we introduce the concept of “sub-attributes” to tackle the problem of overlaps between granules.

We can explain the idea of “sub-attributes” by using Figure 3.1. On axis X₁ (attribute 1), the range covered by the two granules is separated into an overlapping part ([b₁⁻,a₁⁺]) and non-overlapping parts ([a₁⁻,b₁⁻] and [a₁⁺,b₁⁺]). These sub-intervals, [a₁⁻,b₁⁻], [b₁⁻,a₁⁺] and [a₁⁺,b₁⁺], are named X₁₁, X₁₂ and X₁₃, the so-called “sub-attributes.” A binary variable is employed as the value of each sub-attribute and represents whether an IG contains the corresponding sub-interval or not. The results of rewriting the IGs by using sub-attributes can be found in Table 3.4. We divide the original attribute X₁ into sub-attributes X₁₁, X₁₂, X₁₃, and attribute X₂ into X₂₁, X₂₂, X₂₃. Then, the two granules are rewritten by replacing the original attributes with sub-attributes. By introducing the concept of sub-attributes, we can easily extract knowledge from the granules even when they overlap.

Figure 3.1 The overlap between IGs

Table 3.4 The IGs with sub-attributes

Original attributes    X₁                   X₂
                       X₁₁   X₁₂   X₁₃      X₂₁   X₂₂   X₂₃

The concept of “sub-attributes” can maintain the complete characteristics of the data. IGs described with sub-attributes are suitable for all knowledge acquisition algorithms, and the computational architecture of these algorithms does not need to be adjusted. However, too many sub-attributes may be generated when overlapping occurs naturally because the values of the condition attributes are continuous and diverse. Therefore, as is often done in the data preparation phase of data mining, we suggest discretizing the data before implementing the KAIG model in order to control the number of sub-attributes.
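The splitting of one attribute into sub-attributes can be sketched as follows. This is an illustrative reading of the idea, not the exact procedure of the KAIG model: the function name is hypothetical, each granule is assumed to be given by its lower and upper bound on a single attribute, and the numeric bounds in the example are made up.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def sub_attributes(intervals: Dict[str, Interval]
                   ) -> Tuple[List[Interval], Dict[str, List[int]]]:
    """Split one attribute into sub-intervals at every granule bound.

    intervals maps a granule name to its (lower, upper) bound on the attribute.
    Returns the sub-intervals and, per granule, a binary vector that marks
    which sub-intervals the granule contains.
    """
    cuts = sorted({p for lo, hi in intervals.values() for p in (lo, hi)})
    subs = list(zip(cuts[:-1], cuts[1:]))
    encoding = {
        name: [1 if lo <= s_lo and s_hi <= hi else 0 for s_lo, s_hi in subs]
        for name, (lo, hi) in intervals.items()
    }
    return subs, encoding


# Two overlapping granules on attribute X1 (made-up bounds).
subs, encoding = sub_attributes({"A": (1.0, 4.0), "B": (3.0, 6.0)})
print(subs)      # [(1.0, 3.0), (3.0, 4.0), (4.0, 6.0)]
print(encoding)  # {'A': [1, 1, 0], 'B': [0, 1, 1]}
```

The middle sub-interval is the overlapping part and is marked 1 for both granules. Because every distinct bound adds a cut point, the number of sub-attributes grows quickly for continuous data, which is the motivation for discretizing first.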

3.3.2 Using Data Characteristics to represent IGs

As mentioned above, too many sub-attributes will increase computational complexity. To avoid this situation, we propose another idea, which uses data characteristics to describe IGs. Unlike “sub-attributes”, which use intervals to represent an IG, the IG based method utilizes data characteristics such as the mean, median, maximum, minimum, and quartiles to describe IGs. Three IG representation strategies are provided. In strategy 1, we utilize a single value, either the mean or the median, to describe IGs. Strategy 2 uses double-value combinations of data characteristics, Q1+Q3 and Maximum+Minimum. In strategy 3, we employ triple-value combinations, Q1+Median+Q3 and Maximum+Mean+Minimum.
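The three strategies can be sketched for the values of one attribute within one granule. The function and dictionary names are hypothetical, and the quartile convention (Python's `statistics.quantiles` with the "inclusive" method) is an assumption, since the text does not state how Q1 and Q3 are computed; the granule must contain at least two objects for the quartiles to be defined here.

```python
import statistics
from typing import Dict, List, Tuple

# Representation name -> data characteristics used, following the three strategies.
REPRESENTATIONS: Dict[str, Tuple[str, ...]] = {
    "mean": ("mean",),                       # strategy 1
    "median": ("median",),                   # strategy 1
    "q1+q3": ("q1", "q3"),                   # strategy 2
    "max+min": ("max", "min"),               # strategy 2
    "q1+median+q3": ("q1", "median", "q3"),  # strategy 3
    "max+mean+min": ("max", "mean", "min"),  # strategy 3
}


def characteristics(values: List[float]) -> Dict[str, float]:
    """Data characteristics of one attribute within one granule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
        "min": min(values),
    }


def represent(values: List[float], name: str) -> Tuple[float, ...]:
    """Describe the granule on this attribute with the chosen representation."""
    stats = characteristics(values)
    return tuple(stats[key] for key in REPRESENTATIONS[name])


values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(represent(values, "mean"))          # (3.0,)
print(represent(values, "q1+median+q3"))  # (2.0, 3.0, 4.0)
```

Applying `represent` to every condition attribute of a granule yields a fixed-length numeric record per granule, regardless of how many objects the granule contains.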

3.4 Proposed Methodologies

This section summarizes the procedures of the two proposed GrC models. First, we address how the IGs are formed from numerical data. Second, the H-index and U-ratio are introduced to determine the level of granularity used to construct IGs in Fuzzy ART. Then, we describe the IGs and extract knowledge from them.

3.4.1 The KAIG Model

Figure 3.2 shows the proposed KAIG model. The KAIG model can be summarized in the following steps:

Step 1: Information Granulation

In Step 1, we use Fuzzy ART to construct IGs. First, however, we need to select a suitable level of granularity (vigilance), because the IGs are formed according to the selected granularity. The initial value of the granularity is set to 1 and then decreased gradually until a value satisfying the criteria of the H-index and U-ratio is found. This granularity is then employed to construct the IGs.

Step 2: Information Granules Representation

IGs are represented in a suitable form that can be handled by knowledge acquisition tools. As mentioned in Section 3.3.1, the formed IGs are described as hyperboxes. Then, sub-attributes are applied to these IGs to solve the overlapping problem, so that knowledge can finally be extracted from them.

Step 3: Knowledge Acquisition

After describing IGs appropriately and tackling the overlapping situation, we can use knowledge acquisition tools to extract knowledge rules from the granules. In this study, we will compare three famous data mining algorithms, C4.5, Rough sets and back-propagation neural network, to evaluate their effectiveness in KAIG model.

Figure 3.2 Knowledge Acquisition via Information Granulation (KAIG) model (flowchart: Numerical data; Select the level of granularity; Information granulation; Check granularity by using H-index & U-ratio, returning to the selection step if not satisfied; Information granules representation (Sub-attributes); Knowledge acquisition; Knowledge rules)

3.4.2 The IG based Method

In the KAIG model, we use “sub-attributes” to describe IGs and to handle the overlapping situation effectively. However, when dealing with continuous data, KAIG may generate so many sub-attributes that the computational complexity of knowledge acquisition algorithms increases. The same situation may occur when a discretization algorithm divides a continuous attribute's values into too many discrete intervals. Therefore, we propose the IG based method in this section.

In this method, the “information granulation” process is the same as in the KAIG model. The only difference is the description of IGs: this method utilizes data characteristics to denote IGs without using sub-attributes. The IG based method follows the three steps described below. We adopt three strategies, listed in Step 2, to describe IGs. They are different combinations of data characteristics (mean, median, quartiles, maximum and minimum): single-value, double-value, and triple-value strategies. Then we can build a classifier from these data characteristics.

Step 1: Information Granulation

Step 2: IG Representation: Data Characteristics

Strategy 1 - Single value: Mean, Median

Strategy 2 - Double values: Max+Min, Q1+Q3

Strategy 3 - Triple values: Max+Mean+Min, Q1+Median+Q3

Step 3: Knowledge Acquisition

CHAPTER 4
