
Developing a Categorization Learning Model within a Bayesian Framework (Year 2)

Research Report (Complete Version)

Project Type: Individual

Project Number: NSC 96-2413-H-004-020-MY2

Project Period: August 1, 2008 to September 30, 2009

Host Institution: Research Center for Mind, Brain, and Learning, National Chengchi University

Principal Investigator: Lee-Xieng Yang

Co-Investigator: 許清芳

Project Staff: Full-time assistants (MA level): 鄭惟尹, 蔡涵如

Report Attachments: Report on attending an international conference and the presented paper

Availability: This project involves patents or other intellectual property rights; it will be publicly available for inquiry after 2 years

January 18, 2010


Running head: LOCAL COMPARISON AMONG EXEMPLARS

NSC-Report-96-2413-H-004-020-MY2

Categorization by local relationship among exemplars

Lee-Xieng Yang

Research Center of Mind, Brain, and Learning National Chengchi University


NO.64, Sec.2, ZhiNan Rd., Wenshan District, Taipei City 11605, Taiwan

lxyang@nccu.edu.tw


Abstract

The principal purpose of this study is to develop a categorization model based on Bayesian theorems. The developed model, SOE, assumes that the percept of each input is represented as a normal distribution over the units on the stimulus dimension and that classification is made by referencing the weighted strengths of the units within the scope of the attention focus. SOE is shown to be sensitive to the repetition of stimuli, to stimulus order, and, most importantly, to the variability effect in categorization. This effect is thought to pose a severe challenge to exemplar-based models; the success of SOE, however, suggests a possible exemplar-based account for it.


Similarity in Categorization

Categorization refers to the process of grouping similar objects together and separating different objects. Similarity is therefore thought to be the basis for categorization: the more similar an object is to the exemplars of one category, the more likely it is to be a member of that category. Although similarity can be defined in many ways, in computational models of categorization it is commonly determined by the degree to which the feature values of objects match. Suppose an object i with 4 dichotomous features is denoted [1 0 1 0] and another object j is denoted [1 0 0 1]. An exact match between objects on a feature normally contributes a similarity of 1, and a mismatch contributes a value 0 ≤ s < 1. The similarity between these two objects is then defined as the product of the feature similarities,

Sim_ij = 1 × 1 × s × s (see Medin & Schaffer, 1978).
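As a concrete illustration, the multiplicative rule above can be sketched in a few lines of Python (the mismatch value s = 0.3 is an arbitrary choice for the example):

```python
def context_similarity(obj_i, obj_j, s=0.3):
    """Multiplicative similarity of the context model (Medin & Schaffer, 1978).

    A matching feature contributes 1; a mismatching feature contributes
    a mismatch value 0 <= s < 1 (s = 0.3 here is an arbitrary example).
    """
    sim = 1.0
    for fi, fj in zip(obj_i, obj_j):
        sim *= 1.0 if fi == fj else s
    return sim

# The worked example from the text: [1 0 1 0] vs. [1 0 0 1]
# matches on features 1-2 and mismatches on 3-4, so sim = s * s.
print(context_similarity([1, 0, 1, 0], [1, 0, 0, 1]))  # s * s
```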

This computation of similarity is extended by geometrically mapping the objects as points in a psychological space whose dimensions correspond to the features of the objects. Nosofsky (1986) proposed his influential model, the GCM (Generalized Context Model), using the MDS method to construct subjects' psychological spaces. According to the GCM, similarity is a negative-exponential transformation of distance in the psychological space, Sim_ij = exp(−c·d_ij), and the psychological distance between objects i and j is computed over dimensions m as

d_ij = [Σ_m α_m |O_im − O_jm|^p]^(1/q),

where p = q = 1 gives the city-block distance, p = q = 2 gives the Euclidean distance, and α_m is the selective attention on dimension m, with Σ_m α_m = 1. In mathematics this is called the Minkowski function. With this simple algorithm for similarity, Nosofsky (1986) showed that the identification task and the classification task actually share identical representations, namely the exemplars. With different stimuli, Nosofsky (1987) showed that the city-block and Euclidean metrics are respectively suitable for stimuli consisting of psychologically separable and psychologically integral dimensions. To date, the GCM is still influential and provides good accounts for many categorization phenomena.
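The GCM similarity computation described above can be sketched as follows (a minimal illustration; the stimulus coordinates, attention weights, and specificity c are arbitrary example values, not ones fitted in this report):

```python
import math

def gcm_similarity(obj_i, obj_j, attention, c=1.0, p=1, q=1):
    """GCM similarity via a Minkowski distance (Nosofsky, 1986).

    p = q = 1 gives the city-block metric; p = q = 2 the Euclidean metric.
    `attention` holds the alpha_m weights and should sum to 1.
    """
    d = sum(a * abs(x - y) ** p
            for a, x, y in zip(attention, obj_i, obj_j)) ** (1.0 / q)
    return math.exp(-c * d)

# The two metrics applied to the same pair of two-dimensional stimuli:
print(gcm_similarity([1.0, 2.0], [3.0, 5.0], [0.5, 0.5]))            # exp(-2.5)
print(gcm_similarity([1.0, 2.0], [3.0, 5.0], [0.5, 0.5], p=2, q=2))  # exp(-sqrt(6.5))
```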

Even as neural network models became prevalent in categorization, the Minkowski function used in the GCM continued to be applied in those models, for example in the activation of hidden nodes or clusters in ALCOVE (Kruschke, 1992) and SUSTAIN (Love, Medin, & Gureckis, 2004).

Challenge to Exemplar-Based Model

Models that take similarity to stored exemplars as the basis for categorization are called exemplar-based models. Apparently, not every researcher agrees with the assumptions of the exemplar-based models. On an alternative view, categorization is done not by similarity but by applying a rule that separates the objects into groups. With a geometrical representation, this rule can be represented as a boundary in psychological space, as assumed in GRT (General Recognition Theory; see Ashby & Gott, 1988). For example, the simplest boundary may have the form "Respond A if Φ(X) > Φ(X_c), otherwise B", where X_c is the criterion value on dimension X and Φ(X) is the percept evoked by value X. In addition to such one-dimensional boundaries, two-dimensional linear boundaries and even quadratic boundaries are counted in the rule family (for more discussion, see Ashby, 1992).

A great deal of evidence supports GRT over the GCM (Ashby & Gott, 1988; Ashby & Maddox, 1992; Maddox & Ashby, 1993), with the major arguments that GRT outperforms the GCM in predicting individual subjects' data and that people tend to learn the optimal rule even when a lower-order rule would provide plausible predictions. However, researchers endorsing the exemplar-based models have provided a considerable amount of evidence against GRT (Nosofsky, 1998), and furthermore Nosofsky and Johansen (2000) provided computer-simulation results for those phenomena thought to be difficult for the exemplar-based models to account for.

Perhaps the most challenging phenomenon for the exemplar-based models is the category variability effect. Rips (1989) reported an inconsistency between similarity and categorization: one group of subjects was asked to classify a circular object 3 inches in diameter into the category COIN or PIZZA, whereas another group was asked to judge which category (COIN or PIZZA) a 3-inch circular object is more similar to. The result showed that the subjects tended to regard the target object as more similar to COIN but classified it as PIZZA.¹ Categorization decisions and similarity choices should be consistent in direction had the assumption of the exemplar-based models held. This phenomenon occurs only when the variabilities of the contrasting categories differ greatly: in Rips's study, the COIN category is much more condensed than the PIZZA category. Even endorsers of the exemplar-based models admit that this phenomenon is hard to account for by exemplar similarity alone (Nosofsky & Johansen, 2000).

However, Rips's finding has not always been replicated in subsequent studies. Cohen, Nosofsky, and Zaki (2001) asked subjects to learn 8 lines of different lengths, in which the shortest line alone formed one category (small variability) and the rest formed another category (large variability). The test lines were of many different lengths, including one midway between the two closest lines of the two categories. Just as in Rips's result, the subjects classified the middle line into the large-variability category 47% of the time. By comparison, in another condition in which the large-variability category had only 3 lines, and was hence less variable, the middle line was classified into the large-variability category only 29% of the time. The classification of the middle line was therefore a function of the variability difference between the categories. This is not predicted by the exemplar-based models.

However, Stewart, Brown, and Chater (2002) did not obtain this result with a similar manipulation, except when the subjects were allowed to observe all items together. This result suggests that the variability effect appears only when people have noticed the difference in category variability. As suggested by Nosofsky and Johansen (2000), the exemplar-based models would have a chance to accommodate the variability effect if the similarity computation used a different specificity, c, for each category; however, Nosofsky and Johansen (2000) also noted that there was no theoretical reason to do so. Both Cohen et al. (2001) and Stewart et al. (2002) suggested that the variability effect can be accounted for by a model that takes the category variance into consideration. In GRT, the boundary between two categories is located where the two category distributions have the same probability density; thus the boundary is always closer to the small-variability category. However, GRT provides a better fit to categorization data only when the sample of training items is large (see Rouder & Ratcliff, 2004, 2006). With fewer items, the learned boundary can thus be expected to be unlikely to be the optimal one predicted by GRT.
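The GRT claim that the boundary sits where the two category densities are equal, and hence closer to the small-variability category, can be checked numerically. The sketch below (with arbitrary example means and standard deviations of my own choosing) locates that point by bisection:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def grt_boundary(mu_a, sd_a, mu_b, sd_b, lo, hi, tol=1e-9):
    """Find the point between the category means where the two densities
    are equal, by bisection. Assumes exactly one crossing in (lo, hi)."""
    f = lambda x: normal_pdf(x, mu_a, sd_a) - normal_pdf(x, mu_b, sd_b)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Equal variances: the boundary is the midpoint of the means.
print(round(grt_boundary(2.0, 1.0, 6.0, 1.0, 2.0, 6.0), 3))  # 4.0
# A smaller variance for Category A pulls the boundary toward A's mean.
print(round(grt_boundary(2.0, 0.5, 6.0, 2.0, 2.0, 6.0), 3))  # below the midpoint 4.0
```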

Variability Effect and Exemplar Similarity

According to the previous discussion, the variability effect is a challenge to both the exemplar- and rule-based models. It is worth analyzing why the exemplar-based models cannot predict the variability effect before generating a model for it. In the GCM, similarity is determined solely by the psychological distance between items, and the similarity of the middle item to a category decreases with the total distance from that category's members to the middle item. The large-variability category has members farther from the middle item and is hence less similar to it: Sim_small > Sim_large, and therefore P(small | middle item) > P(large | middle item).


In ALCOVE, the probability of an input being classified into a category is determined by the activation of the output nodes.² During training, the associative weights from the hidden nodes³ to the output nodes are learned from the error signal,

ΔW_jk = η(T_k − O_k)A_j^hid,

where η is the learning rate, (T_k − O_k) is the error signal on the kth output node, and A_j^hid is the activation of the jth hidden node. Since the output nodes have opposite target values T_k, on each training trial the weights from one hidden node to the output nodes change in opposite directions. Also, since O_k = Σ_j W_jk A_j^hid, the associative weight can be understood as the likelihood that a hidden node activates an output node. Therefore, with error-driven learning, a hidden node (namely, an exemplar) that tends to activate one output node must tend to deactivate the other output nodes. The activations of the hidden nodes are the similarities between the input and the exemplars. Thus, the exemplars of the small-variability category induce stronger activation of the corresponding category node, and stronger deactivation of the contrasting category node, than do the members of the large-variability category. Presumably, then, ALCOVE predicts a larger probability of the small-variability category for the middle item.
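The error-driven weight update just described can be sketched as follows (an illustration of the learning equation only; the hidden activations, teacher values, and learning rate are arbitrary, and ALCOVE's attention learning, humble teachers, and choice rule are omitted):

```python
def alcove_weight_update(weights, hidden_acts, teacher, eta=0.1):
    """One step of ALCOVE's exemplar-to-category weight learning
    (Kruschke, 1992): dW[j][k] = eta * (T[k] - O[k]) * A_hid[j].

    weights[j][k] maps hidden (exemplar) node j to output (category)
    node k. Output activations O[k] = sum_j W[j][k] * A_hid[j].
    Updates `weights` in place and returns the outputs.
    """
    n_out = len(weights[0])
    outputs = [sum(weights[j][k] * hidden_acts[j]
                   for j in range(len(weights))) for k in range(n_out)]
    for j, a_j in enumerate(hidden_acts):
        for k in range(n_out):
            weights[j][k] += eta * (teacher[k] - outputs[k]) * a_j
    return outputs

# One training step: a strongly activated exemplar (a = 0.9) and a far
# one (a = 0.1), with teacher values +1 for Category A and -1 for B.
W = [[0.0, 0.0], [0.0, 0.0]]
alcove_weight_update(W, [0.9, 0.1], [1.0, -1.0])
print(W[0])  # the near exemplar's two weights move in opposite directions
```

Note how the single teacher signal forces each exemplar's weights toward the two category nodes apart, which is the mechanism the paragraph above appeals to.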

The failure of the exemplar-based models is thought to stem from their lack of sensitivity to category variance. Apparently, as long as the total distance to the target item is kept constant, the prediction of the exemplar-based models does not change, even if we rearrange the positions of the exemplars within a category to make the category looser or more condensed.

A New Exemplar-Based Model

Therefore, I propose a new exemplar-based model, SOE (Strength profile Of Exemplars), to compensate for this insensitivity of the exemplar-based models. Its basic characteristics are as follows. First, as in GRT, the percept of each input stimulus is a normal distribution of activation strength over the units on the psychological dimension, with its mean at the position of the input stimulus, x_target, and its variance as a free parameter, σ. Second, the height of this strength distribution by definition gives the likelihood of the corresponding unit predicting the current category prior to the next trial. See Equation 1.

p(A|unit_i) = (1/√(2πσ²)) exp(−(x_i − x_target)² / (2σ²)).⁴ (1)

Third, from the second stimulus on, the strength of each unit for each input is normalized⁵ to serve as the weighting of that unit. Thereafter, the posterior propensity to assign the tth input stimulus to Category A is computed by

Pro(A|Input_t) = Σ_i p(A|unit_{i,t−1}) w_{i,t}, ∀ i ∈ (x_µ − σ_att, x_µ + σ_att), (2)

where (x_µ − σ_att, x_µ + σ_att) is the area under the focus of attention, meaning that only the units in this area are included in computing the propensity. Fourth, the probability of Category A is computed by

P(A|Input_t) = exp(φ·Pro(A|Input_t)) / [exp(φ·Pro(A|Input_t)) + exp(φ·Pro(B|Input_t))]. (3)

Fifth, whenever the model receives an input stimulus, the strength distribution for the corresponding category is accumulated onto the preceding one. For example, if the current input belongs to Category A, then after presentation of the category label, the strength distribution over the units for Category A prior to the t + 1th trial is

p(A|unit_{i,t}) = η·p(A|unit_{i,t}) + p(A|unit_{i,t−1}). (4)

Summary for SOE

A summary of SOE follows. SOE is a dynamic computational model in which, before the t + 1th trial, the strength distribution for each category accumulated up to the tth trial constitutes the propensity of the units to predict each category prior to the presentation of the stimulus on the t + 1th trial. This propensity distribution is regarded as the category representation in LTM (long-term memory) in SOE. When a classification is made, the strength distribution temporarily evoked by the input stimulus is normalized into weightings of the units' propensities toward each category. Also, only the units within the attention focus are used for predicting the categories, via the sum of the weighted propensities for each category.

The propensities for the categories are then exponentially transformed into the probability of each category. After feedback is provided, the propensity distribution for each category is updated in LTM by adding the weighted current strength distribution to the propensity distribution for the same category. There are therefore 4 parameters in SOE: the variance of the strength distribution, σ; the radius of the attention focus, σ_att; the updating rate, η; and the decision constant for transforming the propensity information into category probability, φ (the larger φ is, the more deterministic the categorization becomes). The number of units on a single dimension is 27.⁶
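Under the assumptions above, a minimal one-dimensional sketch of SOE might look like the following. The unit grid and the class interface are my own illustrative choices, and the exact probabilities depend on the number and spacing of units, so the values it produces will not match those reported below:

```python
import math

class SOE:
    """A minimal sketch of the SOE model with two categories, 'A' and 'B'."""

    def __init__(self, units, sigma=1.0, sigma_att=0.7, eta=0.5, phi=1.0):
        self.units = list(units)          # unit positions on the dimension
        self.sigma, self.sigma_att = sigma, sigma_att
        self.eta, self.phi = eta, phi
        # Propensity of each unit toward each category, initially 0.
        self.prop = {c: [0.0] * len(self.units) for c in ('A', 'B')}

    def _strength(self, x):
        """Normal strength distribution evoked by stimulus x (cf. Eq. 1)."""
        return [math.exp(-(u - x) ** 2 / (2 * self.sigma ** 2))
                / math.sqrt(2 * math.pi * self.sigma ** 2) for u in self.units]

    def prob_A(self, x):
        """Probability of Category A for stimulus x (cf. Eqs. 2 and 3)."""
        s = self._strength(x)
        peak = max(s)
        w = [v / peak for v in s]         # normalize by the largest strength
        pro = {c: sum(w[i] * self.prop[c][i]
                      for i, u in enumerate(self.units)
                      if abs(u - x) < self.sigma_att)   # attention focus
               for c in ('A', 'B')}
        ea = math.exp(self.phi * pro['A'])
        eb = math.exp(self.phi * pro['B'])
        return ea / (ea + eb)

    def learn(self, x, label):
        """Accumulate the evoked strength into the label's propensity (cf. Eq. 4)."""
        for i, v in enumerate(self._strength(x)):
            self.prop[label][i] += self.eta * v

# Repeating stimulus 5 in Category A raises P(A), as in Figure 1.
model = SOE(units=[i / 10 for i in range(0, 101)])  # units 0.0 .. 10.0
probs = []
for _ in range(3):
    probs.append(round(model.prob_A(5.0), 2))
    model.learn(5.0, 'A')
print(probs)  # starts at 0.5 and increases with each repetition
```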

Model Performance

Same Stimulus and Same Category

The SOE model is tested in a number of respects. First, repetition of the same stimulus changes the accumulated strength distribution toward a higher propensity to classify the same item into its category. As shown in Figure 1, a stimulus with a value of 5 is presented repeatedly 3 times. The propensity for Category A at position 5 clearly gets higher as the number of repetitions increases, and the probability of Category A rises from .50 to .70 to .84, with σ = 1, σ_att = 0.70, η = 0.50, and φ = 1.


Same Stimulus and Changed Category

With the parameter values fixed, the predicted probability of Category A for the same stimulus changes in accordance with a change of its category label. Figure 2 shows the change in the propensity distributions for both categories when the stimulus is paired with Category A on the first two trials and with Category B on the last two. On trial 1, each unit has a propensity of 0 toward either category; thus, the panels start from trial 2. At the end of trial 1, namely before trial 2, the propensity distribution for Category A is clearly larger than that for Category B, and it gets even larger when the same stimulus is presented a second time on trial 2. However, the propensity distribution for Category B grows from the end of trial 3, and by the end of trial 4 both distributions have the same profile. The probability of Category A changes from .50 to .70 to .84 and back to .70. Apparently, SOE faithfully reflects the history of pairings between the stimulus and the category labels.

Order Effect

SOE is also sensitive to trial order: the same stimuli, all in one category, are classified differently when presented in different orders. For instance, applying the previous parameter values, SOE received the stimuli 3, 4, 5, and 6 either in the order 3, 4, 5, 6 or in the order 5, 4, 6, 3. Figure 3 shows the prior propensity distribution for Category A in these two conditions. Although the presentation orders differ, the accumulated propensity distributions at the end of training are the same, denoted by the solid line. However, the probabilities of Category A for the stimuli 3, 4, 5, and 6 differ: .50, .63, .68, and .65 in the sequential-order condition, but .69, .63, .50, and .65 in the random-order condition. Apparently, the same stimulus in a different presentation position is classified differently, because SOE is a dynamic categorization model sensitive to the presentation order of the stimuli.


Variability Effect and Attention Focus

The critical comparison between SOE and other models (e.g., the GCM and GRT) is whether SOE can predict the variability effect. To replicate the findings of Cohen et al. (2001), a set of stimulus values (2, 6, 6.5, 7, 7.5, and 8) was used in a computer experiment, with the value 2 alone in Category A and the rest in Category B; Category A is thus less variable than Category B. Since the number of Category B stimuli is 5 times that of Category A, the Category A stimulus is repeated 5 times as often as each Category B stimulus. As in Cohen et al. (2001), the middle item is 4, which is equidistant from 2 and 6. Unlike their study, however, the condition employing only some of the Category B stimuli is not included here.

Figure 4 shows the result of this computer experiment: the probability of Category B decreases as the size of the attention focus increases, with σ_att varying from 0.01 and 0.10 up to 6.00 and the other parameter values fixed at η = 0.50, σ = 1.00, and φ = 1.00. When the classification is based on information from a very small area of the stimulus dimension, the probability of Category B for the middle item approaches an upper limit of about .48; theoretically, this probability should approach .50 as the size of the attention focus becomes infinitely small. However, when the attention focus is large enough to cover the whole range of stimulus values, the middle item has only a 22% chance of being classified into Category B, namely the large-variability category.

According to the GCM, the middle item should be more likely to be classified into Category A, as SOE predicts when the size of the attention focus is large. This suggests that the GCM might show sensitivity to category variability if not every exemplar were included in the similarity computation. In the original design of Cohen et al. (2001), each of the two stimuli closest to the middle item was presented on 45% of the learning trials, whereas the simulation in Figure 4 used a design in which, taking the middle item as the center, the nearest Category A stimulus was presented 5 times as often as the nearest Category B stimulus. Thus, another computer experiment was conducted with the nearest two stimuli presented nearly equally often.⁷ The stimulus values were 3, 6, 10, and 14, with 3 alone in Category A and the rest in Category B. As in the previous computer experiment, the size of the attention focus was the only independent variable. With parameter values η = 0.50, σ = 5.00, and φ = 1.00, the middle item is classified into Category A most often (Pr(A) = .70 and Pr(B) = .30) when the size is 3.00. When the size is about 0.30 or 5.50, the middle item is classified into Category A less often (Pr(A) = .52 and Pr(B) = .48).

These results together suggest that the size of the attention focus strongly influences categorization, and that this influence on the model's predictions can be linear or nonlinear, depending on the category structure.

Conclusions

In this study, a new categorization model, SOE, is proposed, with the assumptions that the percept of each input stimulus takes the form of a normal distribution and that categorization is accomplished by comparing, for each category, the weighted propensity distributions over the units within the attention focus. The computer experiments show that SOE is sensitive to the repetition of the same stimulus, to the presentation order of stimuli, and to category variability. Most importantly, the size of the attention focus has a great influence on categorization; in particular, SOE's flexibility in fitting the variability effect also results from the size of the attention focus.

Although SOE assumes that the strength evoked over the units follows a Gaussian distribution, it is theoretically plausible to replace the Gaussian distribution with the Minkowski function used in the GCM and other exemplar-based models. At this moment, SOE is suitable only for categorization performance with one-dimensional category structures. Future work will pursue this concern to make SOE suitable for multi-dimensional category structures.


References

Ashby, F. G. (1992). Multidimensional models of perception and cognition. In F. G. Ashby (Ed.), (p. 449-483). Hillsdale, NJ: Erlbaum.

Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 33-53.

Ashby, F. G., & Maddox, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception and Performance, 18, 50-71.

Cohen, A. L., Nosofsky, R. M., & Zaki, S. R. (2001). Category variability, exemplar similarity, and perceptual classification. Memory & Cognition, 29, 1165-1175.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.

Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309-332.

Maddox, W. T., & Ashby, F. G. (1993). Comparing decision bound and exemplar models of categorization. Perception & Psychophysics, 53, 49-70.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

Nosofsky, R. M. (1987). Attention and learning processes in the identification and classification of integral stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 87-108.

Nosofsky, R. M. (1998). Selective attention and the formation of linear decision boundaries: Reply to Maddox and Ashby (1998). Journal of Experimental Psychology: Human Perception and Performance, 24, 322-339.

Nosofsky, R. M., & Johansen, M. K. (2000). Exemplar-based accounts of multiple-system phenomena in perceptual categorization. Psychonomic Bulletin & Review, 7, 375-402.

Rips, L. J. (1989). Similarity and analogical reasoning. In S. Vosniadou & A. Ortony (Eds.), (p. 21-59). New York: Cambridge University Press.

Rouder, J. N., & Ratcliff, R. (2004). Comparing categorization models. Journal of Experimental Psychology: General, 133, 63-82.

Rouder, J. N., & Ratcliff, R. (2006). Comparing exemplar- and rule-based theories of categorization. Current Directions in Psychological Science, 15, 9-13.

Stewart, N., Brown, G. D. A., & Chater, N. (2002). Sequence effects in categorization of simple perceptual stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 3-11.


Footnotes

¹Three inches is about equidistant in diameter from the prototypes of the COIN and PIZZA categories.

²Each output node corresponds to a category in ALCOVE.

³In ALCOVE, the hidden nodes correspond to the exemplars.

⁴Before any input, the strength on each unit for each category is set to 0.

⁵All likelihoods are divided by the largest one.

⁶There is no convention for the number of units, yet the more units there are, the smoother the propensity/strength distribution becomes.

⁷Since SOE is very sensitive to the number of stimuli, the Category B (i.e., large-variability) stimulus nearest to the middle item is presented fewer times than the nearest Category A stimulus, in order to make both categories equal in total amount.


Figure Captions

Figure 1. The propensity distribution for Category A as an identical stimulus is repeated 3 times.

Figure 2. The propensity distributions for both categories change when the same stimulus is mapped to the contrasting category.

Figure 3. The propensity distributions for both categories change with different presentation orders of the stimuli.

Figure 4. The middle item is classified into the large-variability category less often as the size of the attention focus increases.

Figure 5. The probability of Category A for the middle item changes with the size of the attention focus when the two nearest exemplars are presented nearly equally often.

[Figure 1: likelihood for Category A for a single stimulus over units 1-9; separate curves for Trials 1, 2, and 3.]

[Figure 2: four panels, Before Trial 2 through Before Trial 5, each plotting the propensity distributions for Categories A and B over units 0-10.]

[Figure 3: two panels, Sequential Order and Random Order, each plotting the propensity distributions over units 2-8 for Trials 2-5.]

[Figure 4: Pr(B), ranging from 0.2 to 0.5, as a function of Size of Attention Focus (0.01-6.00).]

[Figure 5: Pr(A), ranging from 0.45 to 0.75, as a function of Size of Attention Focus (0.01-6.00).]


Report on Attending the Society for Computers in Psychology (2008) Annual Meeting

Dates: November 11-18, 2008

Location: Chicago

The main purpose of attending SCiP was to present the categorization learning model developed in the first year of my NSC project (96-2413-H-004-020-MY2). The title of the presentation was "POLARIUM: A new categorization model for individual differences in categorization". The society's principal mission is the development of computer-based techniques and theory for psychological research. My categorization learning model, POLARIUM, is a neural network model that can learn to apply different category rules for classification in different contexts; through this mechanism, the heterogeneity among subjects' classification strategies can be fully explained. In particular, a Bayesian theorem serves as the algorithm representing decision behavior in the mechanism for selecting classification rules.

After the session, I discussed this type of categorization model further with attendees such as Simon Farrell. Michel Kalish, who also develops Bayesian models, pointed out possible shortcomings of the model, which was of great help in revising it. I also consulted Professor 許清芳, who has studied Bayesian inference extensively, and received valuable advice for further improving the model's practical applicability.
