526 UU1180 Lecture Notes Chapter 3 Supplementary Material Setting the Weights in Multilayer Perceptron

Academic year: 2022



Lecturer Cheng-Yuan Liou

Department of Computer Science and Information Engineering, National Taiwan University, Republic of China

Abstract

Instead of training the weights, this note shows that the weights in Chapter 3 can be preset and used directly in the multilayer perceptron. The resulting network classifies any training dataset perfectly. Its performance on testing datasets is comparable to that of SVM. I illustrate an example in this material.

1 Designing the weights

Let the set of all patterns be X = {x_p, p = 1, ..., P}. Each pattern x_p is a D-dimensional column vector. The label function, C : R^D → N, maps each pattern x_p to its class identity number c. Suppose there are 3 hidden layers in the network, {m = 1, 2, ..., L}; refer to Figure 1(c). Let n_m be the total number of neurons in the mth layer.

The method in Chapter 3 provides a design for the initial weights of the MLP. It is a divide-and-conquer design. I will show that a general-position two-class classification problem can be solved perfectly with three hidden layers.

This design is very different from all BP algorithms that solve the complexity, $\sum_{k=0}^{n_{m-1}} \binom{n_m}{k}$ [6], in the succeeding MLP layers. It is also a divide-and-conquer design. Figure 1(a) illustrates the network for a two-class problem, c_1 = 1 and c_2 = 2, in a two-dimensional space, D = n_0 = 2.
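As a rough illustration of the complexity expression cited from [6], the sketch below evaluates the sum of binomial coefficients C(n_m, k) for k = 0, ..., n_{m-1}. Reading this sum as an upper bound on the number of regions that n_m hyperplanes in general position can carve out of an n_{m-1}-dimensional space is an assumption about how [6] is used here, and the function name `max_regions` is hypothetical.

```python
from math import comb


def max_regions(n_prev: int, n_curr: int) -> int:
    """Sum of C(n_curr, k) for k = 0..n_prev.

    Assumption: n_prev is the input dimension of a layer (n_{m-1}) and
    n_curr is the number of neurons, i.e. hyperplanes, in that layer (n_m).
    """
    # math.comb returns 0 when k > n_curr, so the sum is safe for any n_prev.
    return sum(comb(n_curr, k) for k in range(n_prev + 1))


# Four lines in the plane (D = n_0 = 2): 1 + 4 + 6 = 11 regions at most.
print(max_regions(2, 4))
```

The count grows quickly with the number of neurons, which is one way to see why a design that fixes a small, interpretable set of hyperplanes can be attractive.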

In this D space, a center line of a strip, x_p x_q, is allocated for two nearby patterns, x_p and x_q, that are in the same class c_1, x_p ∈ c_1 and x_q ∈ c_1. We assume that c_1 contains fewer patterns than c_2. Then, this center line is split into two parallel lines, line a and line b. They lie on opposite sides of the center line and are parallel to it, a ∥ x_p x_q ∥ b.

For a, pick a pattern x_r, x_r ∈ c_2, where x_r is the pattern closest to x_p x_q. x_r and a are on the same side of x_p x_q. (Note that this x_r may not exist. In that case the line x_p x_q suffices as the strip border.) Plot a parallel line a_r, a_r ∥ x_p x_q, that passes through the pattern x_r. Pick a pattern x_s, x_s ∈ c_1, that lies between the two lines a_r and x_p x_q and is the closest such pattern to the line a_r. Plot a parallel line a_s, a_s ∥ x_p x_q, that passes through the pattern x_s. (Note that this x_s may not exist. In that case we use x_p x_q as the line a_s.) Plot a decision border line, a_rs, midway between the two parallel lines a_r and a_s. a_rs has a wide margin between the pair (x_r, x_s).

The two patterns, x_r and x_s, serve as the margin-limiting stops of the region in between the two lines, a_r and a_s.

The decision border line b_uv for b can be constructed in a similar way on the other side of x_p x_q. These two decision lines are used as the two neurons in the first hidden layer. All patterns in between the two lines a_rs and b_uv belong to the same class c_1. These patterns will be subtracted from the pattern set c_1 and will not be used for the determination of all other strips. This is a divide-and-conquer scheme.
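The strip construction described above can be sketched for D = 2 as follows. This is a minimal illustration under stated assumptions, not the author's implementation: patterns are plain (x, y) tuples, the helper names `signed_distance`, `border_offset`, and `build_strip` are hypothetical, and a border is expressed as a signed offset from the center line x_p x_q. When no x_r exists on a side, the sketch follows the text and uses the center line itself as the border.

```python
import math


def signed_distance(x, p, q):
    """Signed distance from point x to the line through p and q (2-D)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    # Unit normal to the direction p -> q.
    nx, ny = -dy / length, dx / length
    return nx * (x[0] - p[0]) + ny * (x[1] - p[1])


def border_offset(dist_c1, dist_c2, side):
    """Offset of the decision border on one side (+1 or -1) of the center line."""
    # x_r: the closest pattern of the other class on this side.
    other = [side * d for d in dist_c2 if side * d > 0]
    if not other:
        return 0.0  # no x_r: the center line serves as the border
    d_r = min(other)
    # x_s: same-class pattern between the center line and a_r, closest to a_r.
    same = [side * d for d in dist_c1 if 0 < side * d < d_r]
    d_s = max(same) if same else 0.0  # no x_s: the center line acts as a_s
    # The border lies midway between the parallel lines a_r and a_s.
    return side * (d_r + d_s) / 2.0


def build_strip(p, q, class1, class2):
    """Return (lo, hi, members): border offsets and the c_1 patterns inside."""
    d1 = [signed_distance(x, p, q) for x in class1]
    d2 = [signed_distance(x, p, q) for x in class2]
    hi = border_offset(d1, d2, +1)
    lo = border_offset(d1, d2, -1)
    members = [x for x, d in zip(class1, d1) if lo <= d <= hi]
    return lo, hi, members


class1 = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.4), (0.5, -0.2), (0.5, 3.0)]
class2 = [(0.5, 1.0), (0.2, -1.0), (0.7, 2.0)]
lo, hi, members = build_strip(class1[0], class1[1], class1, class2)
print(round(lo, 2), round(hi, 2), members)
```

In this toy data the pattern (0.5, 3.0) falls outside the strip and would be left for a later strip, while the four captured patterns would be removed from c_1 before the next strip is built. The difference hi - lo gives the strip width.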

Note that the width between the two decision lines, a_rs and b_uv, is useful in determining the significance of the two neurons. Neurons with a large width are preferable and will be preserved with high priority in certain training operations. Small-width neurons will occasionally be eliminated.

The patterns in between the two lines, a_rs and b_uv, are well isolated from the patterns in the other class c_2. The stops x_r and x_s are different from the support vectors in SVM. The space in between the two parallel lines, a_rs and b_uv, is a sector of the D space. An example of the typical decision regions is illustrated in Figure 1(b). The decision regions contain four bar-like strips.

There is physiological evidence on receptive fields, D = 2, for the bar-like strips [2, 3]. Note that there are many techniques to pick the center patterns x_p and x_q to build a strip. One way to do this is to select all patterns, {x_p, x_q, x_r, x_s, x_u, and x_v}, in a predefined neighborhood region. The size of the neighborhood region can be tuned during the training process. One may include a penalty cost to set the borders a_rs and b_uv in a way similar to that for SVM.

As for the general-position two-class classification problem, a single ‘AND’ function is used for a neuron in the second hidden layer, n_2, to represent the patterns in one individual strip; see Figure 1(c). A global ‘OR’ function is used for the output neuron, n_3, to represent all patterns in class c_1 that are in all strips. To our knowledge, this is the simplest MLP architecture in many aspects.
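The ‘AND’/‘OR’ arrangement can be sketched with preset weights as below. Several things here are assumptions for illustration only: a hard-threshold activation is used (the cited design [4] works with a quadratic sigmoid function), all connection weights are set to +1 with biases chosen by hand, each strip is given as a hypothetical tuple (unit normal, lo, hi), and the names `step`, `and_neuron`, `or_neuron`, and `classify` are not from the source.

```python
def step(z):
    """Hard-threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def and_neuron(inputs):
    # Unit weights; fires only if every binary input is 1.
    return step(sum(inputs) - (len(inputs) - 0.5))


def or_neuron(inputs):
    # Unit weights; fires if at least one binary input is 1.
    return step(sum(inputs) - 0.5)


def classify(x, strips):
    """Return 1 if x lies in any strip (class c_1), else 0.

    strips: list of ((nx, ny), lo, hi), where a point is inside a strip
    when lo <= nx * x + ny * y <= hi.
    """
    strip_outputs = []
    for (nx, ny), lo, hi in strips:
        proj = nx * x[0] + ny * x[1]
        h_a = step(hi - proj)  # first-layer neuron for one border
        h_b = step(proj - lo)  # first-layer neuron for the other border
        strip_outputs.append(and_neuron([h_a, h_b]))  # second layer
    return or_neuron(strip_outputs)  # output neuron


strips = [((0.0, 1.0), -0.6, 0.7), ((1.0, 0.0), 2.0, 3.0)]
print(classify((0.5, 0.0), strips))  # inside the first strip
print(classify((0.5, 1.0), strips))  # outside both strips
print(classify((2.5, 5.0), strips))  # inside the second strip
```

No weight in this sketch is trained: the first-layer weights come from the strip borders, and the second and output layers are fixed logical functions.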

Alternatively, one can fix the two hidden layers, n_2 and n_3, with ‘AND’ and ‘OR’ functions, respectively, and use BP to train the n_1 layer only. In the n_1 layer we set D + 1 neurons connected to each n_2 neuron, that is, to each ‘AND’ neuron. There are n_1 = n_2(D + 1) neurons in the n_1 layer. These D + 1 neurons can enclose an isolated polyhedron region (cell) and represent all patterns in that cell. All patterns in a single cell must belong to the same class after the training process.
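To make the cell idea concrete, the sketch below encloses a triangle in D = 2 with D + 1 = 3 first-layer neurons feeding one ‘AND’ neuron. The half-plane weights are picked by hand rather than trained by BP, a hard threshold stands in for the activation, and the names `cell_neuron` and `first_layer_size` are hypothetical.

```python
def step(z):
    """Hard-threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def cell_neuron(x, half_spaces):
    """'AND' over first-layer neurons, each given as (w, b) for w . x + b >= 0."""
    hidden = [
        step(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b in half_spaces
    ]
    # Fires only when x satisfies every half-space, i.e. lies inside the cell.
    return step(sum(hidden) - (len(hidden) - 0.5))


def first_layer_size(n2, dim):
    """Number of first-layer neurons, n_1 = n_2 (D + 1)."""
    return n2 * (dim + 1)


# Triangle bounded by x >= 0, y >= 0, and x + y <= 1.
triangle = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((-1.0, -1.0), 1.0)]
print(cell_neuron((0.2, 0.3), triangle))  # inside the cell
print(cell_neuron((0.8, 0.8), triangle))  # outside the cell
print(first_layer_size(4, 2))  # 4 cells in the plane need 12 neurons
```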

In Figure 1(c) there are n_1 = 2n_2 neurons in the first hidden layer, and each strip is enclosed by two parallel hyperplanes.

This article shows a divide-and-conquer design. Both the number of neurons and the number of layers in the tiling algorithm [5] are highly sensitive to the setting of the origin, that is, the absolute coordinates, of the patterns. In contrast, Chapter 3 and this material use the relative distances between patterns. This relative distance gives the classification quality. The network in Figure 1(c) will give a perfect performance for any training dataset. To our knowledge, the performance of this network is comparable to that of SVM [1] for any testing dataset. Note that this network can be extended to an isolated polyhedral region that is enclosed by (D + 1) neurons. It is easy to extend the design to the multiple-class problem. It is also easy to obtain all-positive weights, like those obtained by the NMF method.

Figure 1: The concept of the weight design method in [4].

References

[1] Boser, B.E., Guyon, I.M., and Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (1992) 144-152

[2] Daugman, J.: Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research 20 (1980) 847-856

[3] Dobbins, A., Zucker, S.W., and Cynader, M.S.: Endstopped neurons in the visual cortex as a substrate for calculating curvature. Nature 329 (1987) 438-441

[4] Liou, C.-Y. and Yu, W.-J.: Initializing the weights in multilayer network with quadratic sigmoid function. In: Proceedings of the International Conference on Neural Information Processing (1994) 1387-1392

[5] Mézard, M. and Nadal, J.P.: Learning in feedforward layered networks: the tiling algorithm. Journal of Physics A 22 (1989) 2191-2203

[6] Mirchandani, G. and Cao, W.: On hidden nodes for neural nets. IEEE Transactions on Circuits and Systems 36 (1989) 661-664
