Chap. 4 MLP Key Points (CSIE 5052 922 U1180 neural networks by C.-Y. Liou)


(1)

Chap. 4 MLP Key Points

Simplified structure of interconnected neurons

A 2-2-1 network suffices for all 16 Boolean functions of two inputs

No jump connections; no feedback connections

Inequalities are turned into logic operations on the input space

High-level abstractions (representations) of the input patterns arriving from the front layers

‘Linear algebra’s linear algebra’
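As a minimal illustration of the 2-2-1 claim, here is a hand-weighted threshold network computing XOR, one of the 16 Boolean functions of two inputs; the weights are chosen by hand for this sketch, not learned:

```python
import itertools

def step(x):
    """Hard-limit (threshold) activation."""
    return 1 if x >= 0 else 0

def mlp_2_2_1(x1, x2, W1, b1, W2, b2):
    """A 2-2-1 threshold network: 2 inputs, 2 hidden units, 1 output."""
    h = [step(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(2)]
    return step(W2[0] * h[0] + W2[1] * h[1] + b2)

# Hand-chosen weights for XOR: hidden unit 0 computes OR, hidden unit 1
# computes NAND, and the output unit computes AND of the two.
W1, b1 = [[1, 1], [-1, -1]], [-0.5, 1.5]
W2, b2 = [1, 1], -1.5

for x1, x2 in itertools.product([0, 1], repeat=2):
    print(x1, x2, '->', mlp_2_2_1(x1, x2, W1, b1, W2, b2))  # XOR truth table
```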

(2)

It has been shown that a single layer of hidden units suffices to approximate any function with finitely many discontinuities to arbitrary precision, provided the activation functions of the hidden units are non-linear (the universal approximation theorem).

(Hornik, Stinchcombe, & White, 1989; Funahashi, 1989; Cybenko, 1989; Hartman, Keeler, & Kowalski, 1990)
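In symbols, the theorem guarantees that for a suitable non-linear σ and a large enough N, a one-hidden-layer sum of the following form can be made arbitrarily close to the target f (a generic paraphrase of the cited results, not a quotation):

```latex
f(\mathbf{x}) \;\approx\; \sum_{i=1}^{N} v_i \,\sigma\!\left(\mathbf{w}_i^{\top}\mathbf{x} + b_i\right)
```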

(3)

Chapter 4: Hidden tree in MLP

Kolmogorov theorem

Existence theorem

He did not show how to implement it.

An operational solution is lacking.

Even if such a solution exists, it would be far too complex to operate.
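For reference, the Kolmogorov superposition theorem states that every continuous f on [0,1]^n has an exact representation of the following form; the inner functions φ_{q,p} are precisely what no one shows how to implement operationally:

```latex
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \varphi_{q,p}(x_p)\right)
```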

(6)

Hidden tree

Pruning neurons (to simplify the structure):

1. If two neurons give the same, or exactly reversed, responses to all patterns, delete one of them.

2. If a neuron gives the same response to all patterns, delete it (see the sketch after this list).

3. Delete useless neurons that only split data of the same class.

4. A neuron can be deleted if its deletion does not generate any mixed (ambiguous) cells.

(7)

Day 6/17: the big breakthrough — NetTalk, 1987 (wiki)

Perceptron (McCulloch-Pitts neuron), 1943

LMS learning = Widrow learning, 1960

Werbos, 1974 (Harvard): backpropagation

LMS 50-Year Celebration presentation, Paul Werbos, part 1 (videos, 2009/06/17, IJCNN)

1986: a “renaissance” in the field

Book by Rumelhart, Hinton, and Williams

1986: learning in multi-layer neural networks greatly increased computational power

(8)

NetTalk and BP's distribution of reward and punishment: to each according to its needs, or to each according to its worth?

BP solves one type of credit-assignment problem (see the old homework); the equations below show how the credit is passed back.
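As a reminder of how this credit assignment works in the standard BP equations (generic textbook form, not the NetTalk-specific update): the output error is split among the hidden neurons in proportion to their outgoing weights:

```latex
\delta_k^{\mathrm{out}} = (t_k - y_k)\, f'(\mathrm{net}_k), \qquad
\delta_j = f'(\mathrm{net}_j) \sum_k w_{kj}\, \delta_k^{\mathrm{out}}, \qquad
\Delta w_{ji} = \eta\, \delta_j\, x_i
```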

Chemistry application: prediction of disulfide bonding state

http://www.uow.edu.au/~markus/teaching/CSCI323/Lecture_MLP.pdf

(9)

Day 6/17: NetTalk

Where does its mysterious power come from?

None of the existing methods or contemporary technologies could

accomplish such a pronunciation task at 95% correctness.

It discovers the regular variations (regularity, 80%) and memorizes the irregular ones.

(10)

NetTalk

Discovers the regular variations (regularity, 80%) and memorizes the irregular ones.

The exceptions make rule-based methods fail;

formal systems fail too;

logic fails;

set theory fails;

probability fails; statistics fails.

(11)

NetTalk demonstrates the MLP trained by LMS:

it autonomously exploits useful hidden structures in the training dataset, such as vowels and consonants;

it discovers hidden structures (regularity, 80%);

it utilizes those implicit structures and memorizes the special cases;

it uses those structures to simplify the problem drastically.

(12)

What Chapter 4 showed and suggested:

the world self-organizes (builds itself) from the bottom up (Boltzmann).

The world is not imposed from above by a God;

the world emerges from below. (33:00 in the video)

(13)

Hidden tree

The code of a cell ≠ a binary number;

it is not a number in Minsky's sense.

Neurons have MSB and LSB properties.

A neuron does not understand numbers.

Neurons use representations to solve problems.

(101)' ≠ 1×2^2 + 0×2^1 + 1×2^0 = 5

(14)

Chap. 4: MLP tree

An oversimplified neural structure: no jump connections,

no feedback connections.

Capable of high-level logical abstraction. It may be called a kind of

‘discrimination machine’ or ‘classification machine’,

or ‘a self-tuning classification machine’.

(15)

Finest areas of the first hidden layer

are cells: the space is cut into the finest pieces by the perceptrons, and each cell contains data of one pure class only;

or patches of polyhedral shape;

or the building blocks of the succeeding layers.

Merge → merge → merge, layer by layer;

coding → coding → coding → …;

combining → combining → combining → …;

piling → piling → piling → …;

grouping → grouping → grouping → …

(16)

Finest areas of the first hidden layer

discretize the continuous data space,

simplifying the subsequent processing.

Once the cells are coded with symbols, the second and deeper (upper) hidden layers are trained only on the previous layer's cell symbols; there is no need to train on the raw data, which simplifies the subsequent processing (a coding sketch follows).

The second and deeper hidden layers keep simplifying, and the final number of symbols becomes smaller and smaller.
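A sketch of the coding step, assuming the first hidden layer is a set of hyperplanes `(W1, b1)` with hard-limit outputs (the names are illustrative); each raw sample is replaced by the symbol of its cell, and the deeper layers would see only these symbols:

```python
import numpy as np

def cell_symbols(X, W1, b1):
    """Replace each raw sample (row of X) by its cell code: the 0/1 sign
    pattern of the first-layer hyperplanes it lies on the positive side of."""
    return (X @ W1.T + b1 > 0).astype(int)

# Deeper layers train on the distinct symbols, not on the raw data:
# symbols = {tuple(c) for c in cell_symbols(X, W1, b1)}
```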

(17)

Each cell in a 2-3-3-1 network

Each cell is a polyhedron and is convex (first hidden layer).

Neighboring cells differ in one bit, i.e. across one dividing line.

The total number of cells for J hyperplanes in general position in n-dimensional space is

#(J,n) = Σ_{k=0}^{n} C(J,k),  where C(J,k) is the binomial coefficient.
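The count can be checked empirically: sample the space densely, collect the distinct cell codes, and compare with the formula. A sketch with made-up random hyperplanes follows; very thin cells can be missed by sampling:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
J, n = 4, 2                          # J hyperplanes in n-dimensional space
W, b = rng.normal(size=(J, n)), rng.normal(size=J)

X = rng.uniform(-10, 10, size=(200_000, n))      # dense sample of the space
cells = {tuple(c) for c in (X @ W.T + b > 0).astype(int)}

print("cells found:", len(cells))                          # typically 11 here
print("#(J,n) =", sum(comb(J, k) for k in range(n + 1)))   # 1 + 4 + 6 = 11
```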

(18)

Equivalent Isomorphism

The first hidden layer: heterogeneous yet isomorphic, equivalent structures —

equivalent structures with #(J,n) = Σ_{k=0}^{n} C(J,k).

Xun Dong:

polyhedral complex,

“The bounded complex of a hyperplane arrangement”.

Xun Dong's equivalent structures are more complex.

(19)

The hidden tree reflects the combinatorial structure of the cells.

There are many rules among the cells;

there are all kinds of ways to weave this collection of cells together.

Equivalent parts exist in the hidden-tree structure.

The tree reveals all the geometrical relations of the cells in hyperspace, relations that cannot be seen directly;

with the tree, one can see those relations, brought right before one's eyes.

(20)

Statistics and probability ways

Probability makes cells gray to various degrees; the data inside such a cell cannot be of one pure class.

MLP perceptrons separate distinct pure classes; there is no gray class.

MLP handles only the two classes probability = 0 and probability = 1;

it does not handle a light-gray class with probability = 0.3.

With the MLP tree one can pick out the ambiguity cells, i.e. the cells with probability = 0.3.

When such gray cells with probability = 0.3 are encountered, each cell is processed further on its own.

(21)

Statistics and probability ways

If probabilistic assumptions must be introduced,

as with Mendel's hybrid peas:

the light-gray (<0.5) cells and the dark-gray (>0.5) cells are combined on different subtrees, which can achieve the global minimum of E.

(22)

Statistics and probability ways

Cells fall into three cases, handled separately (to be covered later if time permits):

1. purely black-and-white cells, as in Chapter 4;

2. purely black-and-white cells + gray probabilistic cells:

separate the pure black-and-white cells first, then handle the gray probabilistic cells;

3. purely gray probabilistic cells (to be covered if there is a chance).

The light-gray (<0.5) cells and the dark-gray (>0.5) cells are combined on different subtrees, which can achieve the global minimum of E.

(23)

Minsky

Neurons do not know where 12 (a quantity) comes from; 12 = 7 + 5? See the video at 39:50 and 37:48.

Neurons do not know numbers.

Neurons use representations to solve problems:

they partition the input space along the (quantitative) axes, e.g. weight and height.

(24)

MLP tree & logic

Any Boolean function can be expressed by an iterated logic function F1(F2(F3(X))); logic (AND, OR) also needs nested operation.

The tree = logic relations in spatial space; a woven framework.

A marriage of logic and geometric relations:

logic may have no spatial content;

conversely, space has no logic content.

Discretely partition the X space into the finest cells

and code them;

combine the coded cells into high-level codes.

(25)

MLP

A marriage of logic and geometric space: the space is woven with logic; we ride on logic, using logic as an aid.

The MLP tree roughly reveals the spatial structure of the data.

Piling → piling → piling → …;

grouping → grouping → grouping → …

(26)

Hidden tree

Codes of areas are symbols.

Codes are not binary numbers.

Neurons develop symbols to solve problems.

Neurons do not understand numbers.

Neurons do not understand probability.

Neurons do not use probability to solve problems — no Bayes.

(27)

MLP turns the dividing lines into logic content.

Even more,

neurons do not understand logic.

Neurons use group force

(representations) collectively to solve discrimination and classification problems.

They ride on logic, using logic as an aid.

(28)

MLP tree: constructive way

MSB LSB neurons

Redundant neurons

Retrain locally and use local data

Divide and conquer

Constructive way

BP errors tend to get lost in the front layers

(29)

Tree nodes are representations

1-bit neighbors: separated by one dividing line.

The tree supports the spatial structure of the dataset.

Two neighboring data points may be separated by two dividing lines (see the earlier figure):

Hamming distance = 2.

2-bit neighbors:

the neighbors of the 1-bit neighbors,

excluding the cell itself and its 1-bit neighbors (see the sketch below).
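A small sketch of reading this neighbor structure off the cell codes; the codes are treated as plain bit tuples, and the example codes are made up:

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def k_bit_neighbors(code, codes, k):
    """Cells in `codes` whose code differs from `code` in exactly k bits:
    k=1 means across one dividing line, k=2 across two."""
    return [c for c in codes if hamming(code, c) == k]

codes = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
print(k_bit_neighbors((0, 0, 0), codes, 1))   # [(0, 0, 1), (1, 0, 0)]
print(k_bit_neighbors((0, 0, 0), codes, 2))   # [(0, 1, 1), (1, 1, 0)]
```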

(30)

Notes on BP and MLP

1. Group the output vectors of the first hidden layer by PCA, replacing the binary tree in NetTalk; the grouping can be seen by eye (see the plotting sketch after this list).

2. Mark each sample in the PCA plot with its error (light/dark shading), to spot the data with large errors.

3. Color the two classes with two different colors.

4. Pick out the mutually contradictory samples and handle them separately.
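A plotting sketch for notes 1-3, assuming the hidden-layer output vectors `H`, class labels `y`, and per-sample errors `err` are already available from a trained network (the names are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_hidden_pca(H, y, err):
    """H: first-hidden-layer output vectors (one row per sample),
    y: class labels (0/1), err: per-sample error of the trained net."""
    p = PCA(n_components=2).fit_transform(H)
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    # Note 3: the two classes in two colors -> grouping visible by eye.
    axes[0].scatter(p[:, 0], p[:, 1], c=y, cmap='bwr', s=10)
    axes[0].set_title('classes')
    # Note 2: shade each sample by its error to spot the bad samples.
    axes[1].scatter(p[:, 0], p[:, 1], c=err, cmap='Greys', s=10)
    axes[1].set_title('per-sample error')
    plt.show()
```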

(31)

Notes on BP and MLP

5. Spot the special cases of the regular variations, and the irregular variations:

the group of neurons (the majority) fired by their common intersection; see NetTalk.

6. Rank the neurons as MSB/LSB by their degree of error; retrain the LSB neurons (leaving the MSB neurons unchanged).

7. Adjust the few hyperplanes nearest the input data (three points span a plane, four points a solid).

(32)

Notes on BP and MLP

8. |weight| ≈ 0: the dividing line is parallel to that input axis, so the neuron rejects that input X.

If a particular |weight| ≈ 0 throughout a whole layer, that input may be noise or an irrelevant quantity, like the color of a stone relative to a medical condition.

If all |weight| ≈ 0 throughout a whole layer, it means “don't care”: the input data are rejected because they are full of contradictions (BP produces the ≈ 0 phenomenon in order to lower the MSE). The input data should then be checked; sometimes a whole layer goes ≈ 0 and blocks the input data (a detection sketch follows).
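A sketch of detecting such “don't care” inputs, assuming one layer's weights are stored as an (n_out × n_in) matrix; the tolerance is an arbitrary choice:

```python
import numpy as np

def dont_care_inputs(W, tol=1e-3):
    """W: (n_out, n_in) weight matrix of one layer. An input is 'don't
    care' for the whole layer when its column is ~0 for every neuron:
    all the dividing lines are then parallel to that input axis."""
    return np.where(np.all(np.abs(W) < tol, axis=0))[0]
```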

(33)

8. Find all neurons with |weight| ≈ 0.

|weight| ≈ 0 means “don't care”: the input data contain contradictions (BP produces the ≈ 0 phenomenon to lower the MSE). Check the input data; sometimes a whole layer goes ≈ 0 and blocks the input data.

9. Label each sample with its label codes at every layer.

10. Compute each sample's error with a hard limiter; BP does not correct samples whose error = 0.

(34)

MLP rules of thumb

Practically, n_1 >> n_2 >> n_3:

the number of neurons in the first hidden layer is much larger than that in the second hidden layer.

The number of neurons in the first hidden layer is estimated as n_1 = 2n + 1 (Kolmogorov theory 1957; Poggio, MIT).

Nash's embedding theorem also turns on 2n + 1.

(35)

Conclusions of hidden tree

The most important conclusion of Chapter 4 on the hidden tree is:

“To get perfect performance (100% correctness; the global solution) on the training dataset, the MLP must be accomplished in a bottom-up manner.”

Any BP algorithm will converge to a local-minimum solution; BP errors will get lost in the front layers.

(36)

Chapter 4

LMS

Front layers (the lower beams; input).

Rear layers, or deep layers (the upper beams; output).

The upper beams provide the logical (representation) track.

NetTalk learns

80% regular rules + 20% irregular cases.

(37)

LMS by Widrow

Minimizing the probabilistic expectation E(error²) is the wrong direction; one should work from the raw data and need not pursue probabilistic perfection.

Record the errors for each pattern and for each neuron. One can then develop various

manipulation strategies for the training sequence during training.

Tune the weights for the large errors with priority (a sketch follows).

There is no need to introduce assumptions such as “stationary, …” on E.
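A sketch of such a priority schedule for a single linear neuron; the Widrow-Hoff update is standard, while the error-recording and priority ordering are the manipulation suggested here:

```python
import numpy as np

def lms_priority_epoch(w, X, t, lr=0.1):
    """One LMS epoch for a single linear neuron: record each pattern's raw
    error, then apply the Widrow-Hoff update to the patterns in order of
    decreasing error, so large errors get priority."""
    errs = t - X @ w                    # recorded error of every pattern
    order = np.argsort(-np.abs(errs))   # largest |error| first
    for i in order:
        e = t[i] - X[i] @ w             # recompute: earlier updates moved w
        w = w + lr * e * X[i]
    return w, errs
```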

(38)

The old Homework #1 (Google: cancer)

Write a BP program for a 2-3-3-1 network + an online hidden tree (a minimal BP sketch follows this list).

The NetTalk updating equation for w(t+1).

Show how BP distributes the corrections of the error to each neuron (BP's distribution of reward and punishment); see NetTalk.

Record the 1-bit neighbors + 2-bit neighbors. Hidden representations

+ the hidden tree by Sejnowski.

Generate an artificial dataset or use a real one. MSB & LSB neurons + pruning.
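A minimal BP sketch for the 2-3-3-1 network, trained here on XOR as a stand-in artificial dataset; sigmoid units and the plain delta rule are assumed, and convergence can depend on the random seed:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP2331:
    """Plain online BP for a 2-3-3-1 sigmoid network."""
    def __init__(self, sizes=(2, 3, 3, 1)):
        self.W = [rng.normal(0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(m) for m in sizes[1:]]

    def forward(self, x):
        acts = [np.asarray(x, float)]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(W @ acts[-1] + b))
        return acts            # activations of every layer, input included

    def train_step(self, x, t, lr=0.5):
        acts = self.forward(x)
        # Credit assignment starts at the output...
        delta = (acts[-1] - t) * acts[-1] * (1 - acts[-1])
        for l in range(len(self.W) - 1, -1, -1):
            d = delta
            if l > 0:
                # ...and each hidden neuron receives its share of the
                # blame through its outgoing weights (the chain rule).
                delta = (self.W[l].T @ d) * acts[l] * (1 - acts[l])
            self.W[l] -= lr * np.outer(d, acts[l])
            self.b[l] -= lr * d

# XOR as the artificial dataset:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
net = MLP2331()
for _ in range(20_000):
    for x, t in zip(X, T):
        net.train_step(x, t)
print([round(net.forward(x)[-1][0], 2) for x in X])   # ideally ~[0, 1, 1, 0]
```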

(39)

MLP & math

Widrow: adaptive

Kolmogorov: learning, NN theory

Kolmogorov: 泛化 (generalization; the mainland-China term)

MIT OpenCourseWare

Kolmogorov: space filling, 2n+1

Kolmogorov Fr

(40)

Kolmogorov theory 1957

Debates

“Kolmogorov's Theorem Is Irrelevant”:

an exact representation is hopeless.

“Kolmogorov's Theorem Is Relevant”
