Chap. 4 MLP Key Points (CSIE 5052 922 U1180 neural networks by C.-Y. Liou)


(1)

Chap. 4 MLP Key Points

Simplified structure of interconnected neurons

A 2-2-1 network suffices for all 16 Boolean functions of two inputs

No jump connections; no feedback connections

Inequalities are turned into logic operations on the input space

High-level abstractions (representations) of the input patterns arriving from the front layers

‘Linear algebra’s linear algebra’
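As a minimal illustration of the 2-2-1 claim, here is a hand-weighted threshold network computing XOR, one of the 16 Boolean functions of two inputs; the weights are chosen by hand for this sketch, not learned:

```python
import itertools

def step(x):
    """Hard-limit (threshold) activation."""
    return 1 if x >= 0 else 0

def mlp_2_2_1(x1, x2, W1, b1, W2, b2):
    """A 2-2-1 threshold network: 2 inputs, 2 hidden units, 1 output."""
    h = [step(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(2)]
    return step(W2[0] * h[0] + W2[1] * h[1] + b2)

# Hand-chosen weights for XOR: hidden unit 0 computes OR, hidden unit 1
# computes NAND, and the output unit computes AND of the two.
W1, b1 = [[1, 1], [-1, -1]], [-0.5, 1.5]
W2, b2 = [1, 1], -1.5

for x1, x2 in itertools.product([0, 1], repeat=2):
    print(x1, x2, '->', mlp_2_2_1(x1, x2, W1, b1, W2, b2))  # XOR truth table
```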

(2)

It has been shown that a single layer of hidden units suffices to approximate any function with finitely many discontinuities to arbitrary precision, provided the activation functions of the hidden units are non-linear (the universal approximation theorem).

(Hornik, Stinchcombe, & White, 1989; Funahashi, 1989; Cybenko, 1989; Hartman, Keeler, & Kowalski, 1990)
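In symbols, the theorem guarantees that for a suitable non-linear σ and a large enough N, a one-hidden-layer sum of the following form can be made arbitrarily close to the target f (a generic paraphrase of the cited results, not a quotation):

```latex
f(\mathbf{x}) \;\approx\; \sum_{i=1}^{N} v_i \,\sigma\!\left(\mathbf{w}_i^{\top}\mathbf{x} + b_i\right)
```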

(3)

Chapter 4: Hidden tree in MLP

Kolmogorov theorem

Existence theorem

He did not show how to implement it.

An operational solution is lacking.

Even if such a solution exists, it would be far too complex to operate.
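For reference, the Kolmogorov superposition theorem states that every continuous f on [0,1]^n has an exact representation of the following form; the inner functions φ_{q,p} are precisely what no one shows how to implement operationally:

```latex
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \varphi_{q,p}(x_p)\right)
```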

(6)

Hidden tree

Pruning neurons (to simplify the structure):

1. If two neurons give the same, or exactly reversed, responses to all patterns, delete one of them.

2. If a neuron gives the same response to all patterns, delete it (see the sketch after this list).

3. Delete useless neurons that only split data of the same class.

4. A neuron can be deleted if its deletion does not generate any mixed (ambiguous) cells.

(7)

Day 6/17: the big breakthrough — NetTalk, 1987 (wiki)

Perceptron (McCulloch-Pitts neuron), 1943

LMS learning = Widrow learning, 1960

Werbos, 1974 (Harvard): backpropagation

LMS 50-Year Celebration presentation, Paul Werbos, part 1 (videos, 2009/06/17, IJCNN)

1986: a “renaissance” in the field

Book by Rumelhart, Hinton, and Williams

1986: learning in multi-layer neural networks greatly increased computational power

(8)

NetTalk and BP's distribution of reward and punishment: to each according to its needs, or to each according to its worth?

BP solves one type of credit-assignment problem (see the old homework); the equations below show how the credit is passed back.
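As a reminder of how this credit assignment works in the standard BP equations (generic textbook form, not the NetTalk-specific update): the output error is split among the hidden neurons in proportion to their outgoing weights:

```latex
\delta_k^{\mathrm{out}} = (t_k - y_k)\, f'(\mathrm{net}_k), \qquad
\delta_j = f'(\mathrm{net}_j) \sum_k w_{kj}\, \delta_k^{\mathrm{out}}, \qquad
\Delta w_{ji} = \eta\, \delta_j\, x_i
```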

Chemistry application: prediction of disulfide bonding state

http://www.uow.edu.au/~markus/teaching/CSCI323/Lecture_MLP.pdf

(9)

Day 6/17: NetTalk

Where does its mysterious power come from?

None of the existing methods or contemporary technologies could

accomplish such a pronunciation task at 95% correctness.

It discovers the regular variations (regularity, 80%) and memorizes the irregular ones.

(10)

NetTalk

Discovers the regular variations (regularity, 80%) and memorizes the irregular ones.

The exceptions make rule-based methods fail;

formal systems fail too;

logic fails;

set theory fails;

probability fails; statistics fails.

(11)

NetTalk demonstrates the MLP trained by LMS:

it autonomously exploits useful hidden structures in the training dataset, such as vowels and consonants;

it discovers hidden structures (regularity, 80%);

it utilizes those implicit structures and memorizes the special cases;

it uses those structures to simplify the problem drastically.

(12)

What Chapter 4 showed and suggested:

the world self-organizes (builds itself) from the bottom up (Boltzmann).

The world is not imposed from above by a God;

the world emerges from below. (33:00 in the video)

(13)

Hidden tree

The code of a cell ≠ a binary number;

it is not a number in Minsky's sense.

Neurons have MSB and LSB properties.

A neuron does not understand numbers.

Neurons use representations to solve problems.

(101)' ≠ 1×2^2 + 0×2^1 + 1×2^0 = 5

(14)

Chap. 4: MLP tree

An oversimplified neural structure: no jump connections,

no feedback connections.

Capable of high-level logical abstraction. It may be called a kind of

‘discrimination machine’ or ‘classification machine’,

or ‘a self-tuning classification machine’.

(15)

Finest areas of the first hidden layer

are cells: the space is cut into the finest pieces by the perceptrons, and each cell contains data of one pure class only;

or patches of polyhedral shape;

or the building blocks of the succeeding layers.

Merge → merge → merge, layer by layer;

coding → coding → coding → …;

combining → combining → combining → …;

piling → piling → piling → …;

grouping → grouping → grouping → …

(16)

Finest areas of the first hidden layer

discretize the continuous data space,

simplifying the subsequent processing.

Once the cells are coded with symbols, the second and deeper (upper) hidden layers are trained only on the previous layer's cell symbols; there is no need to train on the raw data, which simplifies the subsequent processing (a coding sketch follows).

The second and deeper hidden layers keep simplifying, and the final number of symbols becomes smaller and smaller.
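A sketch of the coding step, assuming the first hidden layer is a set of hyperplanes `(W1, b1)` with hard-limit outputs (the names are illustrative); each raw sample is replaced by the symbol of its cell, and the deeper layers would see only these symbols:

```python
import numpy as np

def cell_symbols(X, W1, b1):
    """Replace each raw sample (row of X) by its cell code: the 0/1 sign
    pattern of the first-layer hyperplanes it lies on the positive side of."""
    return (X @ W1.T + b1 > 0).astype(int)

# Deeper layers train on the distinct symbols, not on the raw data:
# symbols = {tuple(c) for c in cell_symbols(X, W1, b1)}
```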

(17)

Each cell in a 2-3-3-1 network

Each cell is a polyhedron and is convex (first hidden layer).

Neighboring cells differ in one bit, i.e. across one dividing line.

The total number of cells for J hyperplanes in general position in n-dimensional space is

#(J,n) = Σ_{k=0}^{n} C(J,k),  where C(J,k) is the binomial coefficient.
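The count can be checked empirically: sample the space densely, collect the distinct cell codes, and compare with the formula. A sketch with made-up random hyperplanes follows; very thin cells can be missed by sampling:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
J, n = 4, 2                          # J hyperplanes in n-dimensional space
W, b = rng.normal(size=(J, n)), rng.normal(size=J)

X = rng.uniform(-10, 10, size=(200_000, n))      # dense sample of the space
cells = {tuple(c) for c in (X @ W.T + b > 0).astype(int)}

print("cells found:", len(cells))                          # typically 11 here
print("#(J,n) =", sum(comb(J, k) for k in range(n + 1)))   # 1 + 4 + 6 = 11
```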

(18)

Equivalent Isomorphism

The first hidden layer: heterogeneous yet isomorphic, equivalent structures —

equivalent structures with #(J,n) = Σ_{k=0}^{n} C(J,k).

Xun Dong:

polyhedral complex,

“The bounded complex of a hyperplane arrangement”.

Xun Dong's equivalent structures are more complex.

(19)

The hidden tree reflects the combinatorial structure of the cells.

There are many rules among the cells;

there are all kinds of ways to weave this collection of cells together.

Equivalent parts exist in the hidden-tree structure.

The tree reveals all the geometrical relations of the cells in hyperspace, relations that cannot be seen directly;

with the tree, one can see those relations, brought right before one's eyes.

(20)

Statistics and probability ways

Probability makes cells gray to various degrees; the data inside such a cell cannot be of one pure class.

MLP perceptrons separate distinct pure classes; there is no gray class.

MLP handles only the two classes probability = 0 and probability = 1;

it does not handle a light-gray class with probability = 0.3.

With the MLP tree one can pick out the ambiguity cells, i.e. the cells with probability = 0.3.

When such gray cells with probability = 0.3 are encountered, each cell is processed further on its own.

(21)

Statistics and probability ways

If probabilistic assumptions must be introduced,

as with Mendel's hybrid peas:

the light-gray (<0.5) cells and the dark-gray (>0.5) cells are combined on different subtrees, which can achieve the global minimum of E.

(22)

Statistics and probability ways

Cells fall into three cases, handled separately (to be covered later if time permits):

1. purely black-and-white cells, as in Chapter 4;

2. purely black-and-white cells + gray probabilistic cells:

separate the pure black-and-white cells first, then handle the gray probabilistic cells;

3. purely gray probabilistic cells (to be covered if there is a chance).

The light-gray (<0.5) cells and the dark-gray (>0.5) cells are combined on different subtrees, which can achieve the global minimum of E.

(23)

Minsky

Neurons do not know where 12 (a quantity) comes from; 12 = 7 + 5? See the video at 39:50 and 37:48.

Neurons do not know numbers.

Neurons use representations to solve problems:

they partition the input space along the (quantitative) axes, e.g. weight and height.

(24)

MLP tree & logic

Any Boolean function can be expressed by an iterated logic function F1(F2(F3(X))); logic (AND, OR) also needs nested operation.

The tree = logic relations in spatial space; a woven framework.

A marriage of logic and geometric relations:

logic may have no spatial content;

conversely, space has no logic content.

Discretely partition the X space into the finest cells

and code them;

combine the coded cells into high-level codes.

(25)

MLP

A marriage of logic and geometric space: the space is woven with logic; we ride on logic, using logic as an aid.

The MLP tree roughly reveals the spatial structure of the data.

Piling → piling → piling → …;

grouping → grouping → grouping → …

(26)

Hidden tree

Codes of areas are symbols.

Codes are not binary numbers.

Neurons develop symbols to solve problems.

Neurons do not understand numbers.

Neurons do not understand probability.

Neurons do not use probability to solve problems — no Bayes.

(27)

MLP turns the dividing lines into logic content.

Even more,

neurons do not understand logic.

Neurons use group force

(representations) collectively to solve discrimination and classification problems.

They ride on logic, using logic as an aid.

(28)

MLP tree: constructive way

MSB LSB neurons

Redundant neurons

Retrain locally and use local data

Divide and conquer

Constructive way

BP errors tend to get lost in the front layers

(29)

Tree nodes are representations

1-bit neighbors: separated by one dividing line.

The tree supports the spatial structure of the dataset.

Two neighboring data points may be separated by two dividing lines (see the earlier figure):

Hamming distance = 2.

2-bit neighbors:

the neighbors of the 1-bit neighbors,

excluding the cell itself and its 1-bit neighbors (see the sketch below).
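A small sketch of reading this neighbor structure off the cell codes; the codes are treated as plain bit tuples, and the example codes are made up:

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def k_bit_neighbors(code, codes, k):
    """Cells in `codes` whose code differs from `code` in exactly k bits:
    k=1 means across one dividing line, k=2 across two."""
    return [c for c in codes if hamming(code, c) == k]

codes = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
print(k_bit_neighbors((0, 0, 0), codes, 1))   # [(0, 0, 1), (1, 0, 0)]
print(k_bit_neighbors((0, 0, 0), codes, 2))   # [(0, 1, 1), (1, 1, 0)]
```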

(30)

Notes on BP and MLP

1. Group the output vectors of the first hidden layer by PCA, replacing the binary tree in NetTalk; the grouping can be seen by eye (see the plotting sketch after this list).

2. Mark each sample in the PCA plot with its error (light/dark shading), to spot the data with large errors.

3. Color the two classes with two different colors.

4. Pick out the mutually contradictory samples and handle them separately.
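A plotting sketch for notes 1-3, assuming the hidden-layer output vectors `H`, class labels `y`, and per-sample errors `err` are already available from a trained network (the names are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_hidden_pca(H, y, err):
    """H: first-hidden-layer output vectors (one row per sample),
    y: class labels (0/1), err: per-sample error of the trained net."""
    p = PCA(n_components=2).fit_transform(H)
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    # Note 3: the two classes in two colors -> grouping visible by eye.
    axes[0].scatter(p[:, 0], p[:, 1], c=y, cmap='bwr', s=10)
    axes[0].set_title('classes')
    # Note 2: shade each sample by its error to spot the bad samples.
    axes[1].scatter(p[:, 0], p[:, 1], c=err, cmap='Greys', s=10)
    axes[1].set_title('per-sample error')
    plt.show()
```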

(31)

Notes on BP and MLP

5. Spot the special cases of the regular variations, and the irregular variations:

the group of neurons (the majority) fired by their common intersection; see NetTalk.

6. Rank the neurons as MSB/LSB by their degree of error; retrain the LSB neurons (leaving the MSB neurons unchanged).

7. Adjust the few hyperplanes nearest the input data (three points span a plane, four points a solid).

(32)

Notes on BP and MLP

8. |weight| ≈ 0: the dividing line is parallel to that input axis, so the neuron rejects that input X.

If a particular |weight| ≈ 0 throughout a whole layer, that input may be noise or an irrelevant quantity, like the color of a stone relative to a medical condition.

If all |weight| ≈ 0 throughout a whole layer, it means “don't care”: the input data are rejected because they are full of contradictions (BP produces the ≈ 0 phenomenon in order to lower the MSE). The input data should then be checked; sometimes a whole layer goes ≈ 0 and blocks the input data (a detection sketch follows).
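A sketch of detecting such “don't care” inputs, assuming one layer's weights are stored as an (n_out × n_in) matrix; the tolerance is an arbitrary choice:

```python
import numpy as np

def dont_care_inputs(W, tol=1e-3):
    """W: (n_out, n_in) weight matrix of one layer. An input is 'don't
    care' for the whole layer when its column is ~0 for every neuron:
    all the dividing lines are then parallel to that input axis."""
    return np.where(np.all(np.abs(W) < tol, axis=0))[0]
```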

(33)

8. Find all neurons with |weight| ≈ 0.

|weight| ≈ 0 means “don't care”: the input data contain contradictions (BP produces the ≈ 0 phenomenon to lower the MSE). Check the input data; sometimes a whole layer goes ≈ 0 and blocks the input data.

9. Label each sample with its label codes at every layer.

10. Compute each sample's error with a hard limiter; BP does not correct samples whose error = 0.

(34)

MLP rules of thumb

Practically, n_1 >> n_2 >> n_3:

the number of neurons in the first hidden layer is much larger than that in the second hidden layer.

The number of neurons in the first hidden layer is estimated as n_1 = 2n + 1 (Kolmogorov theory 1957; Poggio, MIT).

Nash's embedding theorem also turns on 2n + 1.

(35)

Conclusions of hidden tree

The most important conclusion of Chapter 4 on the hidden tree is:

“To get perfect performance (100% correctness; the global solution) on the training dataset, the MLP must be accomplished in a bottom-up manner.”

Any BP algorithm will converge to a local-minimum solution; BP errors will get lost in the front layers.

(36)

Chapter 4

LMS

Front layers (the lower beams; input).

Rear layers, or deep layers (the upper beams; output).

The upper beams provide the logical (representation) track.

NetTalk learns

80% regular rules + 20% irregular cases.

(37)

LMS by Widrow

Minimizing the probabilistic expectation E(error²) is the wrong direction; one should work from the raw data and need not pursue probabilistic perfection.

Record the errors for each pattern and for each neuron. One can then develop various

manipulation strategies for the training sequence during training.

Tune the weights for the large errors with priority (a sketch follows).

There is no need to introduce assumptions such as “stationary, …” on E.
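A sketch of such a priority schedule for a single linear neuron; the Widrow-Hoff update is standard, while the error-recording and priority ordering are the manipulation suggested here:

```python
import numpy as np

def lms_priority_epoch(w, X, t, lr=0.1):
    """One LMS epoch for a single linear neuron: record each pattern's raw
    error, then apply the Widrow-Hoff update to the patterns in order of
    decreasing error, so large errors get priority."""
    errs = t - X @ w                    # recorded error of every pattern
    order = np.argsort(-np.abs(errs))   # largest |error| first
    for i in order:
        e = t[i] - X[i] @ w             # recompute: earlier updates moved w
        w = w + lr * e * X[i]
    return w, errs
```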

(38)

The old Homework #1 (Google: cancer)

Write a BP program for a 2-3-3-1 network + an online hidden tree (a minimal BP sketch follows this list).

The NetTalk updating equation for w(t+1).

Show how BP distributes the corrections of the error to each neuron (BP's distribution of reward and punishment); see NetTalk.

Record the 1-bit neighbors + 2-bit neighbors. Hidden representations

+ the hidden tree by Sejnowski.

Generate an artificial dataset or use a real one. MSB & LSB neurons + pruning.
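A minimal BP sketch for the 2-3-3-1 network, trained here on XOR as a stand-in artificial dataset; sigmoid units and the plain delta rule are assumed, and convergence can depend on the random seed:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP2331:
    """Plain online BP for a 2-3-3-1 sigmoid network."""
    def __init__(self, sizes=(2, 3, 3, 1)):
        self.W = [rng.normal(0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(m) for m in sizes[1:]]

    def forward(self, x):
        acts = [np.asarray(x, float)]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(W @ acts[-1] + b))
        return acts            # activations of every layer, input included

    def train_step(self, x, t, lr=0.5):
        acts = self.forward(x)
        # Credit assignment starts at the output...
        delta = (acts[-1] - t) * acts[-1] * (1 - acts[-1])
        for l in range(len(self.W) - 1, -1, -1):
            d = delta
            if l > 0:
                # ...and each hidden neuron receives its share of the
                # blame through its outgoing weights (the chain rule).
                delta = (self.W[l].T @ d) * acts[l] * (1 - acts[l])
            self.W[l] -= lr * np.outer(d, acts[l])
            self.b[l] -= lr * d

# XOR as the artificial dataset:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
net = MLP2331()
for _ in range(20_000):
    for x, t in zip(X, T):
        net.train_step(x, t)
print([round(net.forward(x)[-1][0], 2) for x in X])   # ideally ~[0, 1, 1, 0]
```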

(39)

MLP & math

Widrow: adaptive

Kolmogorov: learning, NN theory

Kolmogorov: 泛化 (generalization; the mainland-China term)

MIT OpenCourseWare

Kolmogorov: space filling, 2n+1

Kolmogorov Fr

(40)

Kolmogorov theory 1957

Debates

“Kolmogorov's Theorem Is Irrelevant”:

an exact representation is hopeless.

“Kolmogorov's Theorem Is Relevant”
