(1)

Introduction

Applied Deep Learning

February 14th, 2022 http://adl.miulab.tw

(2)

What is Machine Learning?

Explained in plain language!

2

(3)

What Can Computers Do?

Programs can do the things you ask them to do

3

(4)

Program for Solving Tasks

Task: predicting positive or negative given a product review

4

“I love this product!” “It claims too much.” “It’s a little expensive.”

+ - ?

“台灣第一波上市!” (“First wave released in Taiwan!”) “規格好雞肋…” (“The specs are pretty underwhelming…”) “樓下買了我才考慮” (“I'll only consider it after my downstairs neighbor buys one”)

?

program.py

Some tasks are complex, and we don’t know how to write a program to solve them.

if input contains “love”, “like”, etc.

output = positive

if input contains “too much”, “bad”, etc.

output = negative

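A minimal sketch of what such a hand-written program.py might look like (the keyword lists are hypothetical, for illustration only):

```python
# program.py -- a hand-written, rule-based sentiment "program"
# (hypothetical keyword lists, for illustration only)
POSITIVE_WORDS = {"love", "like", "great"}
NEGATIVE_WORDS = {"too much", "bad", "expensive"}

def classify(review: str) -> str:
    text = review.lower()
    if any(w in text for w in POSITIVE_WORDS):
        return "+"
    if any(w in text for w in NEGATIVE_WORDS):
        return "-"
    return "?"  # rules cover only a few cases; most inputs fall through

print(classify("I love this product!"))   # +
print(classify("It claims too much."))    # -
print(classify("台灣第一波上市!"))          # ? -- hand-written English rules fail here
```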

(5)


Learning ≈ Looking for a Function

Task: predicting positive or negative given a product review

5

[Figure: each of the reviews above is mapped by a single function f to its label (+, −, or ?).]

Given a large amount of data, the machine learns what the function f should be.

(6)

Learning ≈ Looking for a Function

◉ Speech Recognition

◉ Handwriting Recognition

◉ Weather forecast

◉ Play video games

6

[Figure: each task is a function f, e.g.
f( audio clip ) = “你好” (speech recognition),
f( handwritten digit image ) = “2” (handwriting recognition),
f( weather data for Thursday ) = “Saturday” forecast (weather forecast),
f( game screen ) = “move left” (playing video games).]

(7)

Machine Learning Framework

7

Training is to pick the best function given the observed data; testing is to predict the label using the learned function.

Model: hypothesis function set { f_1, f_2, ... }

Training data: (x^1, ŷ^1), (x^2, ŷ^2), ...   (x: function input, ŷ: function output, i.e. the label)

Training: pick the “best” function f* from the hypothesis set

Testing data: (x, ?), ...

Testing: y = f*(x), e.g. f*(“It claims too much.”) = − (negative)
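A toy sketch of this pick-the-best-function framework (the hypothesis set and data below are purely illustrative):

```python
# Toy illustration: pick the best f from a hypothesis set using training data,
# then use f* on test data. All data and hypotheses are illustrative.
train_data = [("I love this product!", "+"), ("It claims too much.", "-")]

def f1(x):  # hypothesis 1: everything is positive
    return "+"

def f2(x):  # hypothesis 2: a simple keyword rule
    return "-" if "too much" in x else "+"

hypotheses = [f1, f2]

def accuracy(f, data):
    return sum(f(x) == y for x, y in data) / len(data)

# Training: pick the best function f*
f_star = max(hypotheses, key=lambda f: accuracy(f, train_data))

# Testing: predict the label with the learned function
print(f_star("It claims too much."))  # expected: "-"
```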

(8)

What is Deep Learning?

8

A subfield of machine learning

(9)

Stacked Functions Learned by Machine

◉ Production line (生產線)

9

“台灣第一波上市!” (“First wave released in Taiwan!”)

End-to-end training: what each function should do is learned automatically

Simple Function f1

Simple Function f2

Simple Function f3

Deep Learning Model

f: a very complex function

Deep learning usually refers to a neural-network-based model
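Conceptually, the deep model is a production line of simple learned functions composed into one very complex function; a schematic sketch (the individual functions below are hand-written placeholders, whereas in deep learning each stage would be learned from data end to end):

```python
# Schematic only: a deep model as a stack of simple functions f1, f2, f3.
# In real deep learning, each stage is learned from data, not hand-written.
def f1(x):          # e.g. turn raw text into numbers (placeholder)
    return [float(len(x))]

def f2(h):          # e.g. transform the intermediate representation (placeholder)
    return [v * 0.1 for v in h]

def f3(h):          # e.g. map the representation to a label (placeholder)
    return "+" if h[0] > 0.5 else "-"

def deep_model(x):  # f: a very complex function built from simple stages
    return f3(f2(f1(x)))

print(deep_model("台灣第一波上市!"))
```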

(10)

Stacked Functions Learned by Machine

10

[Figure: the input, e.g. “台灣第一波上市!”, is encoded as a vector x = (x_1, ..., x_N), which flows through the input layer, hidden layers (Layer 1, Layer 2, ..., Layer L), and the output layer to produce the label y.]

Features / Representations

Representation Learning attempts to learn good features/representations

Deep Learning attempts to learn (multiple levels of) representations and an output

(11)

Deep vs. Shallow – Speech Recognition

◉ Shallow Model

11

[Figure: the shallow pipeline — waveform → DFT → spectrogram → filter bank → log → DCT → MFCC → GMM → “Hello”; the front-end blocks are hand-crafted, and only the GMM is learned from data.]

Each box is a simple function in the production line.

(12)

Deep vs. Shallow – Speech Recognition

◉ Deep Model

12

All functions are learned from data

Less engineering labor, but machine learns more

[Figure: the deep model stacks learned functions f_1 → f_2 → f_3 → f_4 → f_5 from the waveform directly to “Hello”.]

“Bye-bye, MFCC” – Li Deng, Interspeech 2014

(13)

Deep vs. Shallow – Image Recognition

◉ Shallow Model

13

[Figure: a hand-crafted feature-extraction pipeline followed by a classifier; only the final stage is learned from data.]

http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/

(14)

Deep vs. Shallow – Image Recognition

◉ Deep Model

14

Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833)

[Figure: stacked learned functions f_1 → f_2 → f_3 → f_4 map the image to the label “monkey”, learning features/representations at every layer.]

All functions are learned from data.

(15)

Machine Learning vs. Deep Learning

15

Credit: Dr. Socher

Machine Learning: describe your data with features a computer can understand (hand-crafted, domain-specific knowledge); a learning algorithm then optimizes the weights on those features to produce the model.

(16)

Machine Learning vs. Deep Learning

16

Deep learning usually refers to a neural-network-based model

Deep Learning: representations are learned by the machine (automatically learned internal knowledge); a learning algorithm then optimizes the weights on those features to produce the model.

(17)

Inspired by Human Brain

17

(18)

A Single Neuron

18

A single neuron computes z = w_1 x_1 + w_2 x_2 + ... + w_N x_N + b, where b is the bias, and outputs y = σ(z), where σ is the activation function, e.g. the sigmoid σ(z) = 1 / (1 + e^(-z)).

Each neuron is a very simple function.
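A minimal NumPy sketch of this single sigmoid neuron, matching the formula above (the weight, bias, and input values are made-up examples):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # z = w_1 x_1 + ... + w_N x_N + b, then y = sigmoid(z)
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 0.5, -2.0])   # inputs x_1..x_N (example values)
w = np.array([0.3, -0.1, 0.8])   # weights w_1..w_N (example values)
b = 0.2                          # bias
print(neuron(x, w, b))           # output y in (0, 1)
```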

(19)

Deep Neural Network

◉ Cascading the neurons to form a neural network

19

[Figure: the input x = (x_1, ..., x_N) flows through Layer 1, Layer 2, ..., Layer L to the output y = (y_1, ..., y_M).]

A neural network is a complex function f : R^N → R^M.

Each layer is a simple function in the production line.
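A small NumPy sketch of the forward pass f : R^N → R^M through L layers (the layer sizes and random weights are illustrative, not a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # Each layer is a simple function: h -> sigmoid(W h + b)
    h = x
    for W, b in layers:
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                       # N=4 inputs, two hidden layers, M=3 outputs
layers = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(4)                 # input vector x in R^N
y = forward(x, layers)                     # output vector y in R^M
print(y.shape)                             # (3,)
```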

(20)

History of Deep Learning

1960s: Perceptron (single layer neural network)

1969: Perceptron has limitation

1980s: Multi-layer perceptron

1986: Backpropagation

1989: 1 hidden layer is “good enough”, why deep?

2006: RBM initialization (breakthrough)

2009: GPU

2010: breakthrough in Speech Recognition (Dahl et al., 2010)

2012: breakthrough in ImageNet (Krizhevsky et al. 2012)

2015: “superhuman” results in Image and Speech Recognition

20

(21)

Deep Learning Breakthrough

◉ First: Speech Recognition

◉ Second: Computer Vision

21

Acoustic Model        | WER on RT03S FSH | WER on Hub5 SWB
Traditional Features  | 27.4%            | 23.6%
Deep Learning         | 18.5% (-33%)     | 16.1% (-32%)

(22)

History of Deep Learning

1960s: Perceptron (single layer neural network)

1969: Perceptron has limitation

1980s: Multi-layer perceptron

1986: Backpropagation

1989: 1 hidden layer is “good enough”, why deep?

2006: RBM initialization (breakthrough)

2009: GPU

2010: breakthrough in Speech Recognition (Dahl et al., 2010)

2012: breakthrough in ImageNet (Krizhevsky et al. 2012)

2015: “superhuman” results in Image and Speech Recognition

22

Why did deep learning produce breakthroughs in applications only after 2010?

(23)

Reasons why Deep Learning works

Big Data

23

GPU

(24)

Why Adopt GPUs for Deep Learning?

◉ GPU is like a brain

The human brain creates graphical imagery when thinking

24

台灣好吃牛肉麵 (“delicious Taiwanese beef noodles”)

(25)

Why Does Speed Matter?

◉ Training time

Big data increases the training time

Excessively long training time is impractical

◉ Inference time

Users are not patient enough to wait for responses

25

[Figure: inference time (ms), on a 0–300 ms scale, for NVIDIA P40, P4, and CPU.]

The computational power of GPUs enables real-world applications

(26)

Why Is Deeper Better?

◉ Deeper → More parameters

26

[Figure: a shallow network (one wide hidden layer) vs. a deep network (several hidden layers) over the same inputs x_1, ..., x_N.]

(27)

Universality Theorem

Any continuous function f : R^N → R^M can be realized by a network with only one hidden layer (given enough hidden neurons).

27

http://neuralnetworksanddeeplearning.com/chap4.html

Why “deep” not “fat”?

(28)

Fat + Shallow vs. Thin + Deep

◉ Two networks with the same number of parameters

28

[Figure: a fat, shallow network and a thin, deep network with the same number of parameters, over inputs x_1, ..., x_N.]

(29)

Fat + Shallow vs. Thin + Deep

Hand-Written Digit Classification

29

The deeper model uses fewer parameters to achieve the same performance

(30)

Fat + Shallow vs. Thin + Deep

◉ Two networks with the same number of parameters

30

[Figure: two networks with the same number of parameters — a shallow one with a single hidden layer of width 2d and a deep one with two hidden layers of width d; the annotated capacities are O(2d) for the shallow network and O(d^2) for the deep one.]
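To make the parameter-budget comparison concrete, here is a quick weight count for a fat shallow network versus a thin deep one (the sizes N, M, d below are illustrative and biases are ignored for simplicity):

```python
# Count weights (biases ignored) for fully connected networks with the
# same input/output sizes; the layer widths below are illustrative.
def num_weights(sizes):
    return sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

N, M, d = 100, 10, 64
shallow = [N, 2 * d, M]        # one wide hidden layer of width 2d
deep    = [N, d, d, M]         # two narrower hidden layers of width d

print(num_weights(shallow))    # 100*128 + 128*10 = 14080
print(num_weights(deep))       # 100*64 + 64*64 + 64*10 = 11136
```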

(31)

How to Apply Deep Learning?

31

(32)

How to Frame the Learning Problem?

The goal of learning is to find a function f that maps the input domain X to the output domain Y

Input domain: word, word sequence, audio signal, click logs

Output domain: single label, sequence tags, tree structure, probability distribution

32

f : X → Y

(33)

Output Domain – Classification

◉ Sentiment Analysis

◉ Speech Phoneme Recognition

◉ Handwriting Recognition

33

“這規格有誠意!” (“These specs show real effort!”) → +

“太爛了吧~” (“This is so bad~”) → −

[audio frame] → /h/ (phoneme)

[handwritten digit image] → “2”
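For any of these classification tasks, the network's final layer typically outputs one score per label, which a softmax turns into a probability distribution; a small sketch (the label set and scores below are made up for illustration):

```python
import numpy as np

LABELS = ["+", "-"]                        # e.g. sentiment labels (illustrative)

def classify(scores):
    # softmax turns raw scores into a probability distribution over labels;
    # the predicted class is the most probable label
    exp = np.exp(scores - np.max(scores))
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], probs

label, probs = classify(np.array([2.3, -0.7]))   # made-up network scores
print(label, probs)                              # "+" with probability ~0.95
```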

(34)

Output Domain – Sequence Prediction

◉ POS Tagging

◉ Speech Recognition

◉ Machine Translation

34

“推薦我台大後門的餐廳” (“Recommend me a restaurant near the back gate of NTU”) → 推薦/VV 我/PN 台大/NR 後門/NN 的/DEG 餐廳/NN

[speech audio] → “大家好” (“Hello, everyone”)

“How are you doing today?” → “你好嗎?”

Learning tasks are decided by the output domains
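In all three tasks the output is a sequence aligned with (or generated from) the input, unlike classification, which produces a single label; a toy sketch of per-token tagging (the token-to-tag rule below is a made-up placeholder standing in for a learned model):

```python
# Toy sketch: sequence prediction produces one output per input position,
# whereas classification produces a single label for the whole input.
def tag_sequence(tokens, tag_fn):
    return [tag_fn(tok) for tok in tokens]

# Made-up placeholder rule standing in for a learned tagger.
toy_tagger = lambda tok: "NOUN" if tok in {"台大", "後門", "餐廳"} else "OTHER"

print(tag_sequence(["推薦", "台大", "後門", "餐廳"], toy_tagger))
# ['OTHER', 'NOUN', 'NOUN', 'NOUN']
```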

(35)

Input Domain – How to Aggregate Information

Input: word sequence, image pixels, audio signal, click logs

Properties: continuity, temporal structure, distribution of importance

Example

CNN (convolutional neural network): local connections, shared weights, pooling

AlexNet, VGGNet, etc.

RNN (recurrent neural network): temporal information (see the sketch below)

35

Network architectures should consider the input domain properties
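A rough sketch of the two aggregation styles mentioned above, in plain NumPy (weights and sizes are illustrative): a 1-D convolution uses local connections with shared weights, while a recurrent update carries temporal information across a sequence.

```python
import numpy as np

def conv1d(x, w):
    # Local connections + shared weights: the same small filter w
    # slides over neighbouring positions of the input sequence x.
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def rnn(xs, W_h, W_x):
    # Temporal information: the hidden state h carries a summary of
    # everything seen so far along the sequence.
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = np.tanh(W_h @ h + W_x @ x_t)
    return h

x = np.arange(8.0)                         # toy input sequence
print(conv1d(x, np.array([0.25, 0.5, 0.25])))

rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 3))           # 5 time steps, 3 features each
print(rnn(xs, rng.standard_normal((4, 4)), rng.standard_normal((4, 3))))
```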

(36)

How to Frame the Learning Problem?

The goal of learning is to find a function f that maps the input domain X to the output domain Y

Input domain: word, word sequence, audio signal, click logs

Output domain: single label, sequence tags, tree structure, probability distribution

36

f : X → Y

Network design should leverage input and output domain properties

(37)

“Applied” Deep Learning

37

Deep Learning: representations are learned by the machine (automatically learned internal knowledge); a learning algorithm then optimizes the weights on those features to produce the model.

How to frame a task into a learning problem and design the corresponding model

(38)

Core Factors for Applied Deep Learning

1. Data: big data

2. Hardware: GPU computing

3. Talent: designing algorithms that allow networks to work on the specific problems

38

(39)

Concluding Remarks

39

Training

(40)

Concluding Remarks

40

Inference

(41)

Concluding Remarks

41

Training

Inference

Main focus: how to apply deep learning to real-world problems

(42)

Reference

◉ Reading Materials

Academic papers will be posted on the course website

◉ Deep Learning

Goodfellow, Bengio, and Courville, “Deep Learning,” 2016.

http://www.deeplearningbook.org

Michael Nielsen, “Neural Networks and Deep Learning”

http://neuralnetworksanddeeplearning.com

42

(43)

Any questions?

You can find the course information at

◉ http://adl.miulab.tw

◉ adl-ta@csie.ntu.edu.tw

◉ slido: #ADL2022

◉ YouTube: Vivian NTU MiuLab

Thanks!

43
