# Applied Deep Learning HW3


### Chinese News Summarization (Title Generation)

❖ input: news content

，以榜首之姿進入臺大醫學院，但始終忘不了對天文的熱情。大學四年級一場遠行後，她決心遠赴法國攻讀天文博士。從小沒想過當老師的她，再度跌破眾人眼鏡返台任教，...

(Translation: ...entered NTU's College of Medicine at the top of her class, but never forgot her passion for astronomy. After a long journey in her fourth year of university, she resolved to go to France for a PhD in astronomy. Having never pictured herself as a teacher, she defied expectations once more by returning to Taiwan to teach, ...)

❖ output: news title


### ❖ Source: news articles scraped from udn.com

➢ Train: 21710 articles from 2015-03-02 to 2021-01-13

➢ Public: 5494 articles from 2021-01-14 to 2021-04-10

➢ Private: not released; will include articles published after the deadline


### ❖ ROUGE score with Chinese word segmentation

➢ What is ROUGE score?

➢ Chinese word segmentation: ckiptagger (GitHub)

### ❖ Example

➢ candidate: 我 是 人

➢ reference: 我 是 一 個 人

➢ rouge-1: precision=1.0, recall=0.6, f1=0.75

➢ rouge-2: precision=0.5, recall=0.25, f1=0.33

➢ rouge-L: precision=1.0, recall=0.6, f1=0.75
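The numbers above can be reproduced with a from-scratch sketch of ROUGE-N on pre-segmented text. This is for illustration only — the actual grading uses the `rouge` package on ckiptagger output, and `rouge_n` here is a hypothetical helper:

```python
# Minimal ROUGE-N sketch on whitespace-pre-segmented text.
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    """Return (precision, recall, f1) of n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())          # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

cand = "我 是 人".split()
ref = "我 是 一 個 人".split()
print(rouge_n(cand, ref, 1))  # ≈ (1.0, 0.6, 0.75)
print(rouge_n(cand, ref, 2))  # ≈ (0.5, 0.25, 0.33)
```

ROUGE-L uses the longest common subsequence instead of n-gram counts; for this pair the LCS is 我 是 人, which is why rouge-L matches rouge-1 here.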


### ❖ Public baseline

➢ rouge-1: 22.0, rouge-2: 8.5, rouge-L: 20.5 (f1-score * 100)

### ❖ Private baseline

➢ Will be announced after deadline


### Bonus: Applied RL on Summarization

[Figure: an episode of states and actions over steps t=0, 1, 2, ..., n; the reward is r=0 at every step except the last, where r=ROUGE.]
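Since the reward is 0 at every step until the final one (where it is the ROUGE score of the finished title), with undiscounted returns every step's return equals that final ROUGE score — which is what a REINFORCE-style loss would weight each token's log-probability by. A minimal sketch, assuming undiscounted REINFORCE-style returns (the function name and numbers are illustrative):

```python
def returns_from_rewards(rewards, gamma=1.0):
    """Compute per-step returns G_t = sum_k gamma^k * r_{t+k}."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

# Episode from the figure: r=0 at every step, r=ROUGE at the last.
rouge_reward = 0.75  # e.g. the rouge-1 f1 of the generated title
print(returns_from_rewards([0, 0, 0, rouge_reward]))  # [0.75, 0.75, 0.75, 0.75]
# A REINFORCE-style loss then weights each token's -log p(y_t) by its return.
```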


### ❖ Model (1%)

➢ Describe the model architecture and how it works on text summarization.

### ❖ Preprocessing (1%)

➢ Describe your preprocessing (e.g. tokenization, data cleaning, etc.)


### ❖ Hyperparameter (1%)

➢ Describe the hyperparameters you used and how you chose them.

### ❖ Learning Curves (1%)

➢ Plot the learning curves (ROUGE versus training steps)


### ❖ Strategies (2%)

➢ Describe the details of the following generation strategies:

■ Greedy

■ Beam Search

■ Top-k Sampling

■ Top-p Sampling

■ Temperature

### ❖ Hyperparameters (4%)

➢ Try at least 2 settings for each strategy and compare the results.

➢ What is your final generation strategy? (you can combine any of them)
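The sampling-based strategies above can be sketched over a toy distribution. This is a hand-rolled sketch, not the transformers `generate` implementation; beam search is omitted since it tracks multiple partial sequences, and all token names and logit values are made up:

```python
import math, random

def sample_next(logits, strategy="greedy", k=2, p=0.9, temperature=1.0, rng=None):
    """Toy next-token selection over a dict {token: logit}."""
    rng = rng or random.Random(0)
    # Temperature rescales logits before softmax (T<1 sharpens, T>1 flattens).
    probs = {t: math.exp(l / temperature) for t, l in logits.items()}
    z = sum(probs.values())
    probs = {t: v / z for t, v in probs.items()}
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    if strategy == "greedy":
        return ranked[0][0]                      # always the argmax token
    if strategy == "top_k":
        pool = ranked[:k]                        # keep only the k most likely
    elif strategy == "top_p":
        pool, total = [], 0.0
        for t, pr in ranked:                     # smallest set with mass >= p
            pool.append((t, pr))
            total += pr
            if total >= p:
                break
    else:
        pool = ranked                            # plain (temperature-only) sampling
    tokens, weights = zip(*pool)
    return rng.choices(tokens, weights=weights)[0]

logits = {"台": 2.0, "大": 1.0, "醫": 0.5, "文": 0.1}
print(sample_next(logits, "greedy"))  # "台"
```

The strategies compose naturally: temperature reshapes the distribution first, then top-k or top-p restricts the candidate pool before sampling, which is why the report asks what combination you settled on.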


### ❖ Algorithm (1%)

➢ Describe your RL algorithms, reward function, and hyperparameters.

### ❖ Compare to Supervised Learning (1%)

➢ Observing the loss, ROUGE score, and output texts, what differences can you find?


### ❖ Allowed packages/tools:

➢ Python 3.8 / 3.9 and Python Standard Library

➢ PyTorch 1.7.1, TensorFlow 2.4.1, pytorch-lightning 1.2.3

➢ transformers, datasets, accelerate, sentencepiece

➢ rouge, spacy, nltk, ckiptagger, tqdm, pandas, jsonlines

➢ Dependencies of the above packages/tools.


### ❖ Any means of cheating or plagiarism, including but not limited to:

➢ Using other classmates’ published / unpublished code, including that of students who took previous ML / ADL / MLDS courses.

➢ Copying and pasting any publicly available code without modification.

➢ Using packages or tools that are not allowed.

➢ Giving/getting trained models to/from others.

➢ Giving/getting report answers or plots to/from others.

❖ Violations may result in a zero or negative score and punishment from the school.


### ❖ Model performance (10%)

➢ Public baseline (5%)

➢ Private baseline (5%)

### ❖ Report (10% + 2%)

➢ In PDF format!

➢ Score of each problem is shown in the Report section.

### ❖ Format

➢ You may lose some or all of your model performance score if your script is in the wrong location, causes any error, etc.


### ❖ File structure for the .zip file (case-sensitive):

➢ /[student id (lower-cased)]/ (Brackets not included.)

■ run.sh

■ report.pdf

■ code/all other files you need


### Submission - Scripts

➢ Do not modify your files after the deadline, or it will be treated as cheating.


### ❖ Arguments:

➢ \${1}: path to the input file

➢ \${2}: path to the output file

### ❖ TA will predict testing data as follow:

➢ bash ./run.sh /path/to/input.jsonl /path/to/output.jsonl

### ❖ Specify the Python version (3.8 or 3.9) in the .sh file.

➢ The default Python version is 3.8.

➢ e.g. `python3.8 predict.py …` or `python3.9 predict.py …`; plain `python` will be treated as python3.8.
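A minimal sketch of what `run.sh` might look like, assuming a hypothetical `predict.py` entry point and flag names (your actual script and arguments will differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch -- predict.py and its flags are placeholders.
# ${1}: path to the input file, ${2}: path to the output file.
python3.8 predict.py --input "${1}" --output "${2}"
```

Quoting `"${1}"` and `"${2}"` keeps paths with spaces intact when the TA runs `bash ./run.sh /path/to/input.jsonl /path/to/output.jsonl`.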


### ❖ We will remove the answers in public.jsonl when we reproduce your submission.

➢ Write down specifically how to train your model with your code/scripts.

➢ If necessary, you will be required to reproduce your results based on the README.md.

➢ If you cannot reproduce your result, you may lose points.


### ❖ Will be run on a computer with

➢ Ubuntu 20.04

➢ 32 GB RAM, RTX 3070 with 8 GB VRAM, 20 GB of disk space available.

➢ only the packages we allow.

➢ Python 3.8 / 3.9


### ❖ Late submission of "code and report":

➢ 0 days < late submission ≤ 1 day: original score * 0.95

➢ 1 day < late submission ≤ 3 days: original score * 0.90

➢ 3 days < late submission ≤ 4 days: original score * 0.75

➢ 4 days < late submission ≤ 5 days: original score * 0.50

➢ 5 days < late submission ≤ 6 days: original score * 0.25

➢ 6 days < late submission: original score * 0.00

### ❖ Late submission is determined by the last submission.

➢ Updating your submission after the deadline means you will incur the penalty.


### Text-to-Text Transformer (T5)

[Figure: HW2 vs. HW3 architectures. HW2 used BERT, a bidirectional encoder mapping <input> to hidden states (self-attention with Q, K, V all from the input). HW3 uses T5, an encoder-decoder: the bidirectional encoder produces hidden states from <input>, and the decoder generates <output> autoregressively, reading <s>, y1, y2, y3 and predicting y1, y2, y3, </s>. Decoder self-attention uses Q, K, V from the decoder; cross-attention takes Q from the decoder and K, V from the encoder.]
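The decoder input/output pair in the figure (`<s>, y1, y2, y3` in, `y1, y2, y3, </s>` out) is the usual teacher-forcing right shift, which can be sketched as follows (assumption: the string markers `<s>` and `</s>` stand in for the model's actual BOS/EOS token ids):

```python
def shift_for_teacher_forcing(target_tokens):
    """Build the decoder input and labels from a target sequence."""
    decoder_input = ["<s>"] + target_tokens   # what the decoder reads
    labels = target_tokens + ["</s>"]         # what it must predict at each step
    return decoder_input, labels

inp, labels = shift_for_teacher_forcing(["y1", "y2", "y3"])
print(inp)     # ['<s>', 'y1', 'y2', 'y3']
print(labels)  # ['y1', 'y2', 'y3', '</s>']
```

At position t the decoder sees tokens up to y_t-1 and is trained to emit y_t, which is why generation at test time must proceed token by token.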


### ❖ Some tips to reduce GPU memory usage:

➢ Reduce batch size + gradient accumulation

➢ Truncate text length (256/64 for input/output can pass the baseline)

➢ fp16 (transformers==4.5.0 has a bug on T5 fp16 training)
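The first tip can be sketched with toy numbers: accumulating gradients over 4 micro-batches of 8 reproduces the update of a batch of 32 while only ever holding 8 examples in memory. All names and sizes here are illustrative stand-ins, not real training code:

```python
# Toy sketch of gradient accumulation (names and sizes are illustrative).
ACCUM_STEPS = 4  # effective batch = micro-batch size x ACCUM_STEPS

def train(batches, accum_steps=ACCUM_STEPS):
    grad, updates = 0.0, 0
    for step, micro_batch in enumerate(batches, 1):
        # Stand-in for loss.backward(): average the micro-batch "gradient"
        # and divide by accum_steps so the update matches a full batch.
        grad += sum(micro_batch) / len(micro_batch) / accum_steps
        if step % accum_steps == 0:
            updates += 1          # stand-in for optimizer.step()
            grad = 0.0            # stand-in for optimizer.zero_grad()
    return updates

batches = [[1.0] * 8 for _ in range(8)]   # 8 micro-batches of size 8
print(train(batches))  # 2 optimizer updates for 64 examples
```

The key point is that the optimizer steps once per `accum_steps` micro-batches, so peak memory is set by the micro-batch size, not the effective batch size.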


### ● Install the fixed version of the transformers library

○ git clone https://github.com/huggingface/transformers.git

○ cd transformers

○ git checkout t5-fp16-no-nans

○ pip install -e .


### ❖ T5

➢ https://huggingface.co/transformers/model_doc/t5.html

➢ https://huggingface.co/transformers/model_doc/mt5.html

### ❖ Generation:

➢ https://huggingface.co/transformers/main_classes/model.html#generation


## Q&A
