(1)

Applied Deep Learning HW3

Natural Language Generation

Deadline: 2022/05/16 23:59:59

(2)

Links

NTU COOL

Data & Evaluation walkthrough video

adl-ta@csie.ntu.edu.tw

TA Hours:

Wed. 14:00~15:30 @ Google Meet

Thu. 13:30~15:00 @ Google Meet

(3)

Change Logs

(4)

Task Description

(5)

Chinese News Summarization (Title Generation)

❖ input: news content

從小就很會念書的李悅寧,在眾人殷殷期盼下,以榜首之姿進入臺大醫學院,但始終忘不了對天文的熱情。大學四年級一場遠行後,她決心遠赴法國攻讀天文博士。從小沒想過當老師的她,再度跌破眾人眼鏡返台任教, ...

❖ output: news title

榜首進台大醫科卻休學、27歲拿到法國天文博士 李悅寧跌破眾人眼鏡返台任教

(6)

Data

❖ Source: news articles scraped from udn.com

➢ Train: 21710 articles from 2015-03-02 to 2021-01-13

➢ Public: 5494 articles from 2021-01-14 to 2021-04-10

➢ Private: Not released and will include articles after deadline

(7)

Data (cont.)

❖ Example

(8)

Metrics

❖ ROUGE score with Chinese word segmentation

➢ What is ROUGE score?

➢ Chinese word segmentation: ckiptagger(github)

❖ Example

➢ candidate: 我 是 人

➢ reference: 我 是 一 個 人

➢ rouge-1: precision=1.0, recall=0.6, f1=0.75

➢ rouge-2: precision=0.5, recall=0.25, f1=0.33

➢ rouge-L: precision=1.0, recall=0.6, f1=0.75
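As a sanity check, here is a minimal Python sketch of the scoring recipe above, using the allowed rouge and ckiptagger packages. It is only a sketch: the official evaluation script may differ in details, and "./data" is assumed to be wherever you downloaded the ckiptagger models.

```python
# Hedged sketch of the metric: ckiptagger segmentation + the `rouge` package.
from ckiptagger import WS
from rouge import Rouge

ws = WS("./data")  # assumed path to the downloaded ckiptagger models

def rouge_scores(candidate: str, reference: str) -> dict:
    # Segment into words and join with spaces, so the whitespace-based
    # `rouge` package sees one Chinese word per token.
    cand_tokens, ref_tokens = ws([candidate, reference])
    return Rouge().get_scores(" ".join(cand_tokens), " ".join(ref_tokens))[0]

# If ckiptagger segments these strings like the hand-segmented example above,
# this should reproduce rouge-1/rouge-2/rouge-l f1 ≈ 0.75 / 0.33 / 0.75.
print(rouge_scores("我是人", "我是一個人"))
```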

(9)

Objective

❖ Fine-tune a pre-trained small multilingual T5 model to pass the baselines

❖ Public baseline

➢ rouge-1: 22.0, rouge-2: 8.5, rouge-L: 20.5 (f1-score * 100)

❖ Private baseline

➢ Will be announced after deadline

(10)

Bonus: Applying RL to Summarization

[Figure: summarization framed as an RL episode. States: the partially generated summary; actions: choosing the next token. The reward r is 0 at every intermediate step t = 0, 1, 2, ..., and equals the ROUGE score at the final step t = n.]

(11)

Bonus: Applying RL to Summarization (cont.)

❖ You can use any RL algorithm (policy gradient, DQN, etc.)

❖ You can design your own reward function

➢ e.g., ROUGE-L, avg(ROUGE-N), etc.

❖ You can either add the RL loss directly during training or fine-tune from a supervised-learning checkpoint (see the sketch below)
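For the "add RL loss while training" option, one possible shape is a REINFORCE-style term mixed with the usual cross-entropy loss. This is a hedged sketch, not a prescribed recipe: reward_fn (e.g., ROUGE-L f1 against the gold title) and the batch["references"] field are hypothetical names, not a required API.

```python
import torch

def reinforce_loss(model, tokenizer, batch, reward_fn):
    """Sketch of a REINFORCE term: -(reward - baseline) * log p(sample | input)."""
    # Sample summaries from the current policy.
    sampled = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        do_sample=True,
        max_length=64,
    )
    # generate() prepends the decoder start token (the pad token for T5/mT5);
    # drop it so `sampled` lines up with label positions.
    sampled = sampled[:, 1:]
    # Re-score the sampled tokens to get per-token log-probabilities.
    out = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=sampled,
    )
    logp = torch.log_softmax(out.logits, dim=-1)
    token_logp = logp.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    mask = (sampled != tokenizer.pad_token_id).float()
    seq_logp = (token_logp * mask).sum(dim=1)  # log p(y | x) per example
    # Reward each decoded sample, e.g. with ROUGE against the gold title.
    texts = tokenizer.batch_decode(sampled, skip_special_tokens=True)
    rewards = torch.tensor(
        [reward_fn(t, r) for t, r in zip(texts, batch["references"])],
        device=seq_logp.device,
    )
    baseline = rewards.mean()  # batch-mean baseline for variance reduction
    return -((rewards - baseline) * seq_logp).mean()
```

The total loss could then be something like supervised_loss + λ * reinforce_loss(...), or the RL term alone when fine-tuning from a supervised checkpoint.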

(12)

Report

(13)

Q1: Model (2%)

❖ Model (1%)

➢ Describe the model architecture and how it works on text summarization.

❖ Preprocessing (1%)

➢ Describe your preprocessing (e.g., tokenization, data cleaning, etc.)

(14)

Q2: Training (2%)

❖ Hyperparameters (1%)

➢ Describe the hyperparameters you used and how you chose them.

❖ Learning Curves (1%)

➢ Plot the learning curves (ROUGE versus training steps)
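A minimal plotting sketch for this item (any package is allowed for the report, so matplotlib is fine); the steps and rouge_l values below are explicitly hypothetical placeholders for whatever you logged during training:

```python
import matplotlib.pyplot as plt

# Hypothetical logged values; substitute your own periodic-eval results.
steps = [1000, 2000, 3000, 4000, 5000]
rouge_l = [14.2, 17.8, 19.5, 20.3, 20.8]

plt.plot(steps, rouge_l, marker="o")
plt.xlabel("Training steps")
plt.ylabel("ROUGE-L f1 × 100")
plt.title("Validation ROUGE-L vs. training steps")
plt.savefig("learning_curve.png", dpi=150)
```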

(15)

Q3: Generation Strategies (6%)

❖ Strategies (2%)

➢ Describe the details of the following generation strategies:

■ Greedy

■ Beam Search

■ Top-k Sampling

■ Top-p Sampling

■ Temperature

❖ Hyperparameters (4%)

➢ Try at least 2 settings for each strategy and compare the results.

➢ What is your final generation strategy? (you can combine any of them)
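For orientation, this is roughly how the strategies above map onto transformers' model.generate(); the parameter values here are illustrative, not the settings you must report:

```python
# Illustrative generate() calls for each strategy (values are examples only);
# `model` and `input_ids` come from your fine-tuned mT5 setup.
greedy = model.generate(input_ids, max_length=64)                            # Greedy
beam = model.generate(input_ids, max_length=64, num_beams=5)                 # Beam Search
top_k = model.generate(input_ids, max_length=64, do_sample=True, top_k=50)   # Top-k Sampling
top_p = model.generate(input_ids, max_length=64, do_sample=True, top_p=0.9)  # Top-p Sampling
temp = model.generate(input_ids, max_length=64, do_sample=True,
                      temperature=0.7)                                       # Temperature
```

Combinations are possible too, e.g. sampling with both top_k and temperature set.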

(16)

Bonus: Applying RL to Summarization (2%)

❖ Algorithm (1%)

➢ Describe your RL algorithm, reward function, and hyperparameters.

❖ Compare to Supervised Learning (1%)

➢ Observe the loss, ROUGE score, and output texts; what differences can you find?

(17)

Rules

(18)

What You Can Do

❖ Allowed packages/tools:

➢ Python 3.8 / 3.9 and Python Standard Library

➢ PyTorch 1.7.1, TensorFlow 2.4.1, pytorch-lightning 1.2.3

➢ transformers, datasets, accelerate, sentencepiece

➢ rouge, spacy, nltk, ckiptagger, tqdm, pandas, jsonlines

➢ Dependencies of above packages/tools.

❖ If you want to use other packages, email the TAs.

❖ You can use any package you want when writing the report.

(19)

What You Can NOT Do

❖ Use external training data

➢ E.g. scrape news from the internet

❖ Any means of cheating or plagiarism, including but not limited to:

➢ Use other classmates' published/unpublished code, including that of students who took previous ML / ADL / MLDS.

➢ Copy and paste any publicly available code without modification.

➢ Use packages or tools that are not allowed.

➢ Give/get trained model to/from others.

➢ Give/get report answers or plots to/from others.

➢ Publish your code before the deadline.

❖ Violations may result in a zero/negative score and punishment from the school.

(20)

Logistics

(21)

Grading

❖ Model performance (10%)

➢ Public baseline (5%)

➢ Private baseline (5%)

❖ Report (10% + 2%)

➢ In PDF format!

➢ Score of each problem is shown in the Report section.

❖ Format

➢ You may lose some or all of your model performance score if your script is in the wrong location, causes any error, etc.

(22)

Submission - Format

(23)

Submission - File Layout

❖ You are required to submit a .zip file to NTU COOL

❖ File structure for the .zip file (case-sensitive):

➢ /[student id (lower-cased)]/ (Brackets not included.)

■ download.sh

■ run.sh

■ README.md

■ report.pdf

■ code/all other files you need

(24)

Submission - Scripts

download.sh

❖ Do not modify your files after the deadline, or it will be seen as cheating.

❖ Keep the URLs in download.sh valid for at least 2 weeks after the deadline.

❖ Do not do anything more than downloading; otherwise, your download.sh may be killed.

❖ You can download at most 4 GB, and download.sh should finish within 1 hour (at the CSIE department, with a maximum bandwidth of 10 MB/s).

❖ You can upload your model to Dropbox. (see tutorial)

❖ We will execute download.sh before the prediction scripts.

(25)

Submission - Scripts

run.sh

❖ Arguments:

➢ ${1}: path to the input file

➢ ${2}: path to the output file

❖ TAs will predict the testing data as follows:

➢ bash ./download.sh

➢ bash ./run.sh /path/to/input.jsonl /path/to/output.jsonl

❖ Specify the Python version (3.8 or 3.9) in the .sh file.

➢ The default Python version is 3.8.

➢ Ex. python3.8 predict.py … or python3.9 predict.py …; plain python resolves to python3.8.

Make sure your code works!
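For concreteness, here is a hypothetical skeleton of the predict script that run.sh would invoke (e.g. python3.8 predict.py "${1}" "${2}"). The jsonl field names and generate_title are assumptions; check the actual data format.

```python
# predict.py — hypothetical skeleton using only allowed packages.
import sys

import jsonlines  # in the allowed package list

def generate_title(maintext: str) -> str:
    # Placeholder: replace with your mT5 inference
    # (load the model once, tokenize, model.generate, decode).
    raise NotImplementedError

in_path, out_path = sys.argv[1], sys.argv[2]

with jsonlines.open(in_path) as reader, jsonlines.open(out_path, mode="w") as writer:
    for article in reader:
        # "maintext" and "id" are assumed field names in the jsonl files.
        writer.write({"title": generate_title(article["maintext"]),
                      "id": article["id"]})
```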

(26)

Submission - Reproducibility

❖ All the code you used to train, predict, and plot figures for the report should be uploaded.

❖ We will remove the answers in public.jsonl when we reproduce your submission.

❖ README.md

➢ Write down specifically how to train your model with your code/scripts.

➢ If necessary, you will be required to reproduce your results based on the README.md.

➢ If you cannot reproduce your result, you may lose points.

❖ You will get at least a −2 penalty if your README.md is missing or empty.

(27)

Execution Environment

❖ Will be run on a computer with

➢ Ubuntu 20.04

➢ 32 GB RAM, RTX 3070 with 8 GB VRAM, 20 GB of disk space available

➢ only the packages we allow

➢ Python 3.8 / 3.9

❖ Do NOT train a very large model (e.g., mt5-xl), or you will get an out-of-memory error with 8 GB VRAM.

❖ Time limit: 1 hour for run.sh in total.

❖ You will lose some or all of your model performance score if your script is in the wrong location or causes any error.

(28)

Late Submission Penalty

❖ Late submission of "code and report":

➢ 0 days < late submission ≤ 1 day: original score * 0.95

➢ 1 day < late submission ≤ 3 days: original score * 0.90

➢ 3 days < late submission ≤ 4 days: original score * 0.75

➢ 4 days < late submission ≤ 5 days: original score * 0.50

➢ 5 days < late submission ≤ 6 days: original score * 0.25

➢ 6 days < late submission: original score * 0.00

❖ Late submission is determined by the last submission.

➢ Updating your submission after the deadline implies that you will get a penalty.

(29)

Guide

(30)

Text-to-Text Transformer (T5)

[Figure: HW2 vs. HW3 architectures. HW2 (BERT): a bidirectional encoder maps <input> to hidden states via self-attention (Q, K, V from the same sequence). HW3 (T5): an encoder-decoder; a bi-encoder maps <input> to hidden states, and a decoder generates <output> autoregressively, consuming <s>, y1, y2, y3 and predicting y1, y2, y3, </s>; its cross-attention takes Q from the decoder and K, V from the encoder hidden states.]
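A tiny sketch of that flow with the transformers API: passing labels makes T5 build the shifted decoder inputs internally, and the decoder's cross-attention takes Q from the decoder and K, V from the encoder hidden states. The input strings are placeholders.

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

enc = tok("新聞內容……", return_tensors="pt")             # encoder <input>
labels = tok("新聞標題", return_tensors="pt").input_ids   # targets y1..yn, </s>

# The decoder is fed shift_right(labels) = <s>, y1..yn (teacher forcing).
out = model(**enc, labels=labels)
print(out.loss)  # cross-entropy over the predicted y1..yn, </s>
```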

(31)

Training

❖ Pre-trained mt5-small is quite large (300M parameters, about 3× BERT-base).

❖ Some tips to reduce GPU memory usage (see the sketch after this list):

➢ Reduce batch size + gradient accumulation

➢ Truncate text length (256/64 for input/output can pass the baseline)

➢ fp16 (transformers==4.5.0 has a bug in T5 fp16 training; see the fix below)

➢ adafactor (instead of Adam)

❖ For reference, you can pass the baseline within 4 hours of training on a single RTX 3070 8 GB if your code is correct.
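A minimal training-loop sketch combining those tips (gradient accumulation, 256/64 truncation, Adafactor instead of Adam). The loader and the field names maintext/title are assumptions about your data pipeline, and the hyperparameter values are illustrative only.

```python
from transformers import Adafactor, AutoTokenizer, MT5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small").cuda()

# Adafactor with a fixed learning rate (values illustrative).
optimizer = Adafactor(model.parameters(), lr=1e-3,
                      relative_step=False, scale_parameter=False,
                      warmup_init=False)

accum_steps = 8  # effective batch size = per-step batch size * accum_steps
model.train()
for step, batch in enumerate(loader):  # `loader` yields dicts of raw strings
    inputs = tok(batch["maintext"], max_length=256, truncation=True,
                 padding=True, return_tensors="pt").to("cuda")
    labels = tok(batch["title"], max_length=64, truncation=True,
                 padding=True, return_tensors="pt").input_ids.to("cuda")
    labels[labels == tok.pad_token_id] = -100  # ignore padding in the loss
    loss = model(**inputs, labels=labels).loss / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```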

(32)

How to Fix T5 FP16 Training

● https://github.com/huggingface/transformers/pull/10956

● Install the fixed version of the transformers library:

○ git clone https://github.com/huggingface/transformers.git

○ cd transformers

○ git checkout t5-fp16-no-nans

○ pip install -e .

(33)

Documents

❖ T5

➢ https://huggingface.co/transformers/model_doc/t5.html

➢ https://huggingface.co/transformers/model_doc/mt5.html

❖ Generation:

➢ https://huggingface.co/transformers/main_classes/model.html#generation

(34)

Q&A
