Task-Oriented Dialogue System

Academic year: 2022

Task-Oriented Dialogue System

(Young, 2000)

Speech Recognition

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

Dialogue Management (DM)

• Dialogue State Tracking (DST)

Natural Language

Example from the pipeline figure:

Speech Signal → Hypothesis: are there any action movies to see this weekend

Text Input: Are there any action movies to see this weekend?

Semantic Frame: request_movie (genre=action, date=this weekend)

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

Speech Recognition / Multimodality

Speech recognition

Word error rate

Word accuracy

Emotion recognition

Accuracy

Hyp: A AB D C K

Ref: A C D A C

Word error rate = (#substitutions + #deletions + #insertions) / #words in the reference
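
The word error rate can be sketched as a word-level edit distance divided by the reference length. This is a minimal illustration, not a replacement for a standard scoring tool; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,                      # deletion
                dp[i][j - 1] + 1,                      # insertion
                dp[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Illustrative usage with the example utterance from the slides;
# the recognition errors in the hypothesis are made up for this sketch.
wer = word_error_rate(
    "are there any action movies to see this weekend",
    "are there any action movie to see weekend",
)
word_accuracy = 1.0 - wer
```

Word accuracy is reported here as 1 − WER. Because insertions count as errors, WER can exceed 1, in which case this word accuracy becomes negative.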

Language Understanding Evaluation

Data

Training and test data should be kept separate

Test data should be real data collected from humans so that the evaluation results are convincing

Metrics

Sub-sentence-level: intent accuracy, slot F1

Sentence-level: whole frame accuracy
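
The three metrics above can be sketched as follows. This is only an illustration under stated assumptions: each frame is a dict with an `intent` string and a `slots` dict, and slot F1 is computed over (slot, value) pairs rather than over tagged token spans as is common in slot-tagging evaluations. The function name `evaluate_lu` and the `book_ticket` intent are hypothetical.

```python
def evaluate_lu(gold_frames, pred_frames):
    """Return intent accuracy, micro-averaged slot F1, and whole-frame accuracy."""
    if not gold_frames or len(gold_frames) != len(pred_frames):
        raise ValueError("need equally sized, non-empty lists of frames")

    intent_correct = 0
    frame_correct = 0
    tp = fp = fn = 0
    for gold, pred in zip(gold_frames, pred_frames):
        intent_ok = gold["intent"] == pred["intent"]
        gold_slots = set(gold["slots"].items())
        pred_slots = set(pred["slots"].items())

        tp += len(gold_slots & pred_slots)  # correctly predicted slot-value pairs
        fp += len(pred_slots - gold_slots)  # predicted but not in the gold frame
        fn += len(gold_slots - pred_slots)  # in the gold frame but missed

        intent_correct += int(intent_ok)
        # A frame counts as correct only if the intent and all slots match.
        frame_correct += int(intent_ok and gold_slots == pred_slots)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = len(gold_frames)
    return {
        "intent_accuracy": intent_correct / n,
        "slot_f1": f1,
        "frame_accuracy": frame_correct / n,
    }


# Illustrative data: the first frame follows the slide example,
# the second (book_ticket) is invented for this sketch.
gold = [
    {"intent": "request_movie", "slots": {"genre": "action", "date": "this weekend"}},
    {"intent": "book_ticket", "slots": {"count": "2"}},
]
pred = [
    {"intent": "request_movie", "slots": {"genre": "action"}},
    {"intent": "book_ticket", "slots": {"count": "2"}},
]
lu_scores = evaluate_lu(gold, pred)
```

Whole-frame accuracy is the strictest of the three: a single missed slot makes the entire frame wrong even when the intent is correct.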

Dialogue State Tracking Evaluation

Metric

Tracked state accuracy with respect to user goal

Recall/precision/F-measure on individual slots
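
A minimal sketch of both metrics, assuming each dialogue state is a dict mapping slot names to string values, with one gold and one tracked state per turn. The name `evaluate_dst` and the choice to count a wrong value as both a false positive and a false negative are assumptions of this example.

```python
from collections import defaultdict


def evaluate_dst(gold_states, tracked_states):
    """Return exact-match state accuracy and per-slot precision/recall/F-measure."""
    if not gold_states or len(gold_states) != len(tracked_states):
        raise ValueError("need equally sized, non-empty lists of states")

    exact_matches = 0
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for gold, tracked in zip(gold_states, tracked_states):
        exact_matches += int(gold == tracked)  # whole state must match
        for slot in set(gold) | set(tracked):
            gold_value = gold.get(slot)
            tracked_value = tracked.get(slot)
            if gold_value is not None and gold_value == tracked_value:
                counts[slot]["tp"] += 1
            else:
                if tracked_value is not None:
                    counts[slot]["fp"] += 1  # tracked a wrong or spurious value
                if gold_value is not None:
                    counts[slot]["fn"] += 1  # missed the gold value

    per_slot = {}
    for slot, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        per_slot[slot] = {"precision": p, "recall": r, "f_measure": f}

    return {"state_accuracy": exact_matches / len(gold_states), "per_slot": per_slot}


# Illustrative two-turn dialogue; the tracked date in turn 2 is deliberately wrong.
dst_scores = evaluate_dst(
    [{"genre": "action"}, {"genre": "action", "date": "this weekend"}],
    [{"genre": "action"}, {"genre": "action", "date": "today"}],
)
```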

Dialogue Policy Evaluation

Metrics

Turn-level evaluation: system action accuracy

Dialogue-level evaluation: task success rate, reward, number of dialogue turns
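
Both evaluation levels can be sketched in a few lines. The `DialogueLog` record, its field names, and the action labels are hypothetical; a real system would fill them from its own logs or from the user simulator.

```python
from dataclasses import dataclass


@dataclass
class DialogueLog:
    """Hypothetical per-dialogue record used only for this sketch."""
    success: bool        # did the dialogue complete the user's task?
    total_reward: float  # cumulative reward over the dialogue
    num_turns: int       # number of dialogue turns


def system_action_accuracy(gold_actions, predicted_actions):
    """Turn-level metric: fraction of turns where the chosen action matches the gold action."""
    if not gold_actions or len(gold_actions) != len(predicted_actions):
        raise ValueError("need equally sized, non-empty action lists")
    matches = sum(g == p for g, p in zip(gold_actions, predicted_actions))
    return matches / len(gold_actions)


def evaluate_policy(logs):
    """Dialogue-level metrics: success rate, average reward, average number of turns."""
    if not logs:
        raise ValueError("need at least one dialogue log")
    n = len(logs)
    return {
        "success_rate": sum(log.success for log in logs) / n,
        "avg_reward": sum(log.total_reward for log in logs) / n,
        "avg_turns": sum(log.num_turns for log in logs) / n,
    }


# Illustrative numbers, not results from any real system.
logs = [
    DialogueLog(True, 40.0, 10),
    DialogueLog(False, -20.0, 20),
    DialogueLog(True, 30.0, 12),
    DialogueLog(True, 50.0, 14),
]
policy_scores = evaluate_policy(logs)
action_acc = system_action_accuracy(
    ["request(date)", "inform(movie)", "confirm(ticket)"],
    ["request(date)", "inform(movie)", "request(genre)"],
)
```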

Reinforcement Learning Policy

Frame-level semantics

If your RL agent cannot outperform the rule-based agent, consider increasing the complexity of the system functionality and of the simulated user.

X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv preprint arXiv:1703.01008, 2017.

Natural language

Note: check whether the interactions can be satisfied by the system’s functionality

Natural Language Generation Evaluation

Metrics

Subjective: human judgement (Stent et al., 2005)

Adequacy: correct meaning

Fluency: linguistic fluency

Readability: fluency in the dialogue context

Variation: multiple realizations for the same concept

Objective: automatic metrics

Word overlap: BLEU (Papineni et al., 2002), METEOR, ROUGE
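
The core of BLEU is a clipped ("modified") n-gram precision, sketched below for a single reference. Full BLEU additionally combines the precisions for n = 1..4 with a geometric mean and applies a brevity penalty; those steps are omitted here. The function name `modified_precision` and the whitespace tokenization are assumptions of this example.

```python
from collections import Counter


def modified_precision(candidate: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of a candidate against one reference."""
    cand = candidate.split()  # assumption: whitespace tokenization
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))

    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0  # candidate is shorter than n words

    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference, so repeating a matching word is not rewarded.
    clipped = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return clipped / total


# Illustrative sentences made up for this sketch.
bigram_precision = modified_precision(
    "there are action movies this weekend",
    "there are action movies showing this weekend",
    n=2,
)
```

Clipping is what separates this from plain precision: a degenerate output that repeats one frequent word scores low because the reference contains that word only a few times.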

User Study

System performance from real users

1) Allow others to interact with the system

2) Record the dialogues and compute the success rate and user satisfaction

3) Analyze where the errors come from

Concluding Remarks

Evaluate all components of the system in detail

Speech recognition: word accuracy

Language understanding: frame accuracy

Dialogue state tracking: frame accuracy

Dialogue policy: success rate

Natural language generation: BLEU

User study

Subjective: satisfaction

Objective: success rate
