Framework
Speech Signal
Speech Recognition
• Hypothesis: are there any action movies to see this weekend
Text Input: Are there any action movies to see this weekend?
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
• Semantic Frame: request_movie (genre=action, date=this weekend)
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
• Backend Database/
• System Action/Policy: request_location
Natural Language Generation (NLG)
• Text response: Where are you located?
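The movie example in the framework can be traced end to end with a minimal sketch. It is an illustration under stated assumptions, not the course's reference implementation: the keyword lists, the response templates, and the function names `understand`, `policy`, `generate`, and `respond` are all hypothetical, and the sketch handles a single turn, so dialogue state tracking and the backend database are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """LU output: an intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical keyword lists; a real LU module would be a trained model.
GENRES = ("action", "comedy", "drama")
DATES = ("this weekend", "today", "tomorrow")


def understand(text: str) -> SemanticFrame:
    """Toy LU: keyword-based intent detection and slot filling."""
    lowered = text.lower()
    slots = {}
    for genre in GENRES:
        if genre in lowered:
            slots["genre"] = genre
    for date in DATES:
        if date in lowered:
            slots["date"] = date
    intent = "request_movie" if "movie" in lowered else "unknown"
    return SemanticFrame(intent, slots)


def policy(frame: SemanticFrame) -> str:
    """Toy dialogue policy: ask for the first slot that is still missing."""
    if frame.intent != "request_movie":
        return "request_rephrase"
    for slot in ("location", "genre", "date"):
        if slot not in frame.slots:
            return f"request_{slot}"
    return "inform_movie"


# Hypothetical response templates keyed by system action.
TEMPLATES = {
    "request_location": "Where are you located?",
    "request_genre": "What kind of movie would you like?",
    "request_date": "When would you like to go?",
    "request_rephrase": "Sorry, could you rephrase that?",
    "inform_movie": "Here are the movies I found.",
}


def generate(action: str) -> str:
    """Toy NLG: template lookup for the chosen system action."""
    return TEMPLATES[action]


def respond(text: str) -> str:
    """Run one turn through LU, policy, and NLG."""
    return generate(policy(understand(text)))
```

Calling `respond("Are there any action movies to see this weekend?")` goes through the same stages as the slide: the frame is `request_movie` with `genre=action` and `date=this weekend`, the policy picks `request_location` because the location is still unknown, and the template lookup produces the text response.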
Demo System Release (Beta)
Test the other teams' systems and fill in the form [link]
Record failed dialogues
Report to the owner team
X: does not work; F: fail; S: success
Bonus if you test all other systems (due 6/9 Fri 23:59:59)
Important!
Guide your users by showing some possible examples to start
Improve your system using the failed dialogues
Final system scores will be judged using a subset of the previously failed interactions
System Improvement
Ontology: check whether all columns in the table can be searched as the target
LU: evaluate the coverage of the understanding module
Test data should come from real human users
Share the system link to collect more dialogues, then annotate them for evaluation
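One way to measure LU coverage on the annotated human dialogues is intent accuracy together with slot precision, recall, and F1. The sketch below assumes a hypothetical `predict` callable that maps an utterance to an `(intent, slots)` pair and a hypothetical annotation format of `(utterance, gold_intent, gold_slots)` tuples; neither is prescribed by the course.

```python
def evaluate_lu(predict, annotated):
    """Return intent accuracy and slot precision/recall/F1.

    predict: callable mapping an utterance to (intent, slots dict).
    annotated: list of (utterance, gold_intent, gold_slots) tuples.
    """
    intent_hits = 0
    true_pos = pred_total = gold_total = 0
    for utterance, gold_intent, gold_slots in annotated:
        intent, slots = predict(utterance)
        intent_hits += int(intent == gold_intent)
        # A slot counts as correct only if both name and value match.
        pred_pairs = set(slots.items())
        gold_pairs = set(gold_slots.items())
        true_pos += len(pred_pairs & gold_pairs)
        pred_total += len(pred_pairs)
        gold_total += len(gold_pairs)
    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / gold_total if gold_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "intent_accuracy": intent_hits / len(annotated) if annotated else 0.0,
        "slot_precision": precision,
        "slot_recall": recall,
        "slot_f1": f1,
    }
```

Low recall on particular slots or intents points to utterance patterns that real users produce but the understanding module does not yet cover.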
DM: add multi-turn interactions into the simulator for training the RL agent
The RL agent should handle misunderstanding better than the rule-based agent
Check whether the agent can handle misrecognized text or misunderstandings
If the RL agent performs worse than the rule-based agent, increase your system complexity
More functionality/backend databases, more complex simulated interactions
Check the strategies the agent applied to make sure your RL agent shows an increasing performance trend
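To train the RL agent against misrecognition and misunderstanding, the simulator can add noise to the simulated user's slots before the agent sees them. The sketch below is one possible scheme, not the course's simulator: the function name `corrupt_slots`, the `ontology` format (slot name mapped to a list of allowed values), and the even split between dropping a slot and swapping its value are all assumptions.

```python
import random


def corrupt_slots(slots, ontology, error_rate, rng):
    """Return a noisy copy of the simulated user's slots.

    With probability error_rate, each slot is either dropped
    (a missed slot) or replaced by another value from the ontology
    (a misrecognized value). The input dict is not modified.
    """
    noisy = {}
    for name, value in slots.items():
        if rng.random() >= error_rate:
            noisy[name] = value  # understood correctly
            continue
        alternatives = [v for v in ontology.get(name, []) if v != value]
        if alternatives and rng.random() < 0.5:
            noisy[name] = rng.choice(alternatives)  # wrong value
        # otherwise the slot is dropped entirely
    return noisy


# Usage with a hypothetical ontology and a seeded generator.
ontology = {"genre": ["action", "comedy", "drama"],
            "date": ["this weekend", "today", "tomorrow"]}
user_slots = {"genre": "action", "date": "this weekend"}
noisy_slots = corrupt_slots(user_slots, ontology, 0.2, random.Random(0))
```

Sweeping `error_rate` upward and comparing the RL agent with the rule-based agent at each level gives a direct check of whether the learned policy actually copes with errors better.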
NLG: make responses more diverse and interesting
Multimodality: try richer multimodality for interesting interactions
Emotion recognition, speaker recognition, etc., for better greetings
Final Score
System functionality
#tables, #slots, #intents
System success performance
Human testing performance evaluated by TAs
~30 dialogues
If the failed dialogues are fixed, the updated performance is used
Evaluation
Correctness and reasonableness
Test data should come from real human users rather than generated patterns
Creativity
Multimodality usage (e.g., emotion)
Diverse/interesting responses
The poster template will be provided
Creativity Award
Top 3 Best System Awards
Milestone Score
Regrading is available if you update your system to support the failed interactions
Required documentation/programs can be resubmitted and regraded (max 80%)