An Intelligent Assistant for High-Level Task Understanding

(1)

Ming Sun, Yun-Nung Chen, Alexander I. Rudnicky

School of Computer Science, Carnegie Mellon University


(2)

Many devices, many apps…

• Rich functionality is now available

• How well can it support user activities?

(3)

Book a restaurant!

I can do Yelp!

(4)


Show me the bus to China Palace.

I know Maps!

(5)

Tell kids we will have Chinese tonight.

(6)

Multi-domain Dialog – Current System

System: What can I do for you?

User: Could you arrange a dinner for me and my friends?

System: Sorry I don’t understand that. What can I do for you?

User: Can I book a table for three in Täkō downtown for this Friday?

…

System: OK. What can I do for you next?

User: Show me the bus from here.

System: Where is your destination please?

User: Täkō downtown Pittsburgh.

…

System: What should I do next?

User: Send the bus route to Carrie and Peter.

System: OK. Sending the following message to Carrie and Peter: “the bus route”.


Cannot handle complex intentions

Only passively supports cross-domain dialog

No shared context across domains

(7)

Multi-domain Dialog – Human Assistant

Assistant: What can I do for you?

User: Could you arrange a dinner for me and my friends?

Assistant: What kind of food do you prefer?

User: Mexican?

A: How about Täkō? I can book a table for you.

User: Sounds good! Can I take a bus there?

A: The 61A/B/C/D buses can take you there. Do you want to send this to your friends?

User: Great! Send it to Carrie and Peter.

Assistant: OK. The bus route 61 has been sent.

Understand complex intentions

Actively support cross-domain dialog

Maintain context

(8)

Intention Understanding

Yes Dad!

Special Team 1!


Plan a dinner!

(9)

Intention Understanding

Find a restaurant!

Yelp!

(10)

Intention Understanding

Find a bus route!


Maps!

(11)

Intention Understanding

Message Agnes & Edith!

Messenger!

(12)

Intention Understanding

Yes Dad!

Special Team 2!


Set up meeting!

(13)

Approach

• Step 1: Observe human users performing multi-domain tasks
• Step 2: Learn to assist at the task level
  - Map an activity description to a set of domain apps
  - Interact at the task level

(14)

Data Collection 1 – Smartphone

Wednesday 17:08 – 17:14, CMU
Apps: Messenger, Gmail, Browser, Calendar
Task description: “Schedule a visit to CMU Lab”

• Log each app invocation with its time, date, and location
• Split the log into episodes wherever there are 3 minutes of inactivity
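The episode segmentation rule can be sketched in a few lines of Python. This is a minimal sketch, not the study's logging code: it assumes each log entry is a hypothetical `(timestamp, app_name)` pair (location omitted) and that an idle gap of exactly 3 minutes already starts a new episode; the function name and example timestamps are made up.

```python
from datetime import datetime, timedelta

# Assumed threshold semantics: an idle gap of 3 minutes or more starts a new episode.
INACTIVITY_GAP = timedelta(minutes=3)


def split_into_episodes(log, gap=INACTIVITY_GAP):
    """Split (timestamp, app_name) entries into episodes separated by idle gaps."""
    episodes = []
    current = []
    last_time = None
    for ts, app in sorted(log):  # process invocations in time order
        if last_time is not None and ts - last_time >= gap:
            episodes.append(current)  # close the episode at the idle gap
            current = []
        current.append((ts, app))
        last_time = ts
    if current:
        episodes.append(current)
    return episodes


# Made-up example log (hypothetical timestamps).
log = [
    (datetime(2016, 1, 6, 17, 8), "Messenger"),
    (datetime(2016, 1, 6, 17, 10), "Gmail"),
    (datetime(2016, 1, 6, 17, 12), "Browser"),
    (datetime(2016, 1, 6, 17, 14), "Calendar"),
    (datetime(2016, 1, 6, 17, 30), "Maps"),
]
for episode in split_into_episodes(log):
    print([app for _, app in episode])
```

Here the first four invocations are at most 2 minutes apart and form one episode; the 16-minute gap before Maps starts a second one.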

(15)

Data Collection 2 – Wizard-of-Oz

User: Find me an Indian place near CMU.
Wizard: Yuva India is nearby.

Monday 10:08 – 10:15, Home
Apps: Yelp, Maps, Messenger
Task description: “Schedule a lunch with David.”

Music

(16)

Data Collection 2 – Wizard-of-Oz


User: When is the next bus to school?
Wizard: In 10 min, 61C.

Monday 10:08 – 10:15

Music

(17)

Data Collection 2 – Wizard-of-Oz

User: Tell David to meet me there in 15 min.
Wizard: Message sent.

Monday 10:08 – 10:15

Music

(18)

Corpus

• 533 real-life multi-domain interactions from 14 real users
• 12 native and 2 non-native English speakers
• 4 male and 10 female
• Mean age: 31
• Total number of unique apps: 130 (mean: 19 per user)

Resource | Example | Usage
App sequences | Yelp -> Maps -> Messenger | Structure/arrangement of the task
Task descriptions | “Schedule a lunch with David” | Nature of the intention; language reference
User utterances | “Find me an Indian place near CMU.” | Language reference
Metadata | Monday, 10:08 – 10:15, Home | Contexts of the tasks

(19)

Intention Realization

Model:

• [Yelp, Maps, Uber]; November, weekday, afternoon, office; “Try to arrange evening out”
• [United Airlines, AirBnB, Calendar]; September, weekend, morning, home; “Planning a trip to California”
• [TripAdvisor, United Airlines]; July, weekend, morning, home; “I was planning a trip to Oregon”
• […, …, …]; …; “Shared picture to Alexis”

New task description: “Plan a weekend in Virginia”

(20)

Find similar past experiences

• Cluster-based: K-means clustering on user-generated language
• Neighbor-based: KNN

[Figure: cluster-based vs. neighbor-based retrieval of similar past experiences]
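Both retrieval strategies can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not HELPR's implementation: task descriptions become bag-of-words vectors, similarity is cosine, the cluster-based variant assigns points by cosine similarity (a spherical k-means variant) and seeds the centroids with the first k descriptions, and all function names are hypothetical. The example descriptions come from the Intention Realization slide.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase whitespace tokens -> counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn(query, past, k=2):
    """Neighbor-based: the k past descriptions most similar to the query."""
    q = bow(query)
    return sorted(past, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]


def kmeans(vectors, k, iters=10):
    """K-means with cosine assignment; the first k vectors seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = {w: n / len(members) for w, n in total.items()}
    return centroids, labels


def cluster_members(query, past, k=2):
    """Cluster-based: past descriptions in the cluster closest to the query."""
    centroids, labels = kmeans([bow(d) for d in past], k)
    q = bow(query)
    best = max(range(k), key=lambda c: cosine(q, centroids[c]))
    return [d for d, lab in zip(past, labels) if lab == best]


past = [
    "Try to arrange evening out",
    "Planning a trip to California",
    "I was planning a trip to Oregon",
    "Shared picture to Alexis",
]
query = "Plan a weekend in Virginia"
print(knn(query, past, k=2))
print(cluster_members(query, past, k=2))
```

Both calls return the two trip-planning descriptions, but only through the shared word “a”: “Plan” does not match “planning”, which illustrates the language-mismatch problem.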

(21)

[Figure: app sequences of similar past experiences, involving Yelp, OpenTable, Maps, Messenger, and Email]

Realize domains from past experiences

• Representative sequence: combine the app sequences of similar experiences (ROVER)
• Multi-label classification
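The slide names ROVER-style combination and multi-label classification without further detail. As a much simpler stand-in, the sketch keeps every app used in at least half of the similar past episodes and orders the kept apps by their average relative position. The threshold, ordering rule, function name, and example sequences are all hypothetical. This is plain frequency voting, not ROVER's alignment-based voting.

```python
from collections import Counter


def realize_apps(neighbor_sequences, threshold=0.5):
    """Keep apps used in >= threshold of the episodes; order by mean relative position."""
    n = len(neighbor_sequences)
    votes = Counter()
    positions = {}
    for seq in neighbor_sequences:
        unique = list(dict.fromkeys(seq))  # count each app once per episode
        for i, app in enumerate(unique):
            votes[app] += 1
            # Relative position in [0, 1]; a single-app episode maps to 0.
            positions.setdefault(app, []).append(i / max(len(unique) - 1, 1))
    kept = [app for app, v in votes.items() if v / n >= threshold]
    return sorted(kept, key=lambda app: sum(positions[app]) / len(positions[app]))


# Hypothetical app sequences from four similar past episodes.
neighbors = [
    ["Yelp", "Maps", "Messenger"],
    ["OpenTable", "Maps", "Email"],
    ["Yelp", "Maps", "Email"],
    ["Yelp", "Maps"],
]
print(realize_apps(neighbors))
```

Raising `threshold` trades recall for precision: at 1.0 only apps present in every similar episode survive.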

(22)

Some Obstacles to Remove

• Language mismatch
  - Solution: Query Enrichment (QryEnr)
    - [“shoot”, “photo”] -> [“shoot”, “take”, “photo”, “picture”]
    - Similar words from word2vec (pre-trained GoogleNews model)
• App mismatch
  - Solution: App Similarity (AppSim)
    - Represent apps in a functionality space (derived from app descriptions) to identify apps with the same function
    - Data-driven: doc2vec on app store texts
    - Rule-based: app package names
    - Knowledge-driven: Google Play similar-app suggestions
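Query enrichment can be sketched as appending, for each query word, its nearest neighbor in an embedding space when the similarity is high enough. The slide uses the pre-trained GoogleNews word2vec model. To keep the sketch self-contained, the 3-dimensional vectors below are made up, and `topn` and `min_sim` are hypothetical parameters. With gensim, the lookup would typically go through `KeyedVectors.most_similar` on vectors loaded with `KeyedVectors.load_word2vec_format`.

```python
import math

# Made-up 3-d vectors standing in for the pre-trained GoogleNews word2vec model.
TOY_VECTORS = {
    "shoot": [0.9, 0.1, 0.0],
    "take": [0.8, 0.2, 0.1],
    "photo": [0.1, 0.9, 0.1],
    "picture": [0.2, 0.8, 0.1],
    "dinner": [0.0, 0.1, 0.9],
}


def cos(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def enrich(query_words, vectors, topn=1, min_sim=0.9):
    """Append up to `topn` close neighbors (similarity >= min_sim) after each word."""
    enriched = []
    for word in query_words:
        enriched.append(word)
        if word not in vectors:  # out-of-vocabulary words are kept unchanged
            continue
        neighbors = sorted(
            ((cos(vectors[word], vec), other) for other, vec in vectors.items() if other != word),
            reverse=True,
        )
        for sim, other in neighbors[:topn]:
            if sim >= min_sim and other not in enriched and other not in query_words:
                enriched.append(other)
    return enriched


print(enrich(["shoot", "photo"], TOY_VECTORS))
```

With these toy vectors the call reproduces the slide's example, [“shoot”, “take”, “photo”, “picture”]. A high `min_sim` keeps loosely related words from diluting the query.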

(23)

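Gap between Generic and Personalized Models

• QryEnr, AppSim, and QryEnr+AppSim reduce the F1 gap between generic and personalized models

[Figure: F1 of generic vs. personalized models with QryEnr, AppSim, and QryEnr+AppSim]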

(24)

Compare different AppSim

[Figure: precision, recall, and F1 of Baseline, Data-driven, Knowledge-driven, Rule-based, and Combined AppSim]
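The precision, recall, and F1 in this comparison presumably score the predicted set of apps against the apps the user actually used in the episode. A minimal set-based sketch follows. How scores are averaged across episodes is not stated on the slides, and the function name and example app sets are hypothetical.

```python
def precision_recall_f1(predicted, actual):
    """Set-based precision, recall, and F1 of predicted apps vs. apps actually used."""
    p, a = set(predicted), set(actual)
    hits = len(p & a)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(a) if a else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical episode: two of the three suggested apps were actually used.
print(precision_recall_f1(["Yelp", "Maps", "Uber"], ["Yelp", "Maps", "Messenger"]))
```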

(25)

Compare different AppSim

• Combining all three approaches performs best
• Knowledge-driven and data-driven approaches have low coverage of manufacturer apps
• The rule-based approach outperforms each of the other two individual approaches
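The slides do not spell out the package-name rule or how the three AppSim scores are combined. One hypothetical reading is sketched below. Package names are split on dots, generic tokens are dropped, and the remaining tokens are compared by Jaccard overlap. The combination takes the maximum over whichever approaches produced a score, so an app with no app-store entry (where data- and knowledge-driven coverage is low) can still be matched by the rule. The generic-token list, function names, and package names are illustrative assumptions.

```python
# Hypothetical list of package-name tokens that carry no functional meaning.
GENERIC_TOKENS = {"com", "org", "net", "android", "app", "apps", "mobile"}


def package_tokens(package):
    """Informative tokens of a package name, e.g. 'com.vendor.android.email'."""
    return {t for t in package.lower().split(".") if t not in GENERIC_TOKENS}


def rule_similarity(pkg_a, pkg_b):
    """Jaccard overlap of informative package-name tokens (hypothetical rule)."""
    a, b = package_tokens(pkg_a), package_tokens(pkg_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(data=None, knowledge=None, rule=None):
    """Max over available scores; None means that approach has no coverage."""
    scores = [s for s in (data, knowledge, rule) if s is not None]
    return max(scores) if scores else 0.0


# A vendor email client with no app-store entry: only the rule yields a score.
rule = rule_similarity("com.android.email", "com.vendor.android.email")
print(rule, combined_similarity(rule=rule))
```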

(26)

Learning to talk at the task level

• Techniques:
  - (Extractive/abstractive) summarization
  - Key phrase extraction [RAKE]
• User study:
  - Key phrase extraction applied to user-generated language
  - Users give a binary judgment on each phrase in a ranked list of key phrases

1. solutions online
2. project file
3. Google drive
4. math problems
5. physics homework
6. answers online
…

[descriptions]

Looking up math problems.

Now open a browser.

Go to slader.com.

Doing physics homework.

…

[utterances]

Go to my Google drive.

Look up kinematic equations.

Now open my calculator so I can plug in numbers.

…
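RAKE scores candidate phrases (the word runs between stopwords and punctuation) by summing each word's degree-to-frequency ratio. A compact sketch on the example descriptions and utterances follows. The tiny stopword list is an assumption (RAKE normally uses a full stopword list), the function names are hypothetical, and this is not necessarily the configuration used in the study.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); RAKE usually uses a much larger one.
STOPWORDS = {"a", "an", "and", "can", "go", "i", "in", "my", "now", "so", "the", "to", "up"}


def candidate_phrases(text):
    """Split text at punctuation and stopwords into candidate phrases (word tuples)."""
    phrases = []
    # Split on punctuation followed by whitespace or end of text, so "slader.com" survives.
    for sentence in re.split(r"[.!?,;:](?:\s+|$)", text.lower()):
        current = []
        for word in sentence.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text):
    """Rank unique candidate phrases by the sum of degree/frequency word scores."""
    phrases = candidate_phrases(text)
    freq, degree = Counter(), Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores, key=scores.get, reverse=True)  # stable: ties keep first-seen order


TEXT = ("Looking up math problems. Now open a browser. Go to slader.com. "
        "Doing physics homework. Go to my Google drive. Look up kinematic equations. "
        "Now open my calculator so I can plug in numbers.")
print(rake(TEXT)[:4])
```

This toy run ranks “doing physics homework” first because “doing” is not in the stopword list, followed by “math problems”, “google drive”, and “kinematic equations”. The stopword choice visibly changes which phrases surface.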


(27)


Learning to talk at the task level

• Metric: Mean Reciprocal Rank (MRR)
• Result:
  - MRR ~0.6
  - Understandable verbal references show up in the top 2 of the ranked list

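MRR averages, over ranked lists, the reciprocal rank of the first phrase the user judged understandable: MRR = (1/|Q|) · Σ_i 1/rank_i. A minimal sketch over binary judgments follows. The function names are hypothetical, and the judgment lists are made up, not data from the study.

```python
def reciprocal_rank(judgments):
    """1/rank of the first True judgment in a ranked list; 0.0 if none is True."""
    for rank, understandable in enumerate(judgments, start=1):
        if understandable:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_judgments):
    """Average reciprocal rank over several ranked lists."""
    return sum(reciprocal_rank(j) for j in all_judgments) / len(all_judgments)


# Made-up binary judgments for three ranked key-phrase lists.
judgments = [
    [True, False, False],  # first phrase accepted -> 1/1
    [False, True, False],  # second phrase accepted -> 1/2
    [False, False, True],  # third phrase accepted -> 1/3
]
print(mean_reciprocal_rank(judgments))  # (1 + 1/2 + 1/3) / 3 = 11/18 ≈ 0.611
```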

(28)

Summary

• Collected real-life cross-domain interactions from real users
• HELPR: a framework to learn assistance at the task level
  - Suggest a set of supportive domains to accomplish the task
    - Personalized model > generic model
    - The gap can be reduced by QryEnr + AppSim
  - Generate language references to communicate verbally at the task level

(29)

HELPR demo

• Interface
  - HELPR display
  - Google ASR
  - Android TTS
• HELPR server
  - User models

(30)

Thank you

Questions?