
▪ Motivation

o Generate emails that reflect sender style and intent of communication

o Provide emails as part of synthetic evidence of insider threats for purposes of training, prototyping, and evaluating anomaly detectors.

▪ Approach

o Senders’ characteristics are modeled based on their writing patterns (structure, politeness, etc.) instead of their attitudes

o 1st Stage: modeling sender style and topic structure for email organization

o 2nd Stage: stochastic generation of language for surface realization

Two-Stage Stochastic Natural Language Generation for Email Synthesis by Modeling Sender Style and Topic Structure

Yun-Nung (Vivian) Chen and Alexander I. Rudnicky

2. The Framework

• We propose a two-stage stochastic NLG process for email synthesis that models sender style and topic structure.

• Subjects can detect sender style and can differentiate between template-based (sentence-level) and stochastically generated (word-level) sentences.

• This technique can be used to create realistic emails, and email generation could be carried out using mixtures containing additional models based on other characteristics.

• The current study shows that email can be synthesized using a small corpus of labeled data; however, these models could be used to bootstrap the labeling of a larger corpus, which in turn could be used to create more robust models.

3. Training Data Preprocessing

The ratio of subjects’ preference according to different criteria

• Each email can be treated as a structural label sequence

▪ The 1st stage structures the emails according to sender style and topic structure (high-level generation).

▪ The 2nd stage synthesizes text content based on the particulars of an email element and the goals of a given communication (surface-level realization).

• Evaluation of Sender Style Modeling

o Rate synthesized emails for each sender on a scale of 1 (highly confident that email is not from the sender) to 5 (highly confident that email is from the sender)

o Average normalized score received by the corresponding (true) sender: 45%, compared with 33% for a uniform split across the 3 senders

• Evaluation of Surface Realization

o Compare template-based generation (sentence-level NLG) and stochastic generation (word-level NLG) on the same email structures.

▪ Different senders tend to structure emails in different ways.

▪ The word-based stochastic generation outperforms the template-based algorithm and requires less effort in terms of knowledge engineering.

[Figure: framework overview. Training Data Preprocessing: Email Document Archive; Structural Label Annotation giving Structural Label Sequences; Slot Annotation giving Emails w/ Slots. 1st Stage (Modeling Sender Style and Topic Structure for Email Organization): Building Structure LM (Sender-Specific Model, Topic-Specific Model); Predicting Mixture Models; Generating Email Structures, giving Generated Structural Label Sequences such as <greeting> <inform>. 2nd Stage (Surface Realization): Building Content LM; Generating Text Content; Scoring Email Candidates; Filling Slots, turning "Hi [Person] Today’s ..." into the Synthesized Email "Hi Peter Today’s ...". Input to NLG: Sender, Topic, Slot-Value Pairs.]

1. The Task

4. Modeling Sender Style and Topic Structure for Email Organization

Example annotated email:

From: Kitchen, Louise

Sent: Thursday, April 05, 2001 11:15 AM

To: Beck, Sally

Subject: Re: Costs

Shukaly resigned and left.

But I assume the invitation will be extended to all of their groups so that whoever they want can attend.

I would actually prefer that the presentation is actually circulated to the groups on Friday rather than presented as we will wait forever on getting an offsite together.

How about circulating the presentation and then letting them refer all questions to Rahil - see how much interest you get.

One on ones are much better and I think this is how Rahil should proceed.

We need to get in front of customers in the next couple of weeks.

Let's aim to get a least three customers this quarter.

Louise

[Figure annotations: the email is divided into header and content; segments of the content carry the structural labels inform, suggestion, request, and signature.]

• Structural Label Annotation

o 10 email structure elements (greeting, inform, request, suggestion, question, answer, regard, acknowledgement, sorry, signature)

• Slot Annotation

o General classes: 7 classes extracted by Named Entity Recognition (location, person, organization, time, money, percent, date)

o Topic classes: 3 classes extracted by keywords (meeting, issue, discussion)
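
The preprocessing above can be illustrated with a minimal Python sketch. It covers only the keyword-based topic slots and the view of an email as a structural label sequence; the Named Entity Recognition step is not shown. The keyword lists, the function name annotate_topic_slots, and the sample email are assumptions made for illustration, since the poster names the three topic classes but not the keywords used.

```python
import re

# Hypothetical keyword lists: the poster gives the topic classes, not the keywords.
TOPIC_KEYWORDS = {
    "meeting": ["meeting", "seminar", "presentation"],
    "issue": ["issue", "problem"],
    "discussion": ["discussion", "talk"],
}


def annotate_topic_slots(text):
    """Replace keyword matches with [slot] placeholders.

    Returns the slotted text and the (slot, value) pairs that were found.
    """
    pairs = []
    for slot, keywords in TOPIC_KEYWORDS.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

        def repl(match, slot=slot):
            pairs.append((slot, match.group(0)))
            return "[" + slot + "]"

        text = pattern.sub(repl, text)
    return text, pairs


# An annotated email is a list of (structural label, text) segments,
# so its structural label sequence is just the list of labels.
annotated_email = [
    ("greeting", "Hi Peter"),
    ("inform", "Tomorrow's seminar is at Gates building."),
    ("sign", "Louise"),
]
label_sequence = [label for label, _ in annotated_email]
slotted, slot_pairs = annotate_topic_slots(annotated_email[1][1])
```

Here label_sequence is the kind of input a structure language model would be trained on, while the slotted text is the kind of input a content language model would be trained on.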

For each structural label:

1) Building Structure Language Models

o Sender-specific structure LM (trigram w/ smoothing)

o Topic-specific structure LM (trigram w/ smoothing)

2) Predicting Mixture Models

3) Stochastically Generating Email Structures

o Generate structural label sequences randomly according to the distribution of the mixture models

[Figure: a sender-specific model and a topic-specific model are combined into a mixture model.]
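
The three steps of this stage can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the poster does not say which smoothing method is used or how the mixture weights are predicted, so the sketch assumes add-k smoothing and a fixed interpolation weight lam. All function names are hypothetical.

```python
import random
from collections import defaultdict

LABELS = ["greeting", "inform", "request", "suggestion", "question",
          "answer", "regard", "ack", "sorry", "sign"]
BOS, EOS = "<s>", "</s>"
VOCAB = LABELS + [EOS]


def train_trigram(sequences):
    """Count label trigrams, padding each sequence with two BOS and one EOS."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = [BOS, BOS] + list(seq) + [EOS]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[(a, b)][c] += 1
    return counts


def prob(counts, context, label, k=0.1):
    """Add-k smoothed P(label | context); add-k is an assumed smoothing choice."""
    ctx = counts.get(context, {})
    total = sum(ctx.values())
    return (ctx.get(label, 0) + k) / (total + k * len(VOCAB))


def mixture_dist(sender_lm, topic_lm, context, lam=0.5):
    """Linear interpolation of both models; lam is an assumed fixed weight."""
    return {
        lab: lam * prob(sender_lm, context, lab)
        + (1 - lam) * prob(topic_lm, context, lab)
        for lab in VOCAB
    }


def generate_structure(sender_lm, topic_lm, lam=0.5, max_len=10, rng=random):
    """Sample a structural label sequence from the mixture distribution."""
    context, out = (BOS, BOS), []
    while len(out) < max_len:
        dist = mixture_dist(sender_lm, topic_lm, context, lam)
        label = rng.choices(list(dist), weights=list(dist.values()))[0]
        if label == EOS:
            break
        out.append(label)
        context = (context[1], label)
    return out


# Toy training data standing in for annotated structural label sequences.
sender_lm = train_trigram([["greeting", "inform", "sign"],
                           ["greeting", "request", "sign"]])
topic_lm = train_trigram([["inform", "question", "sign"]])
structure = generate_structure(sender_lm, topic_lm, rng=random.Random(0))
```

Because both component models are normalized over the same label set, the interpolated distribution is also normalized for any lam between 0 and 1, which is what makes direct sampling from it valid.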

5. Surface Realization

For each structural label:

Build Content Language Model

o Cross-sender content LMs (5-gram w/o smoothing)

For each generated structural label:

1) Stochastically Generate Text Content

2) Score Email Candidates

o We penalize the synthesized email if it:

▪ contains slots without provided values

▪ doesn’t have the required slots

3) Fill Slots

o Tomorrow’s [meeting] is at [location].

▪ Tomorrow’s speech seminar is at Gates building.
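
The surface realization steps can be sketched in Python as below. This is an illustrative sketch under assumptions: the poster does not give the scoring formula, so each violated condition is assumed to cost a fixed penalty, and the required slots are assumed to be exactly the slots supplied as slot-value pairs. The function names and the two training sentences are hypothetical.

```python
import random
import re
from collections import defaultdict

N = 5  # 5-gram content LM, as on the poster
BOS, EOS = "<s>", "</s>"
SLOT = re.compile(r"\[(\w+)\]")


def train_content_lm(sentences, n=N):
    """Unsmoothed n-gram model: map each (n-1)-word context to observed next words."""
    model = defaultdict(list)
    for sent in sentences:
        tokens = [BOS] * (n - 1) + sent.split() + [EOS]
        for i in range(n - 1, len(tokens)):
            model[tuple(tokens[i - n + 1:i])].append(tokens[i])
    return model


def generate_text(model, n=N, max_len=30, rng=random):
    """Sample words one at a time until EOS or max_len is reached."""
    context, words = (BOS,) * (n - 1), []
    while len(words) < max_len:
        word = rng.choice(model[context])
        if word == EOS:
            break
        words.append(word)
        context = context[1:] + (word,)
    return " ".join(words)


def score(candidate, slot_values, penalty=1.0):
    """Assumed scoring: one penalty per unfillable slot and per missing required slot."""
    slots = set(SLOT.findall(candidate))
    unfilled = slots - set(slot_values)   # slots without provided values
    missing = set(slot_values) - slots    # required slots absent from the text
    return -penalty * (len(unfilled) + len(missing))


def fill_slots(candidate, slot_values):
    """Substitute each [slot] with its value; unknown slots are left unchanged."""
    return SLOT.sub(lambda m: slot_values.get(m.group(1), m.group(0)), candidate)


training = ["Tomorrow's [meeting] is at [location].",
            "The [meeting] is moved to [date]."]
slot_values = {"meeting": "speech seminar", "location": "Gates building"}

model = train_content_lm(training)
candidates = [generate_text(model, rng=random.Random(seed)) for seed in range(20)]
best = max(training, key=lambda c: score(c, slot_values))
filled = fill_slots(best, slot_values)
# filled == "Tomorrow's speech seminar is at Gates building."
```

With only two training sentences an unsmoothed 5-gram model can only reproduce them; a larger corpus with overlapping contexts is what allows word-level generation to produce new sentences.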

6. Experiments

(%)           Template   Stochastic   No Diff
Coherence     36.19      38.57        25.24
Fluency       28.10      40.48        31.43
Naturalness   35.71      45.71        18.57
Preference    36.67      42.86        20.48
Overall       34.17      41.90        23.93

▪ A sender may have a personal style of structuring emails.

▪ Emails about the same topic may have similar structures.

7. Conclusions

▪ Subjects can recognize sender style from cues such as greeting usage, politeness, and email length.
