
ABSTRACTIVE DIALOGUE SUMMARIZATION WITH SENTENCE-GATED MODELING OPTIMIZED BY DIALOGUE ACTS

Chih-Wen Goo†⋆ and Yun-Nung Chen†

†National Taiwan University, Taipei, Taiwan
⋆MediaTek Inc., Hsinchu, Taiwan

r05944049@ntu.edu.tw, y.v.chen@ieee.org

ABSTRACT

Neural abstractive summarization has been increasingly studied, where prior work mainly focused on summarizing single-speaker documents (news, scientific publications, etc.).

In dialogues, there are diverse interactive patterns between speakers, which are usually defined as dialogue acts. These interactive signals may provide informative cues for better summarizing dialogues. This paper proposes to explicitly leverage dialogue acts in a neural summarization model, where a sentence-gated mechanism is designed for modeling the relationships between dialogue acts and the summary. The experiments show that our proposed model significantly improves the abstractive summarization performance compared to the state-of-the-art baselines on the AMI meeting corpus, demonstrating the usefulness of the interactive signal provided by dialogue acts.¹

Index Terms— dialogues, summarization, dialogue act, sentence gate, gating mechanism.

1. INTRODUCTION

With a large amount of textual information available, text summarization has been widely studied for several years in natural language processing, and it can be categorized into two types: extractive summarization and abstractive summarization. Extractive methods assemble the summary from the source text directly [1, 2, 3], while abstractive methods generate words to form the summary [4, 5, 6]. With the rising trend of neural models, abstractive summarization has been widely investigated recently [7, 5]. In addition, some recent work proposed to combine the advantages of the two types of methods and achieved better summarization results [8, 9, 10, 11].

Most of the summarization work focused on single-speaker written documents such as news, scientific publications, etc. [12, 7, 13]. In addition to text summarization, speech summarization is equally important, especially for

⋆This work was done while the author was at National Taiwan University.

¹The source code is available at http://github.com/MiuLab/DialSum.

spoken or even multimedia documents, which are more difficult to browse than text, such as multi-party meetings.

Therefore, speech summarization has been investigated in the past [14, 15, 16, 17, 18, 19]. However, almost all prior work focused on summarizing documents based on the mentioned salient content instead of the interactive status, even though this behavioral signal should be important for dialogue summarization.

To better summarize a meeting, not only the content but also the inter-speaker interactions are important. Prior dialogue summarization work utilized prosody or speaker information as interactive patterns for better extracting salient sentences [18, 20]. However, abstractive summarization for dialogues/meetings has not yet been explored due to the lack of suitable benchmark data [21], because the benchmark dialogue data is only annotated with the importance of utterances, without abstractive summaries [22]. In order to bridge this gap, this paper benchmarks the abstractive dialogue summarization task using the AMI meeting corpus [22], where the summaries are produced based on the annotated topics the speakers discuss. A topic or a high-level description of a meeting is treated as the abstractive summary; for instance,

“evaluation of project new idea for TV” is a summary of the meeting topics. Such dialogue summaries are very short and may not contain words directly mentioned by the speakers, making automatic summarization more challenging.

A dialogue is a sequence of utterances exchanged between multiple participants, where each utterance modifies both participants’ cognitive status and the current dialogue state. The effect of an utterance on the context is often called a dialogue act [23], which provides informative cues for better understanding dialogues. Therefore, dialogue act classification has been widely studied in the spoken language understanding research field, and previous work on dialogue act recognition used information sources from multiple modalities, including linguistic information, global contextual properties like knowledge about participants, and so on [24, 25, 26, 27, 28]. Popular approaches for dialogue act classification include support vector machines (SVM) [29], Naive Bayes [26, 30, 31], logistic regression [32], and recurrent neural networks (RNN) [33, 34, 35, 36].


Multi-Party Dialogue | Dialogue Act
A: mm-hmm . | Backchannel
B: mm-hmm . | Backchannel
C: then , these are some of the remotes which are different in shape and colour , but they have many buttons . | Inform
C: so uh sometimes the user finds it very difficult to recognise which button is for what function and all that . | Inform
D: so you can design an interface which is very simple , and which is user-friendly . | Inform
D: even a kid can use that . | Inform
A: so can you got on t t uh to the next slide . | Suggest
Summary: alternative interface options

Fig. 1. A dialogue instance in the dataset built from the AMI meeting corpus.


Dialogue act classification and summarization are usually treated independently and used for different goals. In this paper, we leverage dialogue act information to improve dialogue summarization. Assuming that dialogue acts, as indicators of interactive signals, may be important for better summarization, the main focus of this paper is how to effectively integrate this information into a neural summarization model.

Prior work attempted to model discourse information and proposed a discourse-aware summarization model using a hierarchical RNN [37], where the between-utterance cues are modeled in an implicit way. Moreover, that model was applied to a publication summarization task, where the input documents are relatively structured and contain no interactive behavior.

Therefore, this work focuses on how to effectively model interactive signals such as dialogue acts for better dialogue summarization, where we introduce a sentence-gated mechanism to jointly model the explicit relationships between dialogue acts and summaries. To the best of our knowledge, there is no previous study with a similar idea, and we summarize our contributions as three-fold:

• The proposed model is the first attempt at dialogue summarization using dialogue acts as explicit interactive signals.

• We benchmark a dataset for abstractive summarization in the meeting domain, where the summaries describe the high-level goals of meetings.

• Our proposed model achieves state-of-the-art performance in dialogue summarization and helps us analyze how much each utterance and its dialogue act affect the summaries.

2. DIALOGUE SUMMARIZATION DATASET

Considering that there is no abstractive summarization data in any conversational domain, this paper first builds a dataset in order to benchmark the experiments. The AMI meeting corpus is a well-known meeting dataset with various annotations [22], which consists of 100 hours of meeting recordings. The recordings use a range of signals synchronized to

AMI Corpus Statistics
  Vocabulary Size          8,886
  #Dialogue Acts              15
  Min Summary Length           1
  Max Summary Length          26
  Training Set Size        7,024
  Development Set Size       400
  Testing Set Size           400

Table 1. Statistics of the AMI meeting corpus for dialogue summarization.

a common timeline, including close-talking and far-field microphones, individual and room-view video cameras, and output from a slide projector and an electronic whiteboard. The meetings are recorded in English in three different rooms with different acoustic properties, and include mostly non-native speakers. The corpus contains a wide range of annotations, including dialogue acts, topic descriptions, named entities, hand gestures, and gaze direction. In this work, we use the recording transcripts as the input to our model. Because there is no summary annotation in the AMI data, the annotated topic descriptions are treated as summaries of the dialogues. In the AMI data, the annotations for dialogue acts and topic descriptions are not available for all utterances, so we extract a subset of the AMI corpus to construct the benchmark dialogue summarization dataset. Figure 1 shows an example dialogue instance, where the summary describes the high-level goal of the meeting.

We use a sliding window of 50 words to split a meeting into several dialogue samples, adjusting the boundary to make sure no utterance is cut in the middle. If the topic changes within the window, all topic descriptions are concatenated according to their order of appearance. Each resulting sample contains around 50 to 100 words in an arbitrary number of sentences. We extract 7,824 samples from 36 meeting recordings and then randomly separate them into three groups: 7,024 samples for training, 400 samples for development, and 400 samples for testing. There are 15 dialogue act labels in the training set. The detailed statistics are shown in Table 1. A minimal sketch of this splitting procedure is given below.
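To make the splitting concrete, here is a minimal sketch under our own assumptions: the utterance representation (a dict with "words" and "topic") and the de-duplication of repeated topic labels are ours, not from the released code.

```python
# Minimal sketch of the 50-word sliding-window splitting described above.

WINDOW = 50  # target window size in words

def split_meeting(utterances):
    """Split a meeting into ~50-word samples without cutting an utterance,
    concatenating topic descriptions in their order of appearance."""
    samples, current, n_words = [], [], 0
    for utt in utterances:
        current.append(utt)           # boundary is adjusted to utterance ends
        n_words += len(utt["words"])
        if n_words >= WINDOW:
            topics = []
            for u in current:         # keep topics in appearing order
                if u["topic"] not in topics:
                    topics.append(u["topic"])
            samples.append({"utterances": current,
                            "summary": " ".join(topics)})
            current, n_words = [], 0
    return samples
```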


[Figure: the dialogue history encoder reads sentence representations built from words x_{i,1}, x_{i,2}, ... into hidden states h_1^e, ..., h_K^e; the dialogue act labeler with dialogue act attention outputs y_1^{DA}, ..., y_K^{DA}; the attentional summary decoder with summary attention outputs y_1^S, y_2^S, ...; a sentence gate connects the two branches.]

Fig. 2. The architecture of the proposed sentence-gated models.

3. PROPOSED APPROACH

This section first explains our attention-based RNN model and then introduces the proposed sentence-gating mechanism for summarization jointly optimized with dialogue act recognition. The model architecture is illustrated in Figure 2, where there are several modules, including 1) a dialogue history encoder, 2) a dialogue act labeler, 3) an attentional summary decoder, and 4) a sentence gate. We detail each module below.

3.1. Dialogue History Encoder

Given a dialogue document, the input is a sequence of utterances s = (s_1, ..., s_K), where K is the dialogue length. An utterance is constituted by a word sequence x = (x_1, ..., x_T), and the sentence embedding can be obtained by averaging all word embeddings in that sentence². The bidirectional long short-term memory (BLSTM) model [38] takes the sentence sequence s as input, and then generates a forward hidden state \overrightarrow{h_i^e} and a backward hidden state \overleftarrow{h_i^e}. The final hidden state at time step i is their concatenation, h_i^e = [\overrightarrow{h_i^e}, \overleftarrow{h_i^e}], which can be viewed as the encoded information for the given source document.

²Experiments using RNN-learned sentence embeddings were also conducted, but the performance was similar to using the average of word embeddings. Considering the parameter size, all experiments use average vectors as sentence embeddings.
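As a concrete illustration of this encoder, below is a minimal PyTorch sketch. This is our own rendering, not the authors' released implementation; the dimensions and padding handling are assumptions.

```python
import torch
import torch.nn as nn

class DialogueHistoryEncoder(nn.Module):
    """BLSTM over sentence embeddings obtained by averaging word embeddings."""

    def __init__(self, vocab_size, emb_dim=256, hidden=256, pad_id=0):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_id)
        self.pad_id = pad_id
        self.blstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                             batch_first=True)

    def forward(self, word_ids):
        # word_ids: (batch, K utterances, T words), padded with pad_id
        mask = (word_ids != self.pad_id).float().unsqueeze(-1)  # (B, K, T, 1)
        words = self.emb(word_ids)                              # (B, K, T, E)
        # sentence embedding = average of the non-pad word embeddings
        sent = (words * mask).sum(2) / mask.sum(2).clamp(min=1.0)
        # h[:, i] = h_i^e = [forward state; backward state], dim 2*hidden
        h, _ = self.blstm(sent)                                 # (B, K, 2*hidden)
        return h
```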

3.2. Dialogue Act Labeler

To leverage the dialogue act information, this module focuses on predicting the dialogue acts of all utterances. Specifically, s is mapped to its corresponding dialogue act label sequence y^{DA} = (y_1^{DA}, ..., y_K^{DA}). For each hidden state h_i^e, we compute the dialogue act context vector c_i^{DA} as the weighted sum of the encoder hidden states h_1^e, ..., h_K^e with the learned attention weights \alpha_{i,j}^{DA}:

c_i^{DA} = \sum_{j=1}^{K} \alpha_{i,j}^{DA} \cdot h_j^e,    (1)

where the dialogue act attention weights are computed as

\alpha_{i,j}^{DA} = \frac{\exp(e_{i,j})}{\sum_{k=1}^{K} \exp(e_{i,k})},    (2)

e_{i,k} = \sigma(W_{he}^{DA} \cdot h_k^e),    (3)

where \sigma is the sigmoid activation function, and W_{he}^{DA} is the weight matrix of a feed-forward neural network. Then all hidden states and dialogue act context vectors are optimized for dialogue act modeling by

y_i^{DA} = \mathrm{softmax}(W_{hy}^{DA} \cdot (h_i^e + c_i^{DA})),    (4)

where y_i^{DA} is the dialogue act label of the i-th sentence in the given dialogue, and W_{hy}^{DA} is the weight matrix. The dialogue act attention is shown as the blue component in Figure 2.
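The labeler of Eqs. (1)-(4) can be sketched as follows. Note that the score in Eq. (3) depends only on the attended state h_k^e, so the context vector is shared across positions i, which is how we render it here; this is a non-authoritative PyTorch sketch.

```python
import torch
import torch.nn as nn

class DialogueActLabeler(nn.Module):
    """Dialogue act attention and prediction, following Eqs. (1)-(4)."""

    def __init__(self, enc_dim=512, num_acts=15):
        super().__init__()
        self.score = nn.Linear(enc_dim, 1, bias=False)  # W_he^DA of Eq. (3)
        self.out = nn.Linear(enc_dim, num_acts)         # W_hy^DA of Eq. (4)

    def forward(self, h):                        # h: (B, K, enc_dim)
        e = torch.sigmoid(self.score(h))         # Eq. (3); depends on h_k only
        alpha = torch.softmax(e, dim=1)          # Eq. (2), normalized over K
        c_da = (alpha * h).sum(1, keepdim=True)  # Eq. (1); shared over i
        logits = self.out(h + c_da)              # Eq. (4), broadcast to all i
        return logits, c_da
```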

3.3. Attentional Summary Decoder

Following the prior work [10, 37], we use an attentional decoder to generate the word sequence of the summary.


[Figure: the sentence gate computes g from v \cdot \tanh(c_i^{DA} + W \cdot c^S).]

Fig. 3. Illustration of the sentence gate.

The summary context vector c_i^S is computed similarly to c^{DA}:

c_i^S = \sum_{j=1}^{K} \alpha_{i,j}^S \cdot h_j^e.    (5)

The summary is generated by a unidirectional LSTM with the initial state set to h_K^e, the last hidden state of the dialogue history encoder. The unidirectional LSTM outputs words until generating an end-of-string token or reaching the predefined maximum length. The formulation is shown as:

y_i^S = \mathrm{softmax}(W_{hy}^S \cdot (h_i^d + c_i^S)).    (6)
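A hedged sketch of one decoder step implementing Eqs. (5)-(6) follows. The attention scorer mirrors the labeler's, which is our reading of "computed similarly"; decoding loop and termination handling are omitted.

```python
import torch
import torch.nn as nn

class SummaryDecoder(nn.Module):
    """One-step attentional LSTM decoder, following Eqs. (5)-(6)."""

    def __init__(self, enc_dim=512, emb_dim=256, vocab_size=8886):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim, enc_dim)       # state lives in enc_dim
        self.score = nn.Linear(enc_dim, 1, bias=False)  # summary attention
        self.out = nn.Linear(enc_dim, vocab_size)       # W_hy^S of Eq. (6)

    def step(self, y_prev, state, h_enc):
        # y_prev: (B,) previous token ids; h_enc: (B, K, enc_dim)
        # state is (h_d, c), initialized from h_K^e, the encoder's last state
        h_d, c = self.cell(self.emb(y_prev), state)
        alpha = torch.softmax(self.score(h_enc), dim=1)  # alpha^S over K
        c_s = (alpha * h_enc).sum(1)                     # Eq. (5)
        logits = self.out(h_d + c_s)                     # Eq. (6)
        return logits, (h_d, c), c_s
```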

3.4. Sentence-Gated Mechanism

A gating mechanism is able to model the explicit relationship between two types of information [39]. The proposed sentence-gated model introduces an additional gate that leverages the summary context vector for modeling the relationships between dialogue acts and summaries, in order to improve both the dialogue act labeler and the attentional summary decoder, as illustrated in Figure 3. The proposed model has two variants:

• Full attention: the model considers the relations between dialogue acts and summaries using both dialogue act attention and summary attention, shown as the blue and green blocks respectively in Figure 2.

• Summary attention: the model builds the gating mechanism using only summary attention, so the parameter size is smaller than in the full attention model.

3.4.1. Full Attention

First, the dialogue act context vector c_i^{DA} and an averaged summary context vector c^S are combined and passed through the sentence gate:

c^S = \frac{1}{K} \sum_{k=1}^{K} c_k^S,    (7)

g = \sum v \cdot \tanh(c_i^{DA} + W \cdot c^S),    (8)

where v and W are a trainable vector and matrix, respectively, and the summation is over the elements of one time step. g can be seen as a weighted feature of the joint context vectors (c_i^{DA} and c^S). We use g to weight between h_i^e and c_i^{DA} for deriving y_i^{DA}, replacing (4) as below:

y_i^{DA} = \mathrm{softmax}(W_{hy}^{DA} \cdot (h_i^e + c_i^{DA} \cdot g)).    (9)

A larger g indicates that the dialogue act context vector and the summary context vector attend to similar parts of the input sequence, which implies that the correlation between the dialogue act and the summary is stronger and that the context vector is more reliable for contributing to the prediction results.

3.4.2. Summary Attention

To further investigate the power of the sentence gate mechanism, we remove the dialogue act attention module from the architecture, so c_i^{DA} is replaced with h_i^e. Accordingly, (8) and (9) are reformulated as (10) and (11), respectively:

g = \sum v \cdot \tanh(h_i^e + W \cdot c^S),    (10)

y_i^{DA} = \mathrm{softmax}(W_{hy}^{DA} \cdot (h_i^e + h_i^e \cdot g)).    (11)

This version allows the dialogue acts and summaries to share the attention mechanism, so both types of information are mutually improved in a more direct manner than in the full attention version.
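The summary-attention variant then amounts to swapping c_i^{DA} for h_i^e in the gate; a sketch under the same assumptions:

```python
import torch
import torch.nn as nn

class SummaryAttentionGate(nn.Module):
    """Summary-attention-only sentence gate, following Eqs. (10)-(11)."""

    def __init__(self, dim=512, num_acts=15):
        super().__init__()
        self.v = nn.Parameter(torch.randn(dim))
        self.W = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, num_acts)

    def forward(self, h, c_s_steps):
        # h: (B, K, dim); the encoder state h_i^e replaces c_i^DA throughout
        c_s = c_s_steps.mean(dim=1, keepdim=True)            # averaged c^S
        g = (self.v * torch.tanh(h + self.W(c_s))).sum(-1)   # Eq. (10)
        g = g.unsqueeze(-1)
        return self.out(h + h * g)                           # Eq. (11)
```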

3.5. Joint End-to-End Training

To learn the summarization model optimized by the dialogue act information, we formulate a joint objective as

p(y^{DA}, y^S \mid s) = \prod_{k=1}^{K} p(y_k^S \mid s_k) \prod_{k=1}^{K} p(y_k^{DA} \mid s_k)
                      = \prod_{k=1}^{K} p(y_k^S \mid x_k) \prod_{k=1}^{K} p(y_k^{DA} \mid x_k),    (12)

where p(y^{DA}, y^S \mid s) is the conditional probability of the dialogue acts and the summary given the input dialogue. Based on the joint objective, the proposed model that utilizes interactive signals for summarization can be trained in an end-to-end fashion.
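In practice, maximizing the joint objective of Eq. (12) amounts to minimizing the sum of the two cross-entropy losses; a minimal sketch follows, where the equal weighting of the two terms is our assumption.

```python
import torch.nn.functional as F

def joint_loss(summary_logits, summary_targets, da_logits, da_targets):
    """Minimize the summed negative log-likelihoods of Eq. (12)."""
    # summary_logits: (B, T, vocab); da_logits: (B, K, num_acts)
    # cross_entropy expects (B, C, T), hence the transposes
    loss_sum = F.cross_entropy(summary_logits.transpose(1, 2), summary_targets)
    loss_da = F.cross_entropy(da_logits.transpose(1, 2), da_targets)
    return loss_sum + loss_da  # equal weighting is our assumption
```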


Model                                      Interactive Signal  Size     DA Acc  R-1    R-2    R-3    R-L
BLSTM Dialogue Act Labeler                 yes                 3,864K   64.16   —      —      —      —
Attentional Seq2Seq [6]                    no                  12,391K  —       34.74  25.15  21.35  34.70
Pointer-Generator Network [10]             no                  11,861K  —       31.21  26.35  25.22  31.21
Discourse-Aware Hierarchical Seq2Seq [37]  yes                 11,295K  —       66.82  37.74  27.71  47.84
Proposed Sentence-Gated (Full Attention)   yes                 12,363K  64.47   67.52  37.38  27.70  48.45
Proposed Sentence-Gated (Summary)          yes                 11,837K  64.28   68.34  39.25  29.05  49.93

Table 2. Performance on the AMI meeting data (%). † indicates a significant improvement over all baselines with p < 0.05.

4. EXPERIMENTS

To evaluate the proposed model, we conduct experiments using the AMI meeting data introduced in Section 2.

4.1. Setup

In all experiments, the optimizer is Adam, the reported numbers are averaged over 20 runs, and the maximum number of epochs is set to 30 with an early-stop strategy. In our proposed model, the size of the hidden vectors is set to 256, and the vector dimensions vary for the compared baselines such that all models have a similar size; the configuration is sketched below.
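The stated setup corresponds roughly to the following configuration; only the values quoted above come from the paper, and anything else is a placeholder.

```python
# Training configuration stated in Section 4.1; "patience" is unspecified
# in the paper and left as a placeholder here.
CONFIG = {
    "optimizer": "adam",
    "hidden_size": 256,     # size of hidden vectors in the proposed model
    "max_epochs": 30,
    "early_stopping": True,
    "patience": None,       # not reported
    "num_runs": 20,         # reported numbers are averages over 20 runs
}
```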

For evaluation metrics, the dialogue act performance is measured by accuracy (Acc), and the summary performance is measured by ROUGE-1 (R-1), ROUGE-2 (R-2), ROUGE-3 (R-3), and ROUGE-L (R-L) scores [40]. We also validate the performance improvements with a statistical significance test for all experiments, where a single-tailed t-test is performed to measure whether the results from the proposed model are significantly better than all baselines. The dagger symbols indicate a significant improvement with p < 0.05.
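The single-tailed t-test can be reproduced along these lines with SciPy; this is a sketch, and the per-run score arrays are assumed inputs.

```python
from scipy import stats

def one_tailed_ttest(proposed_scores, baseline_scores, alpha=0.05):
    """Single-tailed t-test: is the proposed model significantly better?"""
    t, p_two_tailed = stats.ttest_ind(proposed_scores, baseline_scores)
    # convert the two-tailed p-value to a one-tailed one
    p = p_two_tailed / 2 if t > 0 else 1 - p_two_tailed / 2
    return p < alpha, p
```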

4.2. Baselines

Considering that there is no previous work on joint dialogue act modeling and summarization, the compared baselines are either for dialogue act classification or for text summarization, including a bidirectional LSTM dialogue act labeler, an attentional seq2seq summarization model [6], a pointer-generator network [10], and a discourse-aware hierarchical attentional seq2seq model [37]. Please note that the BLSTM dialogue act labeler baseline is the same as our proposed model without the summarization component. The pointer-generator network extends the attentional seq2seq by adding a joint pointer network to enable the copy mechanism. For the discourse-aware model, we only use the concept of hierarchy introduced by Cohan et al. [37] but do not include its pointer network part; the reason is explained later in Section 4.3. Among all baselines, only the discourse-aware model implicitly utilizes the interactive signal, while our model explicitly optimizes the summary together with dialogue acts.

4.3. Results

The experimental results are shown in Table 2, where the models have a similar number of parameters. Among the summarization baselines, the discourse-aware hierarchical seq2seq model achieves better performance than the other two, indicating the importance of discourse/interaction cues for dialogue summarization. Comparing the attentional seq2seq and the pointer-generator network, the difference is not obvious, because the high-level descriptions used as summaries barely overlap with the input dialogues (an overlap rate of 1.2% for the AMI meeting data). Due to this low overlap rate, the pointer-generator network performs the worst, because its pointer network and coverage loss introduce noise. This is the reason that the other baselines and our proposed model do not contain the copy mechanism and coverage loss in the experiments. The finding suggests that dialogue summarization focuses more on the interaction goal than on the mentioned content.

Table 2 shows that the proposed sentence-gated mechanism with summary attention significantly outperforms all baselines, with significant improvements on almost all measurements, demonstrating that the interactive signal provides useful cues for dialogue summarization and that the proposed sentence-gated mechanism effectively models the relationships between them. The proposed model with full attention performs slightly worse than the one with summary attention only. A probable reason is that dialogue act attention may not be necessary for predicting the dialogue act of a single utterance; that is, dialogue acts are often decided only by the individual utterance, so attending to its contextual utterances may not bring much benefit for modeling such interactive behaviors. Moreover, the proposed model reduces the model size by 12% compared to the best baseline combination (BLSTM for dialogue act prediction + discourse-aware hierarchical attentional seq2seq for summarization), demonstrating better model capacity.

4.4. Attention Analysis

To further analyze the attention learned by the model, we visualize the utterance attention weights when generating summaries in Figure 4.


Testing Dialogue Example 1 Dialogue Act

A: okay . Assess

B: okay that’s fine , that’s good . Assess

C: okay , let’s start from the beginning Offer

C: so i’m going to speak about technical functions design Inform

C: un just like some some first issues that came up . Inform

B: um ’kay , Stall

C: so the method i was um adopting at this point , it’s not um for the for the whole um period of the um all the project but it’s just at th at this very moment . Inform

B: um Stall

C: uh my method was um to look at um other um remote controls , Inform

C: uh so mostly just by searching on the web Inform

C: and to see what um functionality they used . Inform

C: and then um after having got this inspiration and having compared what i found on the web um just to think about what the de what the user really needs and what um what the use might desire as additional uh functionalities . Inform

Generated summary: industrial designer presentation issues of participants
Reference summary: industrial designer presentation interface specialist presentation

Testing Dialogue Example 2 Dialogue Act

A: okay , so Stall

B: hmm , okay . Backchannel

A: yeah well uh Stall

A: ipod is trendy . Inform

A: and it is well curved square . Inform

C: yeah . Backchannel

A: square . like . Inform

B: yeah , but mm is uh has round corners i think . Assess

A: so Stall

D: we shouldn’t have too square corners and that kind of thing . Inform

Generated summary: look and usability
Reference summary: look and usability

Fig. 4. Visualization of summary attention vectors. A darker color indicates a higher attention weight. The underlined word is the target word for which the attention is illustrated.

Figure 4 is colored with different levels of summary attention, where a darker shade has a larger attention value, indicating its importance when generating the target word, and vice versa. The proposed model clearly captures which sentences are key in the dialogues. This may be credited to the proposed sentence gate, which learns the dialogue acts conditioned on the summary in order to provide a helpful signal for global optimization of the joint model. In addition, we find that the “Inform”

dialogue act usually guides the model to pay more attention to the corresponding utterance, which aligns well with our intuition. In sum, for dialogue summarization, the experiments show that modeling the relations between dialogue acts and summaries, controlled by the novel sentence-gated mechanism, can effectively improve abstractive summarization performance in terms of ROUGE scores due to the joint optimization with dialogue act modeling.

5. CONCLUSION

This paper focuses on abstractive dialogue summarization by modeling interactive behaviors, where the proposed model uses a novel sentence gate that allows the dialogue act signal to be conditioned on the learned summarization result, in order to achieve better performance on both tasks. This paper benchmarks the experiments using a meeting dataset, and the experiments show that the proposed approach outperforms all state-of-the-art models, demonstrating the importance of interactive cues in dialogue summarization.

6. ACKNOWLEDGEMENTS

We thank the anonymous reviewers for their insightful feedback on this work. The authors are financially supported by the Ministry of Science and Technology (MOST) in Taiwan and MediaTek Inc.


7. REFERENCES

[1] Julian Kupiec, Jan Pedersen, and Francine Chen, “A trainable document summarizer,” in Proceedings of SIGIR, 1995, pp. 68–73.

[2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, “Spoken lecture summarization by random walk over a graph constructed with automatically extracted key terms,” in Proceedings of INTERSPEECH, 2011, pp. 933–936.

[3] Horacio Saggion and Thierry Poibeau, “Automatic text summarization: Past, present and future,” in Multi-source, Multilingual Information Extraction and Summarization, T. Poibeau, H. Saggion, J. Piskorski, and R. Yangarber, Eds., Theory and Applications of Natural Language Processing, pp. 3–13. Springer, 2012.

[4] Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, and Noah A. Smith, “Toward abstractive summarization using semantic representations,” in Proceedings of NAACL-HLT, 2015, pp. 1077–1086.

[5] Sumit Chopra, Michael Auli, and Alexander M. Rush, “Abstractive sentence summarization with attentive recurrent neural networks,” in Proceedings of NAACL-HLT, 2016, pp. 93–98.

[6] Ramesh Nallapati, Bing Xiang, and Bowen Zhou, “Sequence-to-sequence RNNs for text summarization,” CoRR, 2016.

[7] Alexander M. Rush, Sumit Chopra, and Jason Weston, “A neural attention model for abstractive sentence summarization,” in Proceedings of EMNLP, 2015, pp. 379–389.

[8] Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li, “Incorporating copying mechanism in sequence-to-sequence learning,” in Proceedings of ACL, Berlin, Germany, 2016, pp. 1631–1640.

[9] Yishu Miao and Phil Blunsom, “Language as a latent variable: Discrete generative models for sentence compression,” in Proceedings of EMNLP, 2016, pp. 319–328.

[10] Abigail See, Peter J. Liu, and Christopher D. Manning, “Get to the point: Summarization with pointer-generator networks,” in Proceedings of ACL, 2017, vol. 1, pp. 1073–1083.

[11] Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, and Min Sun, “A unified model for extractive and abstractive summarization using inconsistency loss,” in Proceedings of ACL, 2018, pp. 1–10.

[12] Hung-yi Lee, Sz-Rung Shiang, Ching-feng Yeh, Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-shan Lee, “Spoken knowledge organization by semantic structuring and a prototype course lecture system for personalized learning,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 22, no. 5, pp. 883–898, 2014.

[13] Sebastian Gehrmann, Yuntian Deng, and Alexander M. Rush, “Bottom-up abstractive summarization,” in Proceedings of EMNLP, 2018.

[14] Sameer Maskey and Julia Hirschberg, “Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization,” in Proceedings of EUROSPEECH, 2005.

[15] David Harwath and Timothy J. Hazen, “Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech,” in Proceedings of ICASSP, 2012, pp. 5073–5076.

[16] Korbinian Riedhammer, Benoit Favre, and Dilek Hakkani-Tür, “Long story short – global unsupervised models for keyphrase based meeting summarization,” Speech Communication, vol. 52, no. 10, pp. 801–815, 2010.

[17] Yun-Nung Chen, “Automatic key term extraction and summarization from spoken course lectures,” M.S. thesis, National Taiwan University, 2011.

[18] Yun-Nung Chen and Florian Metze, “Two-layer mutually reinforced random walk for improved multi-party meeting summarization,” in Proceedings of SLT, 2012, pp. 461–466.

[19] Yun-Nung Chen and Florian Metze, “Multi-layer mutually reinforced random walk with hidden parameters for improved multi-party meeting summarization,” in Proceedings of INTERSPEECH, 2013, pp. 485–489.

[20] Yun-Nung Chen and Florian Metze, “Intra-speaker topic modeling for improved multi-party meeting summarization with integrated random walk,” in Proceedings of NAACL-HLT, 2012, pp. 377–381.

[21] Mahak Gambhir and Vishal Gupta, “Recent automatic text summarization techniques: a survey,” Artificial Intelligence Review, vol. 47, no. 1, pp. 1–66, 2017.

[22] Iain McCowan, Jean Carletta, W. Kraaij, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, et al., “The AMI meeting corpus,” in Proceedings of the 5th International Conference on Methods and Techniques in Behavioral Research, 2005, vol. 88, p. 100.


[23] Harry Bunt, “Context and dialogue control,” THINK Quarterly, vol. 3, 1994.

[24] Ken Samuel, Sandra Carberry, and K. Vijay-Shanker, “Dialogue act tagging with transformation-based learning,” in Proceedings of COLING, 1998.

[25] Helen Wright, Massimo Poesio, and Stephen Isard, “Using high level dialogue information for dialogue act recognition using prosodic features,” in ESCA Tutorial and Research Workshop (ETRW) on Dialogue and Prosody, 1999.

[26] Andreas Stolcke, Noah Coccaro, Rebecca Bates, Paul Taylor, Carol Van Ess-Dykema, Klaus Ries, Elizabeth Shriberg, Daniel Jurafsky, Rachel Martin, and Marie Meteer, “Dialogue act modeling for automatic tagging and recognition of conversational speech,” Computational Linguistics, vol. 26, no. 3, pp. 339–373, Sept. 2000.

[27] Tina Klüwer, Hans Uszkoreit, and Feiyu Xu, “Using syntactic and semantic based relations for dialogue act recognition,” in Proceedings of COLING, 2010, pp. 570–578.

[28] Quan Hung Tran, Ingrid Zukerman, and Gholamreza Haffari, “Preserving distributional information in dialogue act classification,” in Proceedings of EMNLP, 2017, pp. 2151–2156.

[29] Maryam Tavafi, Yashar Mehdad, Shafiq Joty, Giuseppe Carenini, and Raymond Ng, “Dialogue act recognition in synchronous and asynchronous conversations,” in Proceedings of SIGDIAL, 2013, pp. 117–121.

[30] Simon Keizer, Rieks op den Akker, and Anton Nijholt, “Dialogue act recognition with Bayesian networks for Dutch dialogues,” in Proceedings of SIGDIAL, 2002, pp. 88–94.

[31] J. Ang, Yang Liu, and E. Shriberg, “Automatic dialog act segmentation and classification in multiparty meetings,” in Proceedings of ICASSP, 2005, pp. 1061–1064.

[32] Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky, “An empirical investigation of sparse log-linear models for improved dialogue act classification,” in Proceedings of ICASSP, 2013, pp. 8317–8321.

[33] Yangfeng Ji, Gholamreza Haffari, and Jacob Eisenstein, “A latent variable recurrent neural network for discourse-driven language models,” in Proceedings of NAACL-HLT, 2016, pp. 332–342.

[34] Hamed Khanpour, Nishitha Guntakandla, and Rodney D. Nielsen, “Dialogue act classification in domain-independent conversations using a deep recurrent neural network,” in Proceedings of COLING, 2016.

[35] Ji Young Lee and Franck Dernoncourt, “Sequential short-text classification with recurrent and convolutional neural networks,” in Proceedings of NAACL-HLT, 2016, pp. 515–520.

[36] Nal Kalchbrenner and Phil Blunsom, “Recurrent convolutional neural networks for discourse compositionality,” in Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, 2013, pp. 119–126.

[37] Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian, “A discourse-aware attention model for abstractive summarization of long documents,” in Proceedings of NAACL-HLT, 2018, vol. 2, pp. 615–621.

[38] Grégoire Mesnil, Yann Dauphin, Kaisheng Yao, Yoshua Bengio, Li Deng, Dilek Hakkani-Tur, Xiaodong He, Larry Heck, Gokhan Tur, Dong Yu, et al., “Using recurrent neural networks for slot filling in spoken language understanding,” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 23, no. 3, pp. 530–539, 2015.

[39] Chih-Wen Goo, Guang Gao, Yun-Kai Hsu, Chih-Li Huo, Tsung-Chieh Chen, Keng-Wei Hsu, and Yun-Nung Chen, “Slot-gated modeling for joint slot filling and intent prediction,” in Proceedings of NAACL-HLT, 2018, pp. 753–757.

[40] Chin-Yew Lin, “ROUGE: A package for automatic evaluation of summaries,” in Text Summarization Branches Out, 2004.
