
The dissertation is organized as follows.

• Chapter 2 - Background and Related Work

This chapter reviews background knowledge and summarizes related work. The chapter also discusses current challenges of the task, describes several structured knowledge resources, and presents distributional semantics that may benefit understanding problems.

• Chapter 3 - Ontology Induction for Knowledge Acquisition

This chapter focuses on inducing a domain ontology that is useful for developing SLU in an SDS, based on the available structured knowledge resources, in an unsupervised way.

Part of this research work has been presented in the following publications [31, 33]:

– Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky, “Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing,” in Proceedings of 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’13), Olomouc, Czech Republic, 2013.

(Student Best Paper Award)

– Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky, “Leveraging Frame Semantics and Distributional Semantics for Unsupervised Semantic Slot Induction for Spoken Dialogue Systems,” in Proceedings of 2014 IEEE Workshop on Spoken Language Technology (SLT’14), South Lake Tahoe, Nevada, USA, 2014.

• Chapter 4 - Structure Learning for Knowledge Acquisition

This chapter focuses on learning the structures, such as the inter-slot relations, for helping SLU development. Some of the contributions have been presented in the following publications [39, 40]:

– Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky, “Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding,” in Proceedings of The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’15), Denver, Colorado, USA, 2015.

– Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky, “Learning Semantic Hierarchy for Unsupervised Slot Induction and Spoken Language Understanding,” in Proceedings of The 16th Annual Conference of the International Speech Communication Association (INTERSPEECH’15), Dresden, Germany, 2015.

• Chapter 5 - Surface Form Derivation for Knowledge Acquisition

This chapter focuses on deriving the surface forms that convey the semantics of entities from the given ontology, where the derived information contributes to better understanding.

Some of the work has been published [32]:

– Yun-Nung Chen, Dilek Hakkani-Tür, and Gokhan Tur, “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in Proceedings of 2014 IEEE Workshop on Spoken Language Technology (SLT’14), South Lake Tahoe, Nevada, USA, 2014.

• Chapter 6 - Semantic Decoding in SLU Modeling

This chapter focuses on decoding users’ spoken language into corresponding semantic forms, which corresponds to the goal of SLU. Some of these contributions have been presented in the following publication [38]:

– Yun-Nung Chen, William Yang Wang, Anatole Gershman, and Alexander I. Rudnicky, “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in Proceedings of The 53rd Annual Meeting of the Association for Computational Linguistics and The 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), Beijing, China, 2015.

• Chapter 7 - Intent Prediction in SLU Modeling

This chapter focuses on modeling user intents in SLU, so that the SDS is able to predict the users’ follow-up actions and further provide better interactions. Some of the contributions have been presented in the following publications [28, 36, 37, 42]:

– Yun-Nung Chen and Alexander I. Rudnicky, “Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings,” in Proceedings of 2014 IEEE Workshop on Spoken Language Technology (SLT’14), South Lake Tahoe, Nevada, USA, 2014.

– Yun-Nung Chen, Ming Sun, Alexander I. Rudnicky, and Anatole Gershman, “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in Proceedings of The 17th ACM International Conference on Multimodal Interaction (ICMI’15), Seattle, Washington, USA, 2015.

– Yun-Nung Chen, Ming Sun, and Alexander I. Rudnicky, “Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling,” in Extended Abstract of The 29th Annual Conference on Neural Information Processing Systems – Machine Learning for Spoken Language Understanding and Interactions Workshop (NIPS-SLU’15), Montreal, Canada, 2015.

– Yun-Nung Chen, Ming Sun, Alexander I. Rudnicky, and Anatole Gershman, “Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization,” in Proceedings of The 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’16), Shanghai, China, 2016.

• Chapter 8 - SLU in Human-Human Conversations

This chapter investigates the feasibility of applying the technologies developed for human-machine interactions to human-human interactions, expanding the application usage to more practical and broader genres. Part of the research work has been presented in the following publications [34, 35]:

– Yun-Nung Chen, Dilek Hakkani-Tür, and Xiaodong He, “Detecting Actionable Items in Meetings by Convolutional Deep Structured Semantic Models,” in Proceedings of 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’15), Scottsdale, Arizona, USA, 2015.

– Yun-Nung Chen, Dilek Hakkani-Tür, and Xiaodong He, “Learning Bidirectional Intent Embeddings by Convolutional Deep Structured Semantic Models for Spoken Language Understanding,” in Extended Abstract of The 29th Annual Conference on Neural Information Processing Systems – Machine Learning for Spoken Language Understanding and Interactions Workshop (NIPS-SLU’15), Montreal, Canada, 2015.

• Chapter 9 - Conclusions and Future Work

This chapter summarizes the main contributions and discusses a number of interesting directions that can be explored in the future.

2 Background and Related Work

Everything that needs to be said has already been said. But since no one was listening, everything must be said again.

André Gide, winner of the Nobel Prize in Literature

With the emerging trend of using mobile devices, spoken dialogue systems (SDSs) are being incorporated into several kinds of devices (e.g., smartphones, smart TVs, navigation systems). In the architecture of SDSs, spoken language understanding (SLU) plays an important role, and there remain many unsolved challenges. The next section first introduces a typical pipeline of an SDS and elaborates the functionality of each individual component. Section 2.2 details how SLU works with different examples, reviews the related literature, and discusses its pros and cons; following the literature review, we briefly sketch the idea of the proposed approaches and how they relate to prior studies. Then the semantic resources used to benefit language understanding are introduced: resources with explicit semantics, Ontology and Knowledge Base, are presented in Section 2.3, and implicit semantics based on the theory of Distributional Semantics are presented in Section 2.4.

2.1 Spoken Dialogue System (SDS)

A typical SDS is composed of a recognizer, a spoken language understanding (SLU) module, a dialogue manager (DM), and an output manager. Figure 2.1 illustrates the system pipeline.

The functionality of each component is summarized below.

• Automatic Speech Recognizer (ASR)

The ASR component takes raw audio signals and transcribes them into word hypotheses with confidence scores. The top hypothesis is then passed to the next component.

• Spoken Language Understanding (SLU)

The goal of SLU is to capture the core semantics given the input word hypotheses; the extracted information can be populated into task-specific arguments in a given semantic frame [82]. Therefore, the task of an SLU module is to identify user intents and fill the associated slots based on the word hypotheses. This procedure is also called semantic parsing, semantic decoding, etc. The SLU component typically includes an intent detector and slot taggers. An example utterance “I want to fly to Taiwan from Pittsburgh next week” can be parsed into find_flight(origin=“Pittsburgh”, destination=“Taiwan”, departure_date=“next week”), where find_flight is classified by the intent detector, and the associated slots are later filled by the slot taggers based on the detected intent. This component also estimates confidence scores of the decoded semantic representations for use by the next component.

Figure 2.1: The typical pipeline in a dialogue system: the automatic speech recognizer, the spoken language understanding module (intent detector and slot tagger), the dialogue manager (with domain reasoner and knowledge base), and the output manager / output generator (natural language generator, multimedia response, and speech synthesizer), producing textual, spoken, visual, etc. responses.
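The parse of the example utterance above can be sketched as a toy intent detector and slot tagger. The keyword rules, regular expressions, and function names below are illustrative assumptions for this single example, not the actual statistical models discussed later in the dissertation.

```python
# A minimal sketch of the SLU step: classify the intent, then fill the
# slots associated with that intent to build a semantic frame.
import re

def detect_intent(utterance: str) -> str:
    """Classify the user intent with simple keyword rules (illustrative)."""
    if "fly" in utterance or "flight" in utterance:
        return "find_flight"
    return "unknown"

def fill_slots(utterance: str, intent: str) -> dict:
    """Fill the slots associated with the detected intent (illustrative)."""
    slots = {}
    if intent == "find_flight":
        m = re.search(r"from ([A-Z]\w*)", utterance)   # capitalized place name
        if m:
            slots["origin"] = m.group(1)
        m = re.search(r"to ([A-Z]\w*)", utterance)
        if m:
            slots["destination"] = m.group(1)
        m = re.search(r"(next week|tomorrow|today)", utterance)
        if m:
            slots["departure_date"] = m.group(1)
    return slots

utterance = "I want to fly to Taiwan from Pittsburgh next week"
intent = detect_intent(utterance)
frame = {"intent": intent, "slots": fill_slots(utterance, intent)}
print(frame)
```

Running this on the example utterance yields the frame find_flight(origin=“Pittsburgh”, destination=“Taiwan”, departure_date=“next week”), mirroring the parse described above.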

• Dialogue Manager (DM) / Task Manager

Subsequent to the SLU processing, the DM interacts with users to assist them in achieving their goals. Given the above example, the DM should check whether the required slots are properly assigned (departure_date may not be properly specified) and then decide the system’s action, such as ask_date or return_flight(origin=“Pittsburgh”, destination=“Taiwan”). This procedure should access knowledge bases as retrieval databases to acquire the desired information. Due to possible misrecognition and misunderstanding errors, this procedure involves dialogue state tracking and policy selection to make more robust decisions [85, 164].
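The slot-checking behavior described above can be sketched as a simple rule: request the first missing required slot, or execute the task once the frame is complete. The required-slot table and action names are illustrative assumptions, not the tracking and policy models cited above.

```python
# A minimal sketch of DM action selection: ask for a missing required
# slot, or execute the task action when the frame is complete.
REQUIRED_SLOTS = {"find_flight": ["origin", "destination", "departure_date"]}

def decide_action(intent: str, slots: dict) -> str:
    """Return the system action for the decoded semantic frame."""
    missing = [s for s in REQUIRED_SLOTS.get(intent, []) if s not in slots]
    if missing:
        # At least one required slot is unfilled: ask the user for it.
        return f"ask_{missing[0]}"
    # All required slots are filled: execute the task with its arguments.
    args = ", ".join(f'{k}="{v}"' for k, v in slots.items())
    return f"return_flight({args})"

# departure_date is missing, so the DM asks for it before returning flights.
print(decide_action("find_flight", {"origin": "Pittsburgh", "destination": "Taiwan"}))
```

A real DM replaces these deterministic rules with dialogue state tracking over the SLU confidence scores and a learned policy, but the input/output contract is the same.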

• Output Manager / Output Generator

Traditional dialogue systems are mostly used through phone calls, so the output manager mainly interacts with two modules, a natural language generation (NLG) module and a speech synthesizer. However, with the increasing usage of various multimedia devices (e.g., smartphones, smartwatches, and smart TVs), the output manager no longer needs to focus solely on generating spoken responses. Instead, the recent trend is moving toward displaying responses via different channels; for example, the utterance “Play Lady Gaga’s Bad Romance.” should correspond to an output action that launches a music player and then plays the specified song. Hence an additional component, multimedia response, is introduced in the infrastructure in order to handle diverse multimedia outputs.

– Multimedia Response

Given the decided action, a multimedia response module considers which channel is more suitable to present the returned information based on environmental contexts, user preferences, and the devices in use. For example, return_flight(origin=“Pittsburgh”, destination=“Taiwan”) can be presented through a visual response by listing the flights that satisfy the requirement on desktops, laptops, etc., or through a spoken response by uttering “There are seven flights from Pittsburgh to Taiwan. The first is ...” on smartwatches.
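The channel-selection decision just described can be sketched as a lookup over device capabilities. The device sets and the two-way visual/spoken policy are illustrative assumptions; a deployed system would also weigh environmental contexts and user preferences.

```python
# A minimal sketch of multimedia response channel selection: choose a
# presentation channel for a system action based on the device in use.
def choose_channel(action: str, device: str) -> str:
    """Select the output channel for the given action and device."""
    visual_devices = {"desktop", "laptop", "smartphone", "smart-tv"}
    if device in visual_devices:
        return "visual"   # e.g., list the matching flights on screen
    return "spoken"       # e.g., read the flight list aloud on a smartwatch

print(choose_channel("return_flight", "desktop"))     # visual listing
print(choose_channel("return_flight", "smartwatch"))  # spoken response
```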

– Natural Language Generation (NLG)

Given the current dialogue strategy, the NLG component generates the corresponding natural language responses that humans can understand, for the purpose of natural dialogues. For example, an action from the DM, ask_date, can generate the response “Which date do you plan to fly?” Here the responses can be template-based or generated by statistical models [29, 163].
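The template-based variant mentioned above can be sketched as a table that maps DM actions to response templates whose placeholders are filled with slot values. The specific templates are illustrative assumptions; the cited statistical approaches learn the surface realization instead.

```python
# A minimal sketch of template-based NLG: map each DM action to a
# response template and fill in the slot values.
TEMPLATES = {
    "ask_date": "Which date do you plan to fly?",
    "return_flight": "There are flights from {origin} to {destination}.",
}

def generate(action: str, **slots) -> str:
    """Render the natural language response for a DM action."""
    return TEMPLATES[action].format(**slots)

print(generate("ask_date"))
print(generate("return_flight", origin="Pittsburgh", destination="Taiwan"))
```

Template-based NLG is predictable and easy to author but scales poorly across domains, which is one motivation for the statistical generation models cited above.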

– Speech Synthesizer / Text-to-Speech (TTS)

In order to communicate with users via speech, a speech synthesizer simulates human speech based on the natural language responses generated by the NLG component.

All the basic components in a dialogue system interact with each other, so errors may propagate and then result in poor performance. In addition, several components (e.g., the SLU module) need to incorporate domain knowledge in order to handle task-specific dialogues. Because domain knowledge is usually predefined by experts or developers, making SLU scalable as the number of domains grows has been a main challenge of SDS development.