Learning OOV through Semantic Relatedness in Spoken Dialog Systems


Ming Sun, Yun-Nung (Vivian) Chen, and Alexander I. Rudnicky

• Speech recognition and language understanding performance can be improved through an OOV expect-and-learn procedure.

• A limited domain vocabulary can be used to acquire OOVs effectively by exploiting word relatedness drawn from web knowledge bases.

• With data-driven semantic relatedness, both the global and local learning procedures harvest more than 50% of OOVs, leading to better recognition and understanding performance.

• This work demonstrates that

o OOV learning may benefit dialog systems

o the proposed expect-and-learn strategy outperforms the traditional detect-and-learn strategy: it is more effective and requires no human involvement.

1. Linguistic semantic relatedness

o Defined by linguistic resources, e.g., WordNet (WN) and the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013)

2. Data-driven semantic relatedness

o Distributional semantics, e.g., continuous bag-of-words embeddings (CBOW) (Mikolov et al., 2013)

➢ Detect-and-Learn (Qin et al., 2011; 2012):

o Discover OOV words during the conversation

o Example:

S: “I heard something like SELF, can you repeat it?”

U: “It’s SELFIE.”

o Drawbacks

• Only a limited number of new words can be learned

• Requires human effort to correct spellings and pronunciations

➢ Expect-and-Learn (proposed):

o Use semantic relatedness to automatically enrich the vocabulary and language model beforehand

o Advantages

• A large number of potentially useful new words can be learned

• No human involvement

• Vocabulary Expansion

➢ Idea: learn new words related to the current domain, which is represented by in-vocabulary words (IVs)

1. Starting from the highest-frequency IV v*, extract one unseen word w* from the resource according to:

» Local relatedness (Algo1): w* is the unseen word most related to v*

» Global relatedness (Algo2): w* is the unseen word most related to the complete IV set

2. Repeat until the vocabulary size reaches a threshold
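
The two selection rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy vectors, the use of cosine similarity as the relatedness score, the choice to cycle through the IVs in descending frequency, and the use of average relatedness for the global rule are all assumptions. In this sketch the global rule ignores v* and scores every candidate against the original IV set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_vocabulary(iv_counts, vectors, target_size, mode="local"):
    """Greedily learn unseen words until the vocabulary reaches target_size.

    iv_counts: dict mapping each in-vocabulary word (IV) to its frequency.
    vectors:   dict mapping resource words to embedding vectors (toy stand-in).
    mode:      "local" scores a candidate against the current seed v*;
               "global" scores it against the complete original IV set.
    """
    vocab = set(iv_counts)
    # Seeds are the original IVs known to the resource, most frequent first.
    seeds = sorted((w for w in iv_counts if w in vectors),
                   key=lambda w: (-iv_counts[w], w))
    learned = []
    step = 0
    while len(vocab) < target_size and seeds:
        candidates = [w for w in vectors if w not in vocab]
        if not candidates:
            break  # the resource has no unseen words left
        v_star = seeds[step % len(seeds)]  # assumption: cycle through the seeds
        if mode == "local":
            def score(w):
                return cosine(vectors[w], vectors[v_star])
        else:
            def score(w):
                # assumption: "global" means average relatedness to all seeds
                return sum(cosine(vectors[w], vectors[v]) for v in seeds) / len(seeds)
        w_star = max(candidates, key=score)
        vocab.add(w_star)
        learned.append(w_star)
        step += 1
    return learned


# Toy data, made up for illustration only.
iv_counts = {"photo": 5, "call": 2}
vectors = {
    "photo": (1.0, 0.0), "call": (0.0, 1.0),
    "selfie": (0.9, 0.1), "dial": (0.1, 0.9), "snap": (0.7, 0.7),
}
local_words = expand_vocabulary(iv_counts, vectors, target_size=4, mode="local")
global_words = expand_vocabulary(iv_counts, vectors, target_size=3, mode="global")
```

With these toy vectors the local rule picks the nearest neighbour of each seed in turn, while the global rule prefers a word that sits between the two seeds.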

• Language Model Expansion

o Use Kneser-Ney smoothing to estimate unigram probabilities for the newly learned OOVs.
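
The poster does not say how the smoothing is applied, so the following is only one plausible reading, sketched under stated assumptions: a lowest-order Kneser-Ney-style unigram built from continuation counts, with the discounted mass spread uniformly over the expanded vocabulary so that newly learned words receive non-zero probability. The function name `kn_unigram`, the discount of 0.75, and the toy bigrams are hypothetical.

```python
from collections import defaultdict


def kn_unigram(bigrams, vocab, discount=0.75):
    """Continuation-count unigram with absolute discounting.

    bigrams:  iterable of (previous_word, word) pairs from the training text.
    vocab:    expanded vocabulary, including newly learned words.
    discount: absolute discount, assumed to lie strictly between 0 and 1.
    The discounted mass is redistributed uniformly over vocab (an assumption).
    """
    left_contexts = defaultdict(set)
    for prev, word in bigrams:
        if word in vocab:
            left_contexts[word].add(prev)
    # Continuation count: number of distinct words that precede w.
    cont = {w: len(left_contexts.get(w, ())) for w in vocab}
    total = sum(cont.values())
    if total == 0:
        return {w: 1.0 / len(vocab) for w in vocab}
    seen = sum(1 for c in cont.values() if c > 0)
    reserved = discount * seen / total  # probability mass freed by discounting
    uniform = reserved / len(vocab)
    return {w: max(cont[w] - discount, 0.0) / total + uniform for w in vocab}


# Toy example: "selfie" is a learned word with no training occurrences.
bigrams = [("i", "want"), ("want", "to"), ("to", "call"), ("i", "call")]
vocab = {"i", "want", "to", "call", "selfie"}
probs = kn_unigram(bigrams, vocab)
```

Here the learned word gets only the uniform share of the reserved mass, so it becomes recognizable without outranking words that were observed in training.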

Recognition and Understanding Performance

Learning Strategy   Vocab Size   OOV Rate (%)   Recog. WER (%)   SLU F1 (%)
Baseline            2854         22.6           49.9             57.0
Algo1               5394         11.7           41.6             65.4
Algo2               5394         11.6           42.0             65.1
Oracle              4254         0.0            23.5             80.9

Only Domain Specific Models

Domain + Generic Models

Summary

Experimental Results

• Dataset: Wall Street Journal

o Acoustic model: WSJ GMM-HMM semi-continuous

o Pronunciation: CMU Dictionary + Logios Lexicon Tool

• OOV Coverage Evaluation

o How many OOV tokens in the test set can be covered using different relatedness resources?
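
As a rough sketch of the coverage measure, assuming it is the fraction of test-set OOV tokens that appear among the learned words (the poster does not give the formula, and the function name and toy data are hypothetical):

```python
def oov_coverage(test_tokens, base_vocab, learned_words):
    """Fraction of OOV tokens in the test set covered by the learned words.

    A token counts as OOV when it is missing from base_vocab, the vocabulary
    before expansion. Returns 0.0 when the test set has no OOV tokens.
    """
    oov_tokens = [t for t in test_tokens if t not in base_vocab]
    if not oov_tokens:
        return 0.0
    covered = sum(1 for t in oov_tokens if t in learned_words)
    return covered / len(oov_tokens)


# Toy example: three OOV tokens, two of which are covered by "selfie".
tokens = "i want to selfie selfie hashtag".split()
coverage = oov_coverage(tokens, {"i", "want", "to"}, {"selfie"})
```

Counting tokens rather than word types means that a frequently used OOV word contributes more to coverage than a rare one.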

OOV Learning Method

[Figure: OOV coverage (y-axis, 0.00 to 0.60) versus number of learned words (x-axis, 0 to 3000). Curves: CBOW-Algo1, CBOW-Algo2, PPDB-Algo1, PPDB-Algo2, WN-Algo1, WN-Algo2, Detect & Learn, Random.]

Expect-and-Learn Procedure

Relatedness Resources

Conclusion

[Figure: system diagram. Labels: Domain-Specific Collection, Domain Vocab, Domain LM, OOV Learning, Learned OOV, ASR, Test Utterance, Recognition Result (“i want to selfie”).]

Learning Strategy   Vocab Size   OOV Rate (%)   Recog. WER (%)   SLU F1 (%)
Baseline            20175        3.6            21.7             82.2
Algo1               22599        3.0            20.3             83.2
Algo2               22599        3.0            20.4             83.2
Oracle              20431        0.0            15.1             87.1

➢ Motivation

o Domain language may drift over time, so ensuring language coverage in dialog systems can be a challenge (Furnas et al., 1987).

o The mismatch between training data and current input increases recognition errors and misunderstanding.

o The detect-and-learn strategy requires human effort and takes more time to adapt the vocabulary and LM.

➢ Approach: Expect-and-Learn

o Automatically acquire potential out-of-vocabulary (OOV) words by leveraging different types of word relatedness.

➢ Result

o Both recognition and semantic parsing accuracy can be improved after acquiring potential OOVs.