
The goal of the ranking model is to extract domain-specific concepts from all fine-grained frames output by frame-semantic parsing. To induce meaningful slots for SDS, we compute the prominence of slot candidates by additionally considering their structural information.

With the semantic parses from SEMAFOR, where each frame is viewed independently and inter-slot relations are not included, our model ranks slot candidates by integrating two types of information: (1) the frequency of each slot candidate in the corpus, and (2) the relations between slot candidates. Assuming that domain-specific concepts are usually related to each other, globally considering inter-slot relations induces a more coherent slot set. As the baseline, following Chapter 3, we consider only the frequency of each slot candidate as its prominence, without the structure information.

Syntactic dependency relations between fillers may help measure the prominence of the corresponding slots. We therefore first construct two knowledge graphs, a slot-based semantic knowledge graph and a word-based lexical knowledge graph, both of which encode typed dependency relations in their edge weights. We also connect the two graphs to model the relations between slot-filler pairs. On the integrated graph, which incorporates dependency relations between semantic and lexical elements, the prominence of slot candidates can be computed by a random walk algorithm. The details are described as follows.

Figure 4.2: A simplified example of the integration of the two knowledge graphs, where a slot candidate $s_i$ is represented as a node in the semantic knowledge graph and a word $w_j$ is represented as a node in the lexical knowledge graph.

4.3.1 Knowledge Graphs

We construct two undirected graphs, a semantic knowledge graph and a lexical knowledge graph. Each node in the semantic knowledge graph is a slot candidate $s_i$ output by the frame-semantic parser, and each node in the lexical knowledge graph is a word $w_j$.

• Slot-based semantic knowledge graph is built as $G_s = \langle V_s, E_{ss} \rangle$, where $V_s = \{s_i\}$ and $E_{ss} = \{e_{ij} \mid s_i, s_j \in V_s\}$.

• Word-based lexical knowledge graph is built as $G_w = \langle V_w, E_{ww} \rangle$, where $V_w = \{w_i\}$ and $E_{ww} = \{e_{ij} \mid w_i, w_j \in V_w\}$.

With the two knowledge graphs, we build edges between slots and slot-fillers to integrate them, as shown in Figure 4.2. The integrated graph can thus be formulated as $G = \langle V_s, V_w, E_{ss}, E_{ww}, E_{ws} \rangle$, where $E_{ws} = \{e_{ij} \mid w_i \in V_w, s_j \in V_s\}$. $E_{ss}$, $E_{ww}$, and $E_{ws}$ correspond to slot-to-slot relations, word-to-word relations, and word-to-slot relations, respectively [26, 27].
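For concreteness, the integrated graph $G$ can be stored as plain node sets and edge-weight maps. The following minimal Python sketch is illustrative only; the class and method names are not from the original implementation.

```python
from collections import defaultdict

class IntegratedKnowledgeGraph:
    """Illustrative container for G = <V_s, V_w, E_ss, E_ww, E_ws>."""

    def __init__(self):
        self.slot_nodes = set()               # V_s
        self.word_nodes = set()               # V_w
        self.slot_slot = defaultdict(float)   # E_ss: (s_i, s_j) -> weight
        self.word_word = defaultdict(float)   # E_ww: (w_i, w_j) -> weight
        self.word_slot = defaultdict(float)   # E_ws: (w_i, s_j) -> weight

    def add_slot_edge(self, s_i, s_j, weight=1.0):
        self.slot_nodes.update((s_i, s_j))
        self.slot_slot[(s_i, s_j)] += weight

    def add_word_edge(self, w_i, w_j, weight=1.0):
        self.word_nodes.update((w_i, w_j))
        self.word_word[(w_i, w_j)] += weight

    def add_word_slot_edge(self, w_i, s_j, weight=1.0):
        self.word_nodes.add(w_i)
        self.slot_nodes.add(s_j)
        self.word_slot[(w_i, s_j)] += weight
```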

4.3.2 Edge Weight Estimation

To incorporate the different strengths of dependency relations in the knowledge graphs, we assign weights to the edges. The edge weights for $E_{ww}$ and $E_{ss}$ are measured based on dependency parsing results. The example utterance "can i have a cheap restaurant" and its dependency parsing result are illustrated in Figure 4.3. The arrows denote the dependency relations from headwords to their dependents, and the labels on the arcs denote the types of dependencies.

Figure 4.3: The dependency parsing result of the utterance "can i have a cheap restaurant", where the frames capability, expensiveness, and locale_by_use are attached to their fillers and the arcs are labeled with the dependency types (ccomp, nsubj, det, amod, dobj).

All typed dependencies between two words are encoded as triples and form a word-based dependency set $T_w = \{\langle w_i, t, w_j \rangle\}$, where $t$ is the typed dependency between the headword $w_i$ and the dependent $w_j$. For example, Figure 4.3 generates $\langle \text{restaurant}, \text{amod}, \text{cheap} \rangle$, $\langle \text{have}, \text{dobj}, \text{restaurant} \rangle$, etc. for $T_w$. Similarly, we build a slot-based dependency set $T_s = \{\langle s_i, t, s_j \rangle\}$ by transforming dependencies between slot-fillers into dependencies between slots. For example, $\langle \text{restaurant}, \text{amod}, \text{cheap} \rangle$ from $T_w$ is transformed into $\langle \text{locale\_by\_use}, \text{amod}, \text{expensiveness} \rangle$ for building $T_s$, because both the headword and the dependent are parsed as slot-fillers by SEMAFOR.
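For concreteness, the construction of $T_w$ and its transformation into $T_s$ can be sketched as follows, assuming the dependency triples and the SEMAFOR filler-to-slot mapping of Figure 4.3 are given; the variable names are illustrative.

```python
# Word-based dependency set T_w: triples <head, type, dependent> from the parser.
T_w = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

# Slot-filler mapping from the frame-semantic parse: filler word -> slot candidate.
filler_to_slot = {
    "can": "capability",
    "cheap": "expensiveness",
    "restaurant": "locale_by_use",
}

# Slot-based dependency set T_s: keep a triple only when both the head and the
# dependent are parsed as slot-fillers, and replace each word with its slot.
T_s = [
    (filler_to_slot[h], t, filler_to_slot[d])
    for (h, t, d) in T_w
    if h in filler_to_slot and d in filler_to_slot
]
# -> [("locale_by_use", "amod", "expensiveness")]
```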

For all edges within a single knowledge graph, we assign the weight of the edge connecting nodes $x_i$ and $x_j$ as $\hat{r}(x_i, x_j)$, where $x$ is either $s$ (slot) or $w$ (word). Since weights are measured based on relations between nodes regardless of direction, we combine the scores of the two directional dependencies:

$$\hat{r}(x_i, x_j) = r(x_i \rightarrow x_j) + r(x_j \rightarrow x_i), \qquad (4.1)$$

where $r(x_i \rightarrow x_j)$ is the score that estimates the dependency with $x_i$ as a head and $x_j$ as a dependent. In Sections 4.3.2.1 and 4.3.2.2, we propose two scoring functions for $r(\cdot)$: a frequency-based one, $r_1(\cdot)$, and an embedding-based one, $r_2(\cdot)$.

For edges in $E_{ws}$, we estimate edge weights based on the frequency with which slot candidates and words are parsed as slot-filler pairs. In other words, the edge weight between a slot-filler $w_i$ and a slot candidate $s_j$, $\hat{r}(w_i, s_j)$, equals the number of times the filler $w_i$ corresponds to the slot candidate $s_j$ in the parsing results.

4.3.2.1 Frequency-Based Measurement

Based on the parsed dependency set $T_x$, we use $t_{x_i \rightarrow x_j}$ to denote the most frequent typed dependency with $x_i$ as a head and $x_j$ as a dependent:

$$t_{x_i \rightarrow x_j} = \arg\max_{t} \; C(x_i \xrightarrow{t} x_j), \qquad (4.2)$$

Table 4.1: The contexts extracted for training dependency-based word/slot embeddings from the utterance of Figure 3.2.

         | Typed Dependency Relation              | Target Word    | Contexts
Word     | ⟨restaurant, amod, cheap⟩              | restaurant     | cheap/amod
         |                                        | cheap          | restaurant/amod⁻¹
Slot     | ⟨locale_by_use, amod, expensiveness⟩   | locale_by_use  | expensiveness/amod
         |                                        | expensiveness  | locale_by_use/amod⁻¹

where $C(x_i \xrightarrow{t} x_j)$ counts how many times the dependency $\langle x_i, t, x_j \rangle$ occurs in the dependency set $T_x$. Then the scoring function that estimates the dependency $x_i \rightarrow x_j$ is measured as

$$r_1(x_i \rightarrow x_j) = C(x_i \xrightarrow{t_{x_i \rightarrow x_j}} x_j), \qquad (4.3)$$

which equals the highest observed frequency of the dependency $x_i \rightarrow x_j$ among all types in $T_x$.
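A minimal sketch of the frequency-based measurement, assuming $T_x$ is stored as a list of ⟨head, type, dependent⟩ triples as above; the function names are illustrative.

```python
from collections import Counter

def r1(T_x, x_i, x_j):
    """Frequency-based score r1(x_i -> x_j): the count of the most frequent
    typed dependency with x_i as head and x_j as dependent (Eqs. 4.2-4.3)."""
    counts = Counter(t for (h, t, d) in T_x if h == x_i and d == x_j)
    if not counts:
        return 0
    _, c = counts.most_common(1)[0]   # C(x_i --t_{x_i->x_j}--> x_j)
    return c

def r_hat(T_x, x_i, x_j):
    """Undirected edge weight (Eq. 4.1): sum of both directional scores."""
    return r1(T_x, x_i, x_j) + r1(T_x, x_j, x_i)
```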

4.3.2.2 Embedding-Based Measurement

It has been shown that the dependency-based embedding approach introduced in Section 2.4.2 is able to capture more functional similarity, because it uses dependency-based syntactic contexts for training word embeddings [108]. Table 4.1 shows some extracted dependency-based contexts for each target word from the example in Figure 4.3, where headwords and their dependents form each other's contexts by following the arcs in the dependency tree, and $^{-1}$ denotes the direction of the dependency. We learn vector representations for both words and contexts such that the dot product $v_w \cdot v_c$ is maximized for "good" word-context pairs belonging to the training data.

We can then obtain dependency-based slot and word embeddings using $T_s$ and $T_w$, respectively.
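For illustration, the (target word, context) training pairs of Table 4.1 can be generated from the dependency sets as sketched below; the resulting pairs would then be fed to a dependency-based embedding trainer in the style of [108]. The helper function is hypothetical.

```python
def dependency_contexts(triples):
    """Yield (target, context) pairs for dependency-based embeddings:
    for <head, t, dep>, the head sees "dep/t" and the dependent sees
    "head/t^-1" (the inverse-direction context)."""
    for head, t, dep in triples:
        yield head, f"{dep}/{t}"
        yield dep, f"{head}/{t}^-1"

# Example from Table 4.1:
# list(dependency_contexts([("restaurant", "amod", "cheap")]))
# -> [('restaurant', 'cheap/amod'), ('cheap', 'restaurant/amod^-1')]
```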

With the trained dependency-based embeddings, we estimate the probability that $x_i$ is a headword and $x_j$ is its dependent via a typed dependency $t$ as

$$P(x_i \xrightarrow{t} x_j) = \frac{\mathrm{Sim}(x_i, x_j/t) + \mathrm{Sim}(x_j, x_i/t^{-1})}{2}, \qquad (4.4)$$

where $\mathrm{Sim}(x_i, x_j/t)$ is the cosine similarity between the word/slot embedding $v_{x_i}$ and the context embedding $v_{x_j/t}$ after normalization to $[0, 1]$. Then we measure the scoring function $r_2(\cdot)$ as

$$r_2(x_i \rightarrow x_j) = C(x_i \xrightarrow{t_{x_i \rightarrow x_j}} x_j) \cdot P(x_i \xrightarrow{t_{x_i \rightarrow x_j}} x_j), \qquad (4.5)$$

which is similar to (4.3) but additionally weighted by the estimated probability. The estimated probability smooths the observed frequency to avoid overfitting on the relatively small dataset.
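Assuming the trained target and context embeddings are available as dictionaries of numpy vectors keyed as in Table 4.1, the embedding-based score of (4.4)-(4.5) could be computed as in the following sketch; all names are illustrative.

```python
import numpy as np
from collections import Counter

def cos01(u, v):
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    sim = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (sim + 1.0) / 2.0

def r2(T_x, target_emb, context_emb, x_i, x_j):
    """Embedding-based score r2(x_i -> x_j) = C(...) * P(...) as in Eq. (4.5)."""
    counts = Counter(t for (h, t, d) in T_x if h == x_i and d == x_j)
    if not counts:
        return 0.0
    t, c = counts.most_common(1)[0]   # most frequent typed dependency and its count
    # Eq. (4.4): average the similarities over both context directions.
    p = (cos01(target_emb[x_i], context_emb[f"{x_j}/{t}"]) +
         cos01(target_emb[x_j], context_emb[f"{x_i}/{t}^-1"])) / 2.0
    return c * p
```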

4.3.3 Random Walk Algorithm

We first compute $L_{ww} = [\hat{r}(w_i, w_j)]_{|V_w| \times |V_w|}$ and $L_{ss} = [\hat{r}(s_i, s_j)]_{|V_s| \times |V_s|}$, where $\hat{r}(w_i, w_j)$ and $\hat{r}(s_i, s_j)$ come from either the frequency-based measurement ($r_1(\cdot)$) or the embedding-based measurement ($r_2(\cdot)$). Similarly, $L_{ws} = [\hat{r}(w_i, s_j)]_{|V_w| \times |V_s|}$ and $L_{sw} = [\hat{r}(w_i, s_j)]^{T}_{|V_w| \times |V_s|}$ are computed, where $\hat{r}(w_i, s_j)$ is the frequency with which $s_j$ and $w_i$ form a slot-filler pair, as described in Section 4.3.2. Then we keep only the top $N$ highest weights in each row of $L_{ww}$ and $L_{ss}$ ($N = 10$), which means that we filter out edges with smaller weights within a single knowledge graph. Column normalization is performed on $L_{ww}$, $L_{ss}$, $L_{ws}$, and $L_{sw}$ [141]. They can be viewed as word-to-word, slot-to-slot, word-to-slot, and slot-to-word relation matrices.

4.3.3.1 Single-Graph Random Walk

Here we perform a random walk algorithm only on the semantic knowledge graph to propagate scores based on inter-slot relations through the edges $E_{ss}$:

$$R_s^{(t+1)} = (1 - \alpha) R_s^{(0)} + \alpha L_{ss} R_s^{(t)}, \qquad (4.6)$$

where $R_s^{(t)}$ denotes the importance scores of the slot candidates $V_s$ in the $t$-th iteration. In the algorithm, the score is the interpolation of two scores: the normalized baseline importance of the slot candidates ($R_s^{(0)}$), and the scores propagated from neighboring nodes in the semantic knowledge graph based on the slot-to-slot relations $L_{ss}$. The algorithm converges when $R_s = R_s^{(t+1)} \approx R_s^{(t)}$ and $R_s$ satisfies the equation

$$R_s = (1 - \alpha) R_s^{(0)} + \alpha L_{ss} R_s. \qquad (4.7)$$

We can solve for $R_s$ as

$$R_s = \left( (1 - \alpha) R_s^{(0)} e^{T} + \alpha L_{ss} \right) R_s = M_1 R_s, \qquad (4.8)$$

where $e = [1, 1, \dots, 1]^{T}$, so that $e^{T} R_s = 1$ for the normalized $R_s$. It has been shown that the closed-form solution $R_s$ of (4.8) is the dominant eigenvector of $M_1$, i.e., the eigenvector corresponding to the largest absolute eigenvalue of $M_1$ [100]. The solution $R_s$ gives the updated importance scores of all slot candidates. As in the PageRank algorithm, the solution can also be obtained by iteratively updating $R_s^{(t)}$ [19].

4.3.3.2 Double-Graph Random Walk

We borrow the idea of the two-layer mutually reinforced random walk to propagate scores based not only on internal importance propagation within the same graph but also on external mutual reinforcement between the two knowledge graphs [26, 27, 93]:

$$\begin{cases} R_s^{(t+1)} = (1 - \alpha) R_s^{(0)} + \alpha L_{ss} L_{sw} R_w^{(t)} \\ R_w^{(t+1)} = (1 - \alpha) R_w^{(0)} + \alpha L_{ww} L_{ws} R_s^{(t)} \end{cases} \qquad (4.9)$$

In the algorithm, the updated scores are interpolations of two scores: the normalized baseline importance ($R_s^{(0)}$ and $R_w^{(0)}$) and the scores propagated from the other graph. For the semantic knowledge graph, $L_{sw} R_w^{(t)}$ is the score vector from the word set weighted by slot-to-word relations, and these scores are then propagated based on the slot-to-slot relations $L_{ss}$. Similarly, nodes in the lexical knowledge graph receive scores propagated from the semantic knowledge graph. $R_s^{(t+1)}$ and $R_w^{(t+1)}$ can thus be mutually updated by iterating (4.9). When the algorithm converges, $R_s$ and $R_w$ can be derived similarly:

$$\begin{cases} R_s = (1 - \alpha) R_s^{(0)} + \alpha L_{ss} L_{sw} R_w \\ R_w = (1 - \alpha) R_w^{(0)} + \alpha L_{ww} L_{ws} R_s \end{cases} \qquad (4.10)$$

Substituting the second equation of (4.10) into the first gives

$$\begin{aligned} R_s &= (1 - \alpha) R_s^{(0)} + \alpha L_{ss} L_{sw} \left( (1 - \alpha) R_w^{(0)} + \alpha L_{ww} L_{ws} R_s \right) \\ &= (1 - \alpha) R_s^{(0)} + \alpha (1 - \alpha) L_{ss} L_{sw} R_w^{(0)} + \alpha^2 L_{ss} L_{sw} L_{ww} L_{ws} R_s \\ &= \left( (1 - \alpha) R_s^{(0)} e^{T} + \alpha (1 - \alpha) L_{ss} L_{sw} R_w^{(0)} e^{T} + \alpha^2 L_{ss} L_{sw} L_{ww} L_{ws} \right) R_s \\ &= M_2 R_s. \end{aligned} \qquad (4.11)$$

The closed-form solution $R_s$ of (4.11) can be obtained as the dominant eigenvector of $M_2$.
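Similarly, a minimal numpy sketch of the mutually reinforced updates in (4.9) is given below; the converged $R_s$ could equivalently be obtained as the dominant eigenvector of $M_2$, e.g., via `numpy.linalg.eig`. All names and the default `alpha` are illustrative.

```python
import numpy as np

def double_graph_random_walk(R0_s, R0_w, L_ss, L_sw, L_ww, L_ws,
                             alpha=0.9, tol=1e-8, max_iter=1000):
    """Mutually reinforced random walk over both knowledge graphs (Eq. 4.9)."""
    R_s, R_w = R0_s.copy(), R0_w.copy()
    for _ in range(max_iter):
        R_s_new = (1 - alpha) * R0_s + alpha * (L_ss @ (L_sw @ R_w))
        R_w_new = (1 - alpha) * R0_w + alpha * (L_ww @ (L_ws @ R_s))
        if np.abs(R_s_new - R_s).sum() + np.abs(R_w_new - R_w).sum() < tol:
            return R_s_new, R_w_new
        R_s, R_w = R_s_new, R_w_new
    return R_s, R_w
```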