Relational Surface Form Derivation - Unsupervised Learning and Modeling of Knowledge and Intent

5.5.1 Web Resource Mining

Based on the ontology, we extract all entity pairs that are connected by specific relations. Following previous work [75, 82], we retrieve search snippets for each such entity pair through web search². We then mine the patterns used in the natural language realization of each relation: with the mined snippets, we use dependency relations to learn the natural language surface forms of each relation through the dependency-based entity embeddings introduced below.

5.5.2 Dependency-Based Entity Embedding

As Section 2.4.2 introduces, dependency-based embeddings contain more relational information because they are trained on dependency-based contexts [108]. An example sentence, “Avatar is a 2009 American epic science fiction film directed by James Cameron.”, and its dependency parse are illustrated in Figure 5.3. The sentence comes from the snippets returned by searching for the entity pair “Avatar” (movie) and “James Cameron” (director).

The arrows denote dependency relations from headwords to their dependents, and the words on the arcs denote the types of the dependency relations. Relations that include a preposition are “collapsed” prior to context extraction (dashed arcs in Figure 5.3) by directly connecting a head and the object of a preposition and subsuming the preposition itself into the dependency label. Before training embeddings, we replace entities with their entity tags, such as $movie for “Avatar” and $director for “James Cameron”.

² http://www.bing.com

Table 5.1: The contexts extracted for training dependency-based entity embeddings in the example of Figure 5.3.

Word         Contexts
$movie       film/nsub⁻¹
is           film/cop⁻¹
a            film/det⁻¹
2009         film/num⁻¹
american     film/nn⁻¹
epic         film/nn⁻¹
science      film/nn⁻¹
fiction      film/nn⁻¹
film         avatar/nsub, is/cop, a/det, 2009/num, american/nn, epic/nn, science/nn, fiction/nn, directed/vmod
directed     $director/prep_by
$director    directed/prep_by⁻¹

The dependency-based contexts extracted from the example are given in Table 5.1. The contexts of a word are formed by following the arcs attached to it in the dependency tree: each context pairs the word at the other end of an arc with the relation type, and ⁻¹ marks the inverse direction, i.e., a context seen from a dependent toward its head. With the target words and their associated dependency-based contexts, we can train dependency-based entity embeddings for all target words [169, 16, 17].
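The context extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the arc list is written by hand for the example sentence (no parser is called), it assumes prepositions are already collapsed and entities already replaced by their tags, and the names `ARCS` and `extract_contexts` are hypothetical.

```python
from collections import defaultdict

# Hand-written (head, relation, dependent) arcs for the example sentence.
# Assumption: prepositions are already collapsed and entities already tagged.
ARCS = [
    ("film", "nsub", "$movie"),
    ("film", "cop", "is"),
    ("film", "det", "a"),
    ("film", "num", "2009"),
    ("film", "nn", "american"),
    ("film", "nn", "epic"),
    ("film", "nn", "science"),
    ("film", "nn", "fiction"),
    ("film", "vmod", "directed"),
    ("directed", "prep_by", "$director"),
]


def extract_contexts(arcs):
    """Map each word to its dependency-based contexts.

    A head receives the context 'dependent/relation'; a dependent receives
    the inverse context 'head/relation^-1'.
    """
    contexts = defaultdict(list)
    for head, rel, dep in arcs:
        contexts[head].append(f"{dep}/{rel}")
        contexts[dep].append(f"{head}/{rel}^-1")
    return dict(contexts)


contexts = extract_contexts(ARCS)
for word, ctxs in contexts.items():
    print(word, ctxs)
```

Because this sketch emits both directions for every arc, "directed" also receives film/vmod^-1, and "film" receives $movie/nsub rather than avatar/nsub, so its output differs slightly from the rows listed in Table 5.1.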

5.5.3 Surface Form Derivation

In addition to named entities detected by gazetteers, two kinds of relational surface forms are used in natural language: entity surface forms and entity syntactic contexts. Both are derived from the trained embeddings through the following approaches.

5.5.3.1 Entity Surface Forms

With only the background knowledge gazetteers provided in Section 5.4, unspecified entities cannot be captured, because a knowledge graph does not contain words such as “film” and “director”. This procedure discovers words that play the same role and carry similar functional dependencies as the specified entities. For example, the entity tag $character may derive the word “role”, and $movie may derive “film” and “movie” as their entity surface forms. The unspecified entities provide important cues for inferring the corresponding relations.

We first define a set of entity tags E = {e_i} and a set of words W = {w_j}. Based on the trained dependency-based entity embeddings, for each entity tag e_i, we compute the score of a word w_j as

$$ S_i^F(w_j) = \frac{\mathrm{FormSim}(w_j, e_i)}{\sum_{e_k \in E} \mathrm{FormSim}(w_j, e_k)}, \tag{5.2} $$

where FormSim(w, e) is the cosine similarity between the embeddings of the word w and the entity tag e. S_i^F(w_j) can be viewed as a normalized weight of the word that indicates its importance for discriminating between different entities. Based on S_i^F(w_j), we extract the top N similar words for each entity tag e_i to form a set of entity surface forms F_i, which includes the surface form candidates of entity e_i. The derived words may have embeddings similar to that of the target entity; for example, “director” and $director may encode the same context information, such as directed/prep_by⁻¹, in their embeddings. Therefore, the word “director” can be extracted by the entity tag $director to serve as its surface form. With the derived words F_i for entity tag e_i, we can normalize the relation probabilities that a word w_j ∈ F_i infers:

$$ P_F(r_i \mid w_j) = P_F(e_i \mid w_j) = \frac{S_i^F(w_j)}{\sum_{k,\, w_j \in F_k} S_k^F(w_j)}, \tag{5.3} $$

where r_i is the relation inferred from the entity tag e_i, S_k^F(w_j) is the score of a word w_j that belongs to the set F_k extracted by the entity tag e_k, and P_F(r_i | w_j) is similar to P_E(r_i | w_j) in (5.1) but is based on derived words instead of specified entities.
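The scoring in (5.2), the top-N extraction of F_i, and the normalization in (5.3) can be sketched together. This is an illustrative sketch under stated assumptions: the 2-dimensional vectors are invented toy values rather than trained embeddings, all similarities are assumed non-negative so that the normalization stays well behaved, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def form_scores(word_vecs, tag_vecs):
    """S_i^F(w_j) as in (5.2): FormSim normalized over all entity tags."""
    scores = {tag: {} for tag in tag_vecs}
    for word, w in word_vecs.items():
        sims = {tag: cosine(w, e) for tag, e in tag_vecs.items()}
        total = sum(sims.values())
        for tag, sim in sims.items():
            scores[tag][word] = sim / total if total else 0.0
    return scores


def top_n_forms(scores, n):
    """F_i: the N highest-scoring words for each entity tag."""
    return {
        tag: sorted(ws, key=ws.get, reverse=True)[:n]
        for tag, ws in scores.items()
    }


def relation_probs(scores, forms):
    """P_F(e_i | w_j) as in (5.3): renormalize over tags whose F_k holds w_j."""
    probs = {}
    for tag, words in forms.items():
        for word in words:
            denom = sum(scores[k][word] for k, f in forms.items() if word in f)
            probs[(tag, word)] = scores[tag][word] / denom if denom else 0.0
    return probs


# Toy vectors, invented purely for illustration.
tag_vecs = {"$movie": [1.0, 0.0], "$director": [0.0, 1.0]}
word_vecs = {"film": [0.9, 0.1], "director": [0.1, 0.9], "made": [0.5, 0.5]}

scores = form_scores(word_vecs, tag_vecs)
forms = top_n_forms(scores, n=2)
probs = relation_probs(scores, forms)
print(forms)
print(probs)
```

In this toy setting, “film” is extracted only by $movie, so its probability mass goes entirely to that tag, whereas “made” is extracted by both tags and its mass is split between them.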

5.5.3.2 Entity Syntactic Contexts

Another type of relational cue comes from the contexts of entities. For example, the user utterance “find movies produced by james cameron” includes an unspecified movie entity “movies” and a specified entity “james cameron”, which may be captured by entity surface forms via P_F and by gazetteers via P_E, respectively. However, neither considers the local observation “produced by”. In this example, the most likely relation of the entity “james cameron” according to the background knowledge is director.name, which infers movie.directed_by; without the local observation, the correct relation for this utterance, movie.produced_by, cannot be derived.

This procedure discovers relational entity contexts based on syntactic dependencies. With the dependency-based entity embeddings and their context embeddings, for each entity tag e_i, we extract the top N syntactic contexts to form a set of entity contexts C_i, which includes the words that are most activated by the entity tag e_i. The extraction procedure is similar to the one in Section 5.5.3.1; for each entity tag e_i, we compute the score of a word w_j as

$$ S_i^C(w_j) = \frac{\mathrm{CxtSim}(w_j, e_i)}{\sum_{e_k \in E} \mathrm{CxtSim}(w_j, e_k)}, \tag{5.4} $$

where CxtSim(w_j, e_i) is the cosine similarity between the context embedding of the word w_j and the embedding of the entity tag e_i.

The derived contexts may serve as indicators of possible relations. For instance, for the entity tag $producer, the most activated contexts include “produced/prep_by⁻¹”, so the word “produced” can be extracted by this procedure to detect local observations other than entities. We can then normalize the relation probabilities that the contexts imply to compute P_C(r_i | w_j), similarly to (5.3):

$$ P_C(r_i \mid w_j) = P_C(e_i \mid w_j) = \frac{S_i^C(w_j)}{\sum_{k,\, w_j \in C_k} S_k^C(w_j)}. \tag{5.5} $$
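The context-side extraction can be sketched in the same spirit. This is a rough illustration under assumptions that the text does not confirm: context embeddings are assumed to be keyed by context labels of the form word/relation⁻¹, the word is recovered by taking the part before the slash, and the vectors are invented toy values. The names `top_contexts` and `context_word` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_contexts(tag_vecs, context_vecs, n):
    """C_i: the N contexts most activated by each tag, scored as in (5.4)."""
    result = {}
    for tag, e in tag_vecs.items():
        scored = []
        for ctx, c in context_vecs.items():
            total = sum(cosine(c, e_k) for e_k in tag_vecs.values())
            score = cosine(c, e) / total if total else 0.0
            scored.append((score, ctx))
        scored.sort(reverse=True)
        result[tag] = [(ctx, score) for score, ctx in scored[:n]]
    return result


def context_word(context):
    """Recover the word from a context label: 'produced/prep_by^-1' -> 'produced'."""
    return context.split("/", 1)[0]


# Toy tag embeddings and context embeddings, invented for illustration.
tag_vecs = {"$producer": [1.0, 0.0], "$director": [0.0, 1.0]}
context_vecs = {
    "produced/prep_by^-1": [0.8, 0.2],
    "directed/prep_by^-1": [0.2, 0.8],
}

top = top_contexts(tag_vecs, context_vecs, n=1)
words = {tag: [context_word(ctx) for ctx, _ in ctxs] for tag, ctxs in top.items()}
print(words)
```

The resulting scores can then be renormalized over the extracted sets C_k exactly as in (5.5), mirroring the normalization used for entity surface forms.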
