5.5.1 Web Resource Mining
Based on the ontology, we extract all possible entity pairs that are connected by specific relations. Following previous work, we retrieve web search² snippets for each entity pair tied to a specific relation [75, 82]. We then mine the patterns used in the natural language realization of these relations: from the mined snippets, we use dependency relations to learn the natural language surface forms of each relation through the dependency-based entity embeddings introduced below.
5.5.2 Dependency-Based Entity Embedding
As Section 2.4.2 introduces, dependency-based embeddings contain more relational information because they are trained on dependency-based contexts [108]. An example sentence, “Avatar is a 2009 American epic science fiction film directed by James Cameron.”, and its dependency parsing result are illustrated in Figure 5.3. Here the sentence comes from the snippets returned by searching the entity pair “Avatar” (movie) and “James Cameron” (director).
The arrows denote dependency relations from headwords to their dependents, and the words on the arcs denote the types of the dependency relations. Relations that include a preposition are “collapsed” prior to context extraction (dashed arcs in Figure 5.3) by directly connecting a head to the object of the preposition and subsuming the preposition itself into the dependency label. Before training embeddings, we replace entities with their entity tags, such as $movie for “Avatar” and $director for “James Cameron”.

Table 5.1: The contexts extracted for training dependency-based entity embeddings from the example in Figure 5.3.

Word        Contexts
$movie      film/nsub−1
is          film/cop−1
a           film/det−1
2009        film/num−1
american    film/nn−1
epic        film/nn−1
science     film/nn−1
fiction     film/nn−1
film        avatar/nsub, is/cop, a/det, 2009/num, american/nn, epic/nn, science/nn, fiction/nn, directed/vmod
directed    $director/prep_by
$director   directed/prep_by−1

² http://www.bing.com
The dependency-based contexts extracted from the example are given in Table 5.1. The contexts of a word are formed by following the arcs attached to it in the dependency tree: each of its dependents, and its headword marked with −1 to denote the inverse direction of the dependency. With the target words and their associated dependency-based contexts, we can train dependency-based entity embeddings for all target words [169, 16, 17].
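The context extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the extraction code used in this work: the arc list `ARCS` is hand-copied from the example sentence rather than produced by a parser, the function name `extract_contexts` is hypothetical, and the inverse marker is written as the ASCII suffix `-1`. The sketch applies both directions to every arc uniformly, so it uses the tag $movie where Table 5.1 shows avatar and also emits film/vmod-1 for “directed”, which Table 5.1 does not list.

```python
from collections import defaultdict

# Collapsed dependency arcs (head, relation, dependent) for the example
# sentence, with entities already replaced by their tags. Hand-written
# for illustration; a real pipeline would obtain them from a parser.
ARCS = [
    ("film", "nsub", "$movie"),
    ("film", "cop", "is"),
    ("film", "det", "a"),
    ("film", "num", "2009"),
    ("film", "nn", "american"),
    ("film", "nn", "epic"),
    ("film", "nn", "science"),
    ("film", "nn", "fiction"),
    ("film", "vmod", "directed"),
    ("directed", "prep_by", "$director"),  # collapsed preposition "by"
]


def extract_contexts(arcs):
    """Map each word to its dependency-based contexts.

    A head receives "dependent/relation"; a dependent receives
    "head/relation-1", where the suffix "-1" marks the inverse direction.
    """
    contexts = defaultdict(list)
    for head, rel, dep in arcs:
        contexts[head].append(f"{dep}/{rel}")
        contexts[dep].append(f"{head}/{rel}-1")
    return dict(contexts)


contexts = extract_contexts(ARCS)
for word, ctxs in contexts.items():
    print(word, ", ".join(ctxs))
```

Each (word, context) pair produced this way would serve as one training example for the embedding model, in place of the linear window contexts used by standard skip-gram training.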
5.5.3 Surface Form Derivation
In addition to the named entities detected by gazetteers, two kinds of relational surface forms are used in natural language: entity surface forms and entity syntactic contexts. Both are derived from the trained embeddings through the following approaches.
5.5.3.1 Entity Surface Forms
With only the background knowledge gazetteers provided in Section 5.4, unspecified entities cannot be captured, because a knowledge graph does not contain information such as the words “film” and “director”. This procedure discovers words that play the same role and carry similar functional dependencies as the specified entities. For example, the entity tag $character may derive the word “role”, and $movie may derive “film” and “movie”, as their entity surface forms. The unspecified entities provide important cues for inferring the corresponding relations.
We first define a set of entity tags \(E = \{e_i\}\) and a set of words \(W = \{w_j\}\). Based on the trained dependency-based entity embeddings, for each entity tag \(e_i\), we compute the score of a word \(w_j\) as

\[
S_i^F(w_j) = \frac{\mathrm{FormSim}(w_j, e_i)}{\sum_{e_k \in E} \mathrm{FormSim}(w_j, e_k)}, \tag{5.2}
\]

where \(\mathrm{FormSim}(w, e)\) is the cosine similarity between the embeddings of the word \(w\) and the entity tag \(e\). \(S_i^F(w_j)\) can be viewed as a normalized weight of the word \(w_j\) that indicates its importance for discriminating between different entities. Based on \(S_i^F(w_j)\), we extract the top \(N\) similar words for each entity tag \(e_i\) to form a set of entity surface forms \(F_i\), which contains the surface form candidates of the entity \(e_i\). The derived words may have embeddings similar to that of the target entity tag; for example, “director” and $director may encode the same context information, such as directed/prep_by−1, in their embeddings. Therefore, the word “director” can be extracted by the entity tag $director to serve as its surface form. With the derived words \(F_i\) for entity tag \(e_i\), we can normalize the relation probabilities that a word \(w_j \in F_i\) infers:
\[
P_F(r_i \mid w_j) = P_F(e_i \mid w_j) = \frac{S_i^F(w_j)}{\sum_{k,\, w_j \in F_k} S_k^F(w_j)}, \tag{5.3}
\]

where \(r_i\) is the relation inferred from the entity tag \(e_i\), \(S_k^F(w_j)\) is the score of a word \(w_j\) that belongs to the set \(F_k\) extracted by the entity tag \(e_k\), and \(P_F(r_i \mid w_j)\) is similar to \(P_E(r_i \mid w_j)\) in (5.1) but is based on derived words instead of specified entities.
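The scoring and normalization in (5.2) and (5.3) can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional vectors in `TAG_VECS` and `WORD_VECS` are invented toy values rather than trained embeddings, the function names are hypothetical, and all cosine similarities are assumed to be positive so that the normalizing sums are well defined.

```python
import numpy as np

# Toy embeddings, invented for illustration only (not trained vectors).
TAG_VECS = {
    "$movie": np.array([1.0, 0.0]),
    "$director": np.array([0.0, 1.0]),
}
WORD_VECS = {
    "film": np.array([0.9, 0.1]),
    "director": np.array([0.1, 0.9]),
    "role": np.array([0.5, 0.5]),
}


def cosine(u, v):
    """FormSim: cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def form_scores(word_vecs, tag_vecs):
    """Score of each word for each tag, normalized over all tags as in (5.2)."""
    scores = {}
    for word, wv in word_vecs.items():
        sims = {tag: cosine(wv, tv) for tag, tv in tag_vecs.items()}
        total = sum(sims.values())  # assumes positive similarities
        scores[word] = {tag: s / total for tag, s in sims.items()}
    return scores


def surface_forms(scores, tags, n):
    """Surface form set for each tag: its top-n words by score."""
    return {
        tag: sorted(scores, key=lambda w: scores[w][tag], reverse=True)[:n]
        for tag in tags
    }


def relation_probs(scores, forms):
    """Probability of each tag given a word, renormalized as in (5.3)
    over only those tags whose surface form set contains the word."""
    probs = {}
    for word in scores:
        owners = [tag for tag, words in forms.items() if word in words]
        total = sum(scores[word][tag] for tag in owners)
        probs[word] = {tag: scores[word][tag] / total for tag in owners}
    return probs


scores = form_scores(WORD_VECS, TAG_VECS)
forms = surface_forms(scores, TAG_VECS, n=2)
probs = relation_probs(scores, forms)
print(forms)
print(probs)
```

With these toy vectors and N = 2, “role” falls into the surface form sets of both tags and its probability is split between them, whereas “film” belongs only to the set of $movie and therefore infers that tag with probability 1.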
5.5.3.2 Entity Syntactic Contexts
Another type of relational cue comes from the contexts of entities. For example, the user utterance “find movies produced by james cameron” includes an unspecified movie entity “movies” and a specified entity “james cameron”, which may be captured by entity surface forms via \(P_F\) and by gazetteers via \(P_E\), respectively. However, neither considers the local observation “produced by”. In this example, the most likely relation of the entity “james cameron” according to the background knowledge is director.name, which infers movie.directed_by; the local observations are not used to derive the correct relation movie.produced_by for this utterance.
This procedure is to discover the relational entity contexts based on syntactic dependencies.
With the dependency-based entity embeddings and their context embeddings, for each entity tag \(e_i\), we extract the top \(N\) syntactic contexts to form a set of entity contexts \(C_i\), which includes the words that are most activated by the given entity tag \(e_i\). The extraction procedure is similar to the one in Section 5.5.3.1; for each entity tag \(e_i\), we compute the score of the word \(w_j\) as

\[
S_i^C(w_j) = \frac{\mathrm{CxtSim}(w_j, e_i)}{\sum_{e_k \in E} \mathrm{CxtSim}(w_j, e_k)}, \tag{5.4}
\]

where \(\mathrm{CxtSim}(w_j, e_i)\) is the cosine similarity between the context embedding of the word \(w_j\) and the embedding of the entity tag \(e_i\).
The derived contexts may serve as indicators of possible relations. For instance, for the entity tag $producer, the most activated contexts include “produced/prep_by−1”, so the word “produced” can be extracted by this procedure to detect local observations other than entities. We can then normalize the relation probabilities that the contexts imply to compute \(P_C(r_i \mid w_j)\), similarly to (5.3):
\[
P_C(r_i \mid w_j) = P_C(e_i \mid w_j) = \frac{S_i^C(w_j)}{\sum_{k,\, w_j \in C_k} S_k^C(w_j)}. \tag{5.5}
\]
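As a worked illustration of (5.4) and (5.5), consider made-up similarity values (assumptions for this example, not results from this work). Suppose \(E\) contains only \(e_1\) = $director and \(e_2\) = $producer, and the context word \(w\) = “produced” has \(\mathrm{CxtSim}(w, e_1) = 0.2\) and \(\mathrm{CxtSim}(w, e_2) = 0.6\). Then:

```latex
\begin{align*}
S_1^C(w) &= \frac{0.2}{0.2 + 0.6} = 0.25, \\
S_2^C(w) &= \frac{0.6}{0.2 + 0.6} = 0.75, \\
P_C(r_2 \mid w) &= \frac{S_2^C(w)}{S_2^C(w)} = 1
  && \text{if } w \in C_2 \text{ only}, \\
P_C(r_2 \mid w) &= \frac{S_2^C(w)}{S_1^C(w) + S_2^C(w)} = 0.75
  && \text{if } w \in C_1 \text{ and } w \in C_2 .
\end{align*}
```

Under these assumed numbers, the normalization in (5.5) redistributes probability only among the entity tags whose top-\(N\) context sets contain the word, so a smaller \(N\) tends to make the inferred relation more peaked.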