CHAPTER 4. SEMANTIC RELATIONSHIP ANALYSIS

Semantic Relationship Analysis


4.1 Three Types of Knowledge

An individual has unique knowledge and vocabulary for distinguishing objects in the world. Different people associate different things with the same object, yet they also often share common knowledge about it. In a large community, users share community-specific knowledge and vocabulary with each other. In daily life, the general public shares common-sense knowledge for understanding or recognizing people, things and places. Therefore, we define three types of knowledge: personal association, community vocabulary and global knowledge. They describe different characteristics of human knowledge and preference.

• Personal Association: Associations differ from person to person. For example, a user who enjoys traveling in Japan may consistently associate ``travel'' with ``japan.'' This strong co-occurrence between ``travel'' and ``japan'' could reveal his/her personal preference for both. In our work, we use the co-occurrence method to measure how strongly any two terms co-occur, and thereby identify the highly specific, personalized associations made by this user.

• Community Knowledge: Members of a community share common knowledge with each other. For example, many programmers join an online club to discuss RIA technologies such as ``ajax'' and ``flash.'' They share their development experience and discuss problems with others. This knowledge and vocabulary are understood by a group of people. In our work, we draw on the folksonomy of a tagging system to obtain the popular tags on a particular resource (URL). Based on the ``wisdom of the crowd,'' these popular tags represent the concepts that the community shares about this resource, expressed in its shared vocabulary.


• Global Knowledge: People use natural languages to communicate with each other. This shared linguistic knowledge includes both the common sense of daily life and the semantic meaning of words and texts recorded in a thesaurus (or dictionary). In our work, we use both WordNet and ConceptNet to acquire this knowledge of the general public. WordNet provides a formal taxonomy of the English language and records various semantic relations between words. ConceptNet, on the other hand, greatly expands the three semantic relations found in WordNet to twenty, such as ``effect-of'', ``capable-of'', ``made-of'', etc. It contains practical knowledge from the general public and is useful for acquiring common-sense knowledge.

4.2 Personal Association: Co-occurrence

In order to understand personal preference, we analyze personal bookmark data, including URLs and tags. People bookmark URLs and tag them because they are interested in these topics or find the content worth reading again. This bookmark data conveys personal preference and more: through a person's tagging history, we can discover not only what he/she likes but also how he/she thinks. We take advantage of the relations between tags in his/her bookmark collection to better interpret his/her train of thought. To this end, we measure ``co-occurrence'' with the Jaccard coefficient. We assume that the more frequently two tags co-occur on the same documents (URLs), the more related the two tags are. Let A and B be the sets of documents described by two tags; co-occurrence is defined in Equation (4.1):

$$S_{\text{co-occur}}(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (4.1)$$


where $S_{\text{co-occur}}(A, B)$ is the co-occurrence of A and B, $|A \cap B|$ is the number of documents in which both tags occur, and $|A \cup B|$ is the number of documents in which at least one of the two tags occurs. In other words, we use the proportion of tag overlap as tag similarity.
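As an illustration only, the following sketch computes Equation (4.1) on a toy bookmark collection. The input layout (a dict mapping each URL to the set of tags one user gave it), the helper names and the sample data are hypothetical.

```python
from collections import defaultdict


def tag_document_sets(bookmarks):
    """Map each tag to the set of URLs (documents) it annotates.

    `bookmarks` is assumed to map each URL to the set of tags one user gave it.
    """
    docs = defaultdict(set)
    for url, tags in bookmarks.items():
        for tag in tags:
            docs[tag].add(url)
    return docs


def co_occurrence(docs, tag_a, tag_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of the two tags' document sets."""
    a = docs.get(tag_a, set())
    b = docs.get(tag_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical bookmark data for one user.
bookmarks = {
    "url1": {"travel", "japan"},
    "url2": {"travel", "japan", "food"},
    "url3": {"travel", "hotel"},
}
docs = tag_document_sets(bookmarks)
print(co_occurrence(docs, "travel", "japan"))  # 2 shared URLs / 3 URLs in total
```

Returning 0.0 for two unused tags is a convenience choice; Equation (4.1) itself is undefined when both document sets are empty.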

We apply the co-occurrence method to personal bookmark data. Figure 4.1 illustrates the idea of this method. Problems may arise when calculating co-occurrence because user data may be sparse. The sparsity stems from the tagging mechanism itself: tagging is free-style, and people often omit some tags in their collection out of laziness or forgetfulness. It is therefore difficult to capture personal preference from incomplete personal data. To address this problem, we provide two methods that reinforce the ternary relations among users, tags, and URLs. In the next section, we introduce ``Social Wisdom,'' which reinforces the relation between URLs and tags.

Figure 4.1: Co-occurrence method on personal bookmarking data.


4.3 Community Knowledge: Social Wisdom

In reality, users easily neglect some tags on URLs in various situations. For example, someone bookmarks an article about travel information in Taiwan. In a hurry, he assigns only the tag ``travelagent'' and forgets to add the relevant tag indicating the location, ``Taiwan'' (see Figure 4.2). This often occurs in collaborative systems, and useful information may be lost. To enrich the tags on each URL, we use the ``wisdom of the crowd'' to add tags that already exist on these URLs. The added tags are restricted to those in the user's own tagging history, as opposed to tags the user has never used, so that they better reflect his personal preference and incorrect results are avoided.

Figure 4.2: The tag ``travelagent'' in a real situation.

The purpose of ``social wisdom'' is to reinforce the links between tags and URLs in a user's bookmark collection. Social wisdom is defined as follows:

$$\begin{aligned} \mathit{Tags}(u_i) &= \mathit{Tags}_{popular}(u_i) \cap \mathit{Tags}_{all}(p) \\ \mathit{SocialWisdom}(t, u_i) &= \mathit{AddLink}(t, u_i), \quad \forall t \in \mathit{Tags}(u_i) \end{aligned} \qquad (4.2)$$

where $\mathit{Tags}_{popular}(u_i)$ refers to the top-N popular tags for each URL $u_i$ and $\mathit{Tags}_{all}(p)$ refers to all tags in the tag collection of user p. $\mathit{AddLink}(t, u_i)$ assigns tag t to URL $u_i$ in the personal bookmark collection. The personal bookmark collection is shown as a graph in Figure 4.3.
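A minimal sketch of Equation (4.2), assuming the community's tag assignments for each URL are available as a flat list (one entry per assignment by any user). The data layout and the helper names `popular_tags` and `social_wisdom` are hypothetical; `n` plays the role of N.

```python
from collections import Counter


def popular_tags(tag_assignments, n):
    """Tags_popular(u_i): the n most frequent tags all users gave one URL."""
    return {tag for tag, _ in Counter(tag_assignments).most_common(n)}


def social_wisdom(user_bookmarks, community, n=3):
    """Return a copy of the user's bookmarks enriched by social wisdom.

    user_bookmarks: {url: set of this user's tags}            (hypothetical layout)
    community:      {url: list of tags assigned by all users}  (hypothetical layout)
    """
    tags_all = set().union(*user_bookmarks.values())  # Tags_all(p)
    enriched = {url: set(tags) for url, tags in user_bookmarks.items()}
    for url, tags in enriched.items():
        # Tags(u_i) = Tags_popular(u_i) ∩ Tags_all(p)
        candidates = popular_tags(community.get(url, []), n) & tags_all
        tags |= candidates  # AddLink(t, u_i) for every candidate t
    return enriched


user = {"u1": {"travelagent"}, "u2": {"taiwan", "food"}}
community = {"u1": ["travelagent", "taiwan", "travel", "taiwan",
                    "travel", "taiwan", "hotel"]}
enriched = social_wisdom(user, community, n=3)
print(enriched["u1"])  # gains "taiwan"; "travel" is popular but the user never used it
```

The intersection with the user's own vocabulary is what keeps the enrichment personal: a popular tag is only linked if this user has used it somewhere else.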


Figure 4.3: The personal tripartite graph with social wisdom

4.4 Global Knowledge: Semantic Similarity

In most cases, a tag is a piece of text with an inherent semantic meaning. People share common knowledge, known as common sense, about words used in daily life; moreover, some words have formal definitions in dictionaries compiled by professionals. We call this human knowledge, comprising both common sense and formal definitions, global knowledge. To retrieve global knowledge for tags, we establish the semantic similarity between tags using two different knowledge bases: WordNet and ConceptNet.

4.4.1 WordNet-based similarity

WordNet is a semantic lexicon for the English language that organizes nouns and verbs into hierarchies of is-a relations. We use WordNet::Similarity, a freely available software package created by Pedersen et al. [17], to measure the semantic similarity of tags.

This package provides six measures of similarity and three measures of relatedness. These measures are implemented as Perl modules that take two concepts as input and return a numeric value representing the degree to which they are similar or related. In our work, we use the simple similarity measure ``path,'' a baseline equal to the inverse of the length of the shortest path between two concepts in the is-a hierarchy. Using this package, we construct the WordNet-based semantic similarity on the personal tag set.
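To make the ``path'' measure concrete, the sketch below computes it on a small hypothetical is-a fragment rather than on WordNet itself. It follows the node-counting convention, so a concept compared with itself scores 1.0; for real WordNet data one would use WordNet::Similarity as described above, or, for example, the comparable `path_similarity` method of NLTK's WordNet interface.

```python
from collections import deque


def path_similarity(hypernym_of, c1, c2):
    """Inverse shortest is-a path length, counted in nodes (identical -> 1.0).

    `hypernym_of` maps each concept to its parent concept in a toy taxonomy.
    """
    # Treat is-a links as undirected edges so paths may go up and back down.
    adjacency = {}
    for child, parent in hypernym_of.items():
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    # Breadth-first search gives the shortest path length in edges.
    dist = {c1: 0}
    queue = deque([c1])
    while queue:
        node = queue.popleft()
        if node == c2:
            return 1.0 / (dist[node] + 1)  # edges + 1 = nodes on the path
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return 0.0  # no path in this toy taxonomy


# Hypothetical fragment of an is-a hierarchy.
taxonomy = {"photo": "image", "icon": "image",
            "image": "representation", "graph": "representation"}
print(path_similarity(taxonomy, "photo", "icon"))  # photo-image-icon: 3 nodes
```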

4.4.2 ConceptNet-based similarity

In the previous subsection, we described how we measure semantic similarity with WordNet. In this subsection, we describe how to use common-sense reasoning to obtain semantic similarity with ConceptNet, a freely available common-sense knowledge base that provides a natural-language-processing toolkit for reasoning tasks including ``topic-jisting'', ``analogy-making'', and ``text summarization''.

ConceptNet is a semantic network created by Hugo Liu and Push Singh [11]. It collects common-sense knowledge from the Open Mind Common Sense corpus and contains 300,000 nodes and 1.6 million links, such as (IsA `apple' `red fruit') or (PropertyOf `game' `fun'). The ConceptNet toolkit provides node-level and document-level reasoning operations. We use three of its text-analysis functions [11]:

• GetContext(node): It accepts a textual document as input, translates it into a ConceptNet-compatible format, and uses spreading activation to find the relevant concepts neighboring the document's concepts. For example, the neighborhood of the concept ``music'' includes ``play violin'', ``play piano'', ``band'', etc.


• GuessConcept(node): It takes as input a document and a novel concept in that document, and outputs a list of potential items analogous to the input concept. In other words, it obtains concepts analogous to a concept in the input document. For example, the concept ``do exercise'' is analogous to ``ride bicycle'', ``play football'', etc.

• FindPathBetweenNodes(node1, node2): It finds paths between two concepts in the semantic network graph.

Context of Concepts

Given two concepts a and b, the toolkit determines all the concepts in the contextual neighborhoods of a and b. Let $C_a$ and $C_b$ be the sets of contextual neighborhood concepts of a and b, respectively. The similarity $S_c(a, b)$ between a and b based on context is defined as follows:

$$S_c(a, b) = \frac{|C_a \cap C_b|}{|C_a \cup C_b|} \qquad (4.3)$$

Similarly, given two concepts a and b, the toolkit determines all the concepts analogous to a and b. Let $A_a$ and $A_b$ be the sets of analogous concepts of a and b, respectively. The similarity $S_a(a, b)$ between a and b based on analogous concepts is defined as follows:

$$S_a(a, b) = \frac{|A_a \cap A_b|}{|A_a \cup A_b|} \qquad (4.4)$$


where $|A_a \cap A_b|$ is the number of concepts common to $A_a$ and $A_b$, and $|A_a \cup A_b|$ is the number of concepts in their union.

Number of paths between two concepts

Given two concepts a and b, the toolkit determines all paths between a and b. We define the length of each path between a and b as its number of hops.

The more paths there are between two concepts, the closer the two concepts are to each other, and thus the higher their similarity. Conversely, the more hops a path has, the farther apart the two concepts are along it, and thus the lower the similarity. The path-based similarity is defined as follows:

Sp(a, b) = 1

where N is the total number of paths between a and b in the ConceptNet semantic network and $h_i$ is the number of hops in path i.

Combination of three measures

The final semantic similarity combines the three considerations: context, analogous concepts and number of paths. We compute it as a weighted sum of these measures.

Using equal weights for the three measures, the ConceptNet-based semantic similarity is defined as follows:

$$CS(a, b) = W_c S_c(a, b) + W_a S_a(a, b) + W_p S_p(a, b) \qquad (4.6)$$

where $W_c = W_a = W_p = 1/3$.
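The weighted combination can be sketched as follows. The concept sets stand in for the outputs of GetContext and GuessConcept and are made up for illustration; the path-based term $S_p$ is passed in as a precomputed number, and the function names are hypothetical.

```python
def jaccard(x, y):
    """|X ∩ Y| / |X ∪ Y|, or 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def conceptnet_similarity(context_a, context_b, analog_a, analog_b, s_p,
                          w_c=1 / 3, w_a=1 / 3, w_p=1 / 3):
    """CS(a, b) = W_c*S_c(a, b) + W_a*S_a(a, b) + W_p*S_p(a, b).

    S_c and S_a are Jaccard ratios over the context sets (C_a, C_b) and the
    analogous-concept sets (A_a, A_b); the path-based S_p is passed in as a
    precomputed value.
    """
    s_c = jaccard(context_a, context_b)
    s_a = jaccard(analog_a, analog_b)
    return w_c * s_c + w_a * s_a + w_p * s_p


# Hypothetical concept sets, e.g. as returned by GetContext / GuessConcept.
cs = conceptnet_similarity(
    context_a={"play violin", "band", "concert"},
    context_b={"band", "concert", "stage"},
    analog_a={"ride bicycle", "play football"},
    analog_b={"play football", "swim"},
    s_p=0.2,
)
print(round(cs, 4))  # (0.5 + 1/3 + 0.2) / 3
```

Keeping the weights as parameters makes it easy to move away from the equal 1/3 weighting if one measure proves more reliable.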

Having computed the ConceptNet-based semantic similarity between every pair of tags, we construct the personal tripartite graph with semantic similarity, shown in Figure 4.4. In the next section, we propose a semantic-based co-occurrence method to calculate the semantic relationship between tags based on two personal semantic networks.

Figure 4.4: The personal tripartite graph with semantic similarity.

4.5 Semantic-based Co-occurrence

In this section, we introduce our method for calculating the semantic relationship between tags. First, we propose the idea of a ``Tag Concept'' and show how to derive a tag concept from semantic similarity. Next, we describe how to calculate co-occurrence based on tag concepts. This method considers not only personal association (co-occurrence) but also global knowledge (semantic similarity).


4.5.1 Tag Concept Based on Semantic Similarity

Spreading Activation

Concepts and ideas in the human brain have been shown to be semantically linked. Thus thinking about (or firing) one concept primes other related concepts, making them more likely to fire in the near future. In our work, we use the semantic network to model user knowledge and to find personalized associations.

We use a spreading activation algorithm [3] to conduct inferences and compute the similarity among tags. The input tag, as the first node, has the highest energy level and spreads a fraction of its energy to relevant tags. The amount of energy spread is directly proportional to the weight between tags. The energy of any tag after a spreading step is calculated by Equation (4.7):

$$\mathit{Energy}(t_j) = \sum_{i=1}^{\mathit{inDegree}(t_j)} \mathit{Energy}(t_i) \cdot \mathit{Weight}(t_i, t_j) \cdot \alpha \qquad (4.7)$$

where $\mathit{Energy}(t_j)$ is the activation level (energy) of tag $t_j$ acquired from the tags $t_i$ linked to it, $\mathit{Weight}(t_i, t_j)$ is the weight of the link between $t_i$ and $t_j$, and $\mathit{inDegree}(t_j)$ is the number of inlinks of tag $t_j$. If $\mathit{Energy}(t_j)$ exceeds a threshold $f$, tag $t_j$ is activated at the next activation level. The energy decreases by the ratio $\alpha$ at each step, and the process stops when no new tags are activated. Finally, we collect the activated tags as the related tags.
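The procedure can be sketched as a level-by-level traversal. This is a minimal sketch under stated assumptions: energy flows only from tags activated in the previous step, the source tag starts with energy 1.0, and the defaults for `alpha` and `threshold` (the threshold $f$) are hypothetical values, as are the link weights in the example.

```python
def spreading_activation(weights, source, alpha=0.8, threshold=0.1):
    """Activate tags outward from `source`; return {activated tag: energy}.

    weights[t_i][t_j] is the link weight from t_i to t_j. At each step a
    not-yet-activated tag t_j gathers sum(Energy(t_i) * Weight(t_i, t_j) * alpha)
    from the tags activated in the previous step, and is activated if that
    energy exceeds `threshold` (the threshold f in the text).
    """
    energy = {source: 1.0}  # the input tag starts with the highest energy
    frontier = {source}
    while frontier:  # stop when a step activates no new tags
        incoming = {}
        for t_i in frontier:
            for t_j, w in weights.get(t_i, {}).items():
                if t_j not in energy:
                    incoming[t_j] = incoming.get(t_j, 0.0) + energy[t_i] * w * alpha
        frontier = {t for t, e in incoming.items() if e > threshold}
        for t in frontier:
            energy[t] = incoming[t]
    return energy


# Hypothetical weighted links in a personal tag network.
weights = {
    "image": {"photo": 0.9, "graph": 0.6, "icon": 0.5, "music": 0.05},
    "photo": {"camera": 0.1},
    "icon": {"design": 0.2},
}
tag_concept = set(spreading_activation(weights, "image"))
print(sorted(tag_concept))  # ['graph', 'icon', 'image', 'photo']
```

The set of activated tags, including the source, is what the next subsection calls the tag concept. In this toy network, ``music'', ``camera'' and ``design'' receive too little energy to pass the threshold.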

Tag Concept

Applying spreading activation to the personal knowledge network, we can identify the tags related to any target tag. We define these related tags, together with the target tag, as the ``tag concept'' of this target tag. In Figure 4.5, the tag concept of ``image'' is the set containing ``photo,'' ``graph,'' ``icon'' and ``image'' (the target tag).

Figure 4.5: Tag concept of ``image'' based on semantic similarity

4.5.2 Semantic Co-occurrence Based on Tag Concept

We propose a ``semantic co-occurrence'' approach to calculating the semantic relationship between tags based on tag concepts. We count how often tag concepts co-occur on the same documents. Semantic co-occurrence is defined by Equation (4.8):

$$S_{\text{semantic}}(a, b) = \frac{|\mathit{TagConcept}(a) \cap \mathit{TagConcept}(b)|}{|\mathit{TagConcept}(a) \cup \mathit{TagConcept}(b)|} \qquad (4.8)$$

where $\mathit{TagConcept}(a)$ is the set of tags related to target tag a obtained by spreading activation, and $S_{\text{semantic}}(a, b)$ is the co-occurrence of $\mathit{TagConcept}(a)$ and $\mathit{TagConcept}(b)$. The numerator is the number of documents in which the two tag concepts co-occur; the denominator, on the other hand, is the number of resources in which at least one of the two tag concepts is present. In other words, we use the proportion of overlapping tag concepts as tag semantic similarity. Figure 4.6 shows the idea of semantic co-occurrence.
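One way to read Equation (4.8) in code, assuming a tag concept ``occurs'' on a document whenever any of its tags annotates that document: each concept is mapped to the union of its tags' document sets, and the Jaccard ratio is taken as in Equation (4.1). This interpretation, the helper names and the data are assumptions for illustration.

```python
def concept_documents(docs, concept):
    """Documents annotated with at least one tag of a tag concept."""
    found = set()
    for tag in concept:
        found |= docs.get(tag, set())
    return found


def semantic_co_occurrence(docs, concept_a, concept_b):
    """Jaccard ratio between the document sets of two tag concepts."""
    a = concept_documents(docs, concept_a)
    b = concept_documents(docs, concept_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical tag -> documents index and tag concepts.
docs = {"image": {"u1"}, "photo": {"u2"}, "camera": {"u2", "u3"}, "lens": {"u3"}}
plain = semantic_co_occurrence(docs, {"image"}, {"camera"})
semantic = semantic_co_occurrence(docs, {"image", "photo"}, {"camera", "lens"})
print(plain, semantic)  # 0.0 versus 1/3
```

With singleton concepts the function reduces to plain co-occurrence, which here is 0.0 because ``image'' and ``camera'' never share a URL; expanding each tag into its concept recovers a non-zero relationship through ``photo'', which illustrates how the method can ease the sparsity problem described in Section 4.2.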

Figure 4.6: Semantic co-occurrence based on tag concept

Chapter 5 Tag-based Profile Presentation

In the previous chapter, we presented the creation of a semantic tag-based profile for the purpose of extracting user interests from personal media content. In this chapter, we propose the design of a visual tool to present the semantic tag-based profile.

5.1 Data Characteristics

The semantic tag-based profile has the following features:

• Tag weight represents the importance of a tag to this user. The most common way to calculate tag weight is to use tag frequency: the more frequently a tag has been used, the greater its weight.

• Link weight represents the importance of the relationship between two tags for this user. In this thesis, we propose semantic-based co-occurrence and social wisdom to enhance the pure co-occurrence method.

• Three views represent the profile from different aspects. Using self tags, we can show the most subjective opinions about this user; using the tags of all users, we can show the most objective opinions about this user; using friends' tags (or the tags of a group of users), we can show this community's opinions about the target user [8]. Thus, we identify three viewpoints (personal, social and global) to show the different aspects of a person.

5.2 Our Idea

Tag clouds represent a set of tags as weighted lists: the more often a tag has been used, the larger it is displayed. This mechanism can be used for tag profiles, allowing people to quickly skim the characteristics of a user. We use font size and color to emphasize tag weights. Unlike traditional tag clouds, which are 1-D lists, we use a force-directed layout to present weighted tags and their links in an aesthetically pleasing way. Forces are assigned to the set of nodes (tags) and the set of edges (links), and the whole graph is simulated as a spring system that quickly comes to a stable state. The layout is shown in Figure 5.1.
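The text does not specify the force model, so the following is only a sketch of one common choice, a Fruchterman-Reingold-style spring embedder: every pair of tags repels, linked tags attract in proportion to their link weight, and each step's movement is capped by a cooling ``temperature''. The constants `k`, `step`, `iterations` and the weight scaling are assumptions.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, k=1.0, step=0.05, seed=0):
    """Fruchterman-Reingold-style spring embedder (illustrative sketch).

    nodes: list of tag names; edges: {(u, v): link weight}. All node pairs
    repel with force k^2/d; linked pairs attract with weight * d^2/k. Each
    node moves along its net force, by at most a "temperature" that cools
    linearly so the layout settles.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):  # repulsion between every pair
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-6
                f = k * k / d
                disp[u][0] += f * dx / d
                disp[u][1] += f * dy / d
                disp[v][0] -= f * dx / d
                disp[v][1] -= f * dy / d
        for (u, v), w in edges.items():  # attraction along weighted links
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-6
            f = w * d * d / k
            disp[u][0] -= f * dx / d
            disp[u][1] -= f * dy / d
            disp[v][0] += f * dx / d
            disp[v][1] += f * dy / d
        temperature = step * (1 - it / iterations)
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                move = min(length, temperature)
                pos[n][0] += dx / length * move
                pos[n][1] += dy / length * move
    return {n: (x, y) for n, (x, y) in pos.items()}


layout = force_directed_layout(["travel", "japan", "food"], {("travel", "japan"): 1.0})
```

Font size and color by tag weight would be applied at drawing time and are not modeled here; a fixed seed keeps the layout reproducible between renders.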

To display the structure of the tag-based profile, we use a radial layout that represents the semantic relationships between a target tag and its relevant tags. When a tag is clicked in this graph, it becomes the target tag and is placed at the center of a series of concentric circles composed of its relevant tags. The layout shows tags up to two degrees of separation from the target tag (see Figure 5.2).
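A minimal sketch of such a radial layout: a breadth-first search from the target tag assigns each tag its degree of separation, and tags at distance d are spread evenly on a circle of radius d. Spacing tags by sorted name and the `ring_gap` parameter are assumptions; the adjacency data is hypothetical.

```python
import math
from collections import deque


def radial_layout(adjacency, target, max_depth=2, ring_gap=1.0):
    """Place `target` at the origin and related tags on concentric rings.

    Tags at graph distance d (1 <= d <= max_depth) are spread evenly around
    a circle of radius d * ring_gap; tags farther away are omitted.
    """
    depth = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the outermost ring
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    pos = {target: (0.0, 0.0)}
    for d in range(1, max_depth + 1):
        ring = sorted(n for n, k in depth.items() if k == d)
        for i, n in enumerate(ring):
            angle = 2 * math.pi * i / len(ring)
            pos[n] = (d * ring_gap * math.cos(angle), d * ring_gap * math.sin(angle))
    return pos


adjacency = {"image": {"photo", "icon"}, "photo": {"image", "camera"},
             "camera": {"photo", "lens"}, "icon": {"image"}}
pos = radial_layout(adjacency, "image")  # "lens" is 3 hops away, so it is left out
```

Clicking a tag would simply call the function again with that tag as `target`.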

An iTunes-style Cover Flow design is incorporated with a 3D carousel effect (see Figure 5.3), so that users can switch among different views of the tag-based profile easily and quickly.

Visualizing Tag-based User Profiles with Semantic Associations (pp. 41-55)