Visualized Tag-based User Profiles with Semantic Relations

Chapter 2 Related Work


The social process in which users in various communities collaboratively tag publicly available resources and share content is called "collaborative tagging." In collaborative tagging systems, users share their tags for particular resources, and a stable, community-wide pattern in tag usage emerges over time [6]. This pattern leads to an emergent, flat set of tags without a structured, hierarchical organization. This organization is called a "folksonomy," a user-generated classification that emerges through bottom-up consensus; the word is a fusion of folks and taxonomy. The first use of the term folksonomy has been attributed to Thomas Vander Wal in 2004. Vander Wal defined folksonomy as the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. People use their own vocabulary to add explicit meanings to shared resources. The greatest value of a folksonomy is that it directly reflects the vocabulary of users. In our work, we try to extract a user's vocabulary or knowledge from his/her own media content based on a combination of folksonomy and semantic analysis.
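To make the idea of a flat, emergent tag set concrete, the sketch below models a folksonomy as a list of (user, tag, resource) assignments. This triple representation is a common formalization that we assume here for illustration; the sample data and the function names `tags_per_resource` and `user_vocabulary` are hypothetical. The community-wide pattern for a resource is simply its tag counts, and a single user's vocabulary is the subset of assignments made by that user.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: each entry is one (user, tag, resource) assignment.
ASSIGNMENTS = [
    ("alice", "travel", "photo1"),
    ("alice", "beach", "photo1"),
    ("bob", "travel", "photo1"),
    ("bob", "sunset", "photo2"),
    ("carol", "travel", "photo1"),
    ("carol", "beach", "photo2"),
    ("alice", "travel", "photo3"),
]


def tags_per_resource(assignments):
    """Count how often the community applies each tag to each resource.

    The result is a flat tag -> count mapping per resource; no hierarchy is implied.
    """
    counts = defaultdict(Counter)
    for _user, tag, resource in assignments:
        counts[resource][tag] += 1
    return counts


def user_vocabulary(assignments, user):
    """Return the tags a single user applies, with how often he/she uses each one."""
    return Counter(tag for u, tag, _resource in assignments if u == user)


print(tags_per_resource(ASSIGNMENTS)["photo1"].most_common(1))  # [('travel', 3)]
print(user_vocabulary(ASSIGNMENTS, "alice"))  # Counter({'travel': 2, 'beach': 1})
```

Keeping the raw triples, rather than only per-resource counts, preserves who applied which tag, which is what a per-user profile needs.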

2.2 Design of Tagging Systems

In discussing tagging systems, two related issues are often overlooked: the classification of tagging systems based on their design features, and the tagging incentives of users. Previous studies on both issues are introduced here.

In his master's thesis [19], Stefaner organizes the design features of tagging systems based on Marlow's classification [12] and a revised version presented in [20]. We follow Stefaner's organization and present the various dimensions of tagging systems in Table 2.1.

Table 2.1: Classification of tagging systems, based on [12], [20] and [19].

Dimension | Value | Explanation
Tagging Rights | Self-tagging | Users can only tag self-created resources
 | Permission-based | Users can tag some resources
 | Free-for-all | Users can tag all available resources
Source of Resources | User-generated content | Users tag self-generated content
 | Provided content | Users tag content provided by the service
 | External resources | Users tag resources not hosted by the service
Resource Representation | Textual | Type of resource being tagged is textual
 | Non-textual | Type of resource being tagged is non-textual (e.g., image or video)
Tagging Feedback | Blind | No awareness of community or own tags
 | Viewable | Previously applied tags are presented
 | Suggested | The system selects tag suggestions
Tag Aggregation | Set-model | Each distinct tag is only stored once
 | Bag-model | Multiple applications of the same tag are counted
Vocabulary Control | Unrestricted vocabulary | Free-form annotation
 | Managed vocabulary | Restricted vocabulary with regular updates
 | Fixed vocabulary | Standardized classification
Resource Connectivity | None | No specific relation between resources
 | Links | Links between resources (e.g., web pages)
 | Groups | Grouped resources (e.g., photo albums)
Automatic Tagging | None | Only user-defined tags
 | Auto-tags | Tags applied automatically by resource analysis
 | Automatic tag expansion | Tags applied automatically based on user-defined tags

Incentives and motivations for users also play a significant role in shaping the tags that emerge from collaborative tagging systems. Users are motivated both by personal needs and by social interests. Marlow et al. [12] categorized the motivations for tagging as organizational and social. The following incentives express the range of potential motivations that influence tagging behavior: (1) future retrieval; (2) contribution and sharing; (3) attracting attention; (4) play and competition; (5) self-presentation; (6) opinion expression. The authors of [1] extend Marlow et al.'s work and provide a more detailed taxonomy of tagging motivations on Flickr, shown in Table 2.2. It has two dimensions: sociality and function. The first dimension, "sociality," describes whom the tags are meant for: the user alone (self) or other people, such as friends, family, and strangers (social). The second dimension, "function," refers to a tag's intended use: organizing content or communicating about it.

Table 2.2: A taxonomy of tagging motivations [1].

Sociality \ Function | Organization | Communication
Self | Retrieval, Directory, Search | Context for self, Memory
Social | Contribution, Attention, Ad hoc photo pooling | Content descriptors, Social signaling

2.3 Common Sense Computing

Simple descriptions are often used as tags to describe people's own content. Which tags a person chooses for a piece of content depends on his/her preferences and knowledge. Tags are composed of words that carry inherent semantic meanings in common sense, so they can be analyzed with the help of common sense computing technology. Common sense knowledge bases collect a large amount of human experience and encompass knowledge about many different aspects of everyday life. In this section, we briefly introduce several popular knowledge bases and explain how we use this computing technique. First, we introduce two large-scale, general-purpose semantic knowledge bases, Cyc and WordNet, both of which required notable effort to build.

2.3.1 Cyc

The Cyc project was begun in 1984 by Doug Lenat. Lenat's team set out to assemble a comprehensive ontology and database of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. They used a logical framework to formalize common sense knowledge. Assertions are largely handcrafted by knowledge engineers at Cycorp, and as of 2003, Cyc had over 1.6 million facts interrelating more than 118,000 concepts (source: cyc.com). The Cyc project has been described as "one of the most controversial endeavours of the artificial intelligence history" [2], and it has inevitably drawn criticism for the complexity of the system, scalability problems, the lack of any meaningful benchmark, and so on. To use Cyc to reason about text, the text must first be mapped into Cyc's own logical language, CycL. This mapping process is quite complex, because all of the inherent ambiguity in natural language must be resolved to produce the unambiguous logical formulation that CycL requires. The difficulty of applying Cyc to practical textual reasoning tasks, and the present unavailability of its full content to the general public, make it a prohibitive option for most textual-understanding tasks.

2.3.2 WordNet

WordNet [15][4] is arguably the most popular and widely used semantic resource in the computational linguistics community today. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. As of 2006, the database contains about 150,000 words organized in over 115,000 synsets, for a total of 207,000 word-sense pairs.

2.3.3 Open Mind Common Sense

Open Mind Common Sense (OMCS) is an artificial intelligence project created by the MIT Media Lab in 2000. It aims to construct a large common sense knowledge base from contributions by the general public: web volunteers enter their common sense statements into the OMCS corpus. Since then, the project has gathered over 700,000 sentences of common sense knowledge from over 14,000 contributors around the world, many with no special training in computer science. The OMCS corpus now covers a tremendous range of different types of common sense knowledge, expressed in natural language.

2.3.4 ConceptNet

ConceptNet [11] is an open-source tool for using the common sense knowledge collected in OMCS, developed by Liu and Singh. It is a semantic network with 20 relation-types that describe different relations among things, events, characters, etc. Figure 2.1 shows a concrete example of each relation-type from actual ConceptNet data.
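A semantic network of this kind can be sketched as a set of (concept, relation-type, concept) assertions indexed by concept. The toy assertions and relation names below are illustrative assumptions rather than actual ConceptNet data, and `build_network`, `neighbors` and `shared_neighbors` are hypothetical helpers. The last function shows how two tags can be connected through a common concept, which is the kind of link a tag-based profile might exploit.

```python
from collections import defaultdict

# Hypothetical toy assertions in (concept, relation-type, concept) form.
# The relation names are illustrative only.
ASSERTIONS = [
    ("dog", "IsA", "pet"),
    ("cat", "IsA", "pet"),
    ("dog", "CapableOf", "bark"),
    ("tail", "PartOf", "dog"),
    ("leash", "UsedFor", "walk dog"),
]


def build_network(assertions):
    """Index every assertion under both of its concepts so neighbors can be looked up."""
    graph = defaultdict(list)
    for head, relation, tail in assertions:
        graph[head].append((relation, tail))
        graph[tail].append(("~" + relation, head))  # "~" marks the inverse direction
    return graph


def neighbors(graph, concept):
    """Concepts linked to `concept` by any relation, in either direction."""
    return {other for _relation, other in graph.get(concept, [])}


def shared_neighbors(graph, a, b):
    """Concepts that connect `a` and `b` in two hops, e.g. a common category."""
    return sorted(neighbors(graph, a) & neighbors(graph, b))


net = build_network(ASSERTIONS)
print(sorted(neighbors(net, "dog")))  # ['bark', 'pet', 'tail']
print(shared_neighbors(net, "dog", "cat"))  # ['pet']
```

Concepts may be multi-word phrases (here "walk dog"), so nodes are plain strings rather than single words.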

Figure 2.1: Semantic Relation-types in ConceptNet

2.3.5 Semantic Similarity Analysis

Measures of semantic similarity between concepts are widely used in Natural Language Processing; they aim to quantify the degree to which humans would judge a given pair of concepts to be related. Pedersen et al. [17] developed a freely available tool, WordNet::Similarity, which provides six measures of similarity and three measures of relatedness between a pair of concepts (or word senses) based on the lexical database WordNet. A general classification of the measures, with their relative advantages and disadvantages, is provided in Fig 2.2.

Figure 2.2: A classification of measures of semantic similarity and relatedness and their relative advantages and disadvantages.[16]
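Two of the simplest similarity measures, both based on paths in an is-a hierarchy, can be sketched as follows. The taxonomy below is a hypothetical toy stand-in for WordNet's noun hierarchy, and the functions are our own illustration in the spirit of WordNet::Similarity's path and Wu-Palmer (wup) measures, not that tool's code. Depth is counted in nodes, so the root has depth 1.

```python
# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet's noun hierarchy.
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "carnivore": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "wolf": "canine",
    "cat": "feline",
    "vehicle": "artifact",
    "car": "vehicle",
}


def ancestors(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Depth counted in nodes, so the root ("entity") has depth 1."""
    return len(ancestors(concept))


def lowest_common_subsumer(c1, c2):
    """The deepest concept that is an ancestor of (or equal to) both c1 and c2."""
    shared = set(ancestors(c2))
    for node in ancestors(c1):  # walks upward, so the first match is the deepest
        if node in shared:
            return node
    raise ValueError("concepts are not in the same hierarchy")


def path_similarity(c1, c2):
    """1 / (1 + number of is-a edges on the shortest path between c1 and c2)."""
    lcs = lowest_common_subsumer(c1, c2)
    edges = (depth(c1) - depth(lcs)) + (depth(c2) - depth(lcs))
    return 1.0 / (1 + edges)


def wup_similarity(c1, c2):
    """Wu-Palmer style score: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    lcs = lowest_common_subsumer(c1, c2)
    return 2.0 * depth(lcs) / (depth(c1) + depth(c2))


# Both scores rank "wolf" closest to "dog" and "car" farthest.
for other in ("wolf", "cat", "car"):
    print(other, round(path_similarity("dog", other), 3), round(wup_similarity("dog", other), 3))
```

In a tree, the shortest path between two concepts always passes through their lowest common subsumer, which is why the edge count can be derived from depths alone.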


2.4 User Profiling

Research in [10] harvests profiles from social networking websites, such as Friendster¹, MySpace², and Orkut³, to construct InterestMap, a network-style user profile that illustrates the relationship between interests and identities. Unlike traditional recommender systems, the proposed approach makes recommendations by considering the interests of people instead of their historical behavior in a particular application. According to [10], InterestMap produces more accurate recommendations and models the real-life preferences and interests of people in an intuitive and visual fashion.

A user profile can either be provided explicitly by the user or be built from his/her own content.

In our work, we use the tags attached to this content, and the relationships between those tags, to represent a person's interests and characteristics. A similar idea appears in [14]: the authors construct user profiles from tagging data and compute the semantic relationships between tags using co-occurrence. A user profile is represented as tags and their relationships in a profile graph, where nodes are the tags used by the user and edges are the relations between tags; the dynamic user profile is visualized through graph animation.
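A co-occurrence-based profile graph of this kind can be sketched in a few lines. We assume here that an edge weight is simply the number of items carrying both tags; the exact weighting used in [14] may differ, and the function name `profile_graph` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def profile_graph(tagged_items):
    """Build a tag graph from the tag lists attached to one user's items.

    Nodes: tags, weighted by how many items use them.
    Edges: unordered tag pairs, weighted by how many items carry both tags.
    """
    nodes = Counter()
    edges = Counter()
    for tags in tagged_items:
        unique = sorted(set(tags))  # drop duplicates; sorting makes each pair canonical
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


# Hypothetical tag lists from one user's photos.
ITEMS = [
    ["travel", "beach", "sunset"],
    ["travel", "beach"],
    ["cat", "home"],
    ["travel", "food"],
]

nodes, edges = profile_graph(ITEMS)
print(nodes.most_common(2))  # [('travel', 3), ('beach', 2)]
print(edges[("beach", "travel")])  # 2
```

Sorting each item's tags before forming pairs ensures that ("beach", "travel") and ("travel", "beach") are counted as the same undirected edge.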

In contrast, Huang et al. [8] defined personal, social, and global views of user profiles from the tags associated with the social media content collected for the user, and used statistical and common sense reasoning to establish semantic connections among these tags.

1http://www.friendster.com

2http://www.myspace.com

3http://www.orkut.com


2.5 Tag-based and Social Visualization

On different social media websites, people use tags to describe their content and share it with other people. Tagging is a social indexing process, and content can be categorized by any number of tags. As the number of tags increases, it becomes useful to view these tags visually, and tag visualization has consequently attracted growing attention. We present several types of tag visualization and social visualization. First, we introduce the most popular visual representations: the tag cloud and the tag network. Next, we introduce related work on tag orbitals and tag maps. Finally, we introduce social network visualizations.

(a) Tag cloud on del.icio.us

(b) Tag cloud on Flickr

Figure 2.3: Tag clouds on social media websites


2.5.1 Typical Tag Visualization

Tag Cloud

The tag cloud is the most common way to present tags. People use tags to organize their bookmarked URLs on del.icio.us and to share their photos with others on Flickr (Fig 2.3).

A tag cloud represents a set of tags as a weighted list. In general, tag frequency determines which tags are more important than others, and font size and color are used to emphasize that importance. A tag cloud is typically ordered alphabetically or by frequency. However, it becomes hard to navigate as the number of tags grows day by day. To improve the tag cloud, some researchers cluster similar tags and show them together. The authors of [7] reduce the semantic density of a tag set and improve the visual consistency of the tag cloud layout: they propose an approach to tag selection and use a clustering algorithm to produce the visual layout. Examples of their results are illustrated in Fig 2.4; similar tags are placed together so that users can navigate pages more easily.
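The basic frequency-to-font-size mapping of an alphabetically ordered tag cloud can be sketched as follows. The logarithmic scaling, the 10 to 32 point size range, and the function name `tag_cloud` are assumptions chosen for illustration; they are not taken from del.icio.us, Flickr, or [7].

```python
import math


def tag_cloud(freqs, min_size=10.0, max_size=32.0):
    """Return (tag, font_size) pairs sorted alphabetically.

    `freqs` maps each tag to a positive usage count. Font size grows with
    log(frequency), an assumed choice that keeps a few very popular tags
    from dwarfing all the others.
    """
    lo = math.log(min(freqs.values()))
    hi = math.log(max(freqs.values()))
    cloud = []
    for tag in sorted(freqs):
        if hi == lo:
            size = min_size  # every tag is equally frequent
        else:
            weight = (math.log(freqs[tag]) - lo) / (hi - lo)  # scaled to 0.0 .. 1.0
            size = min_size + weight * (max_size - min_size)
        cloud.append((tag, round(size, 1)))
    return cloud


# Hypothetical tag frequencies.
for tag, size in tag_cloud({"travel": 40, "food": 10, "cat": 1, "sunset": 5}):
    print(f"{tag}: {size}pt")
```

Ordering by frequency instead of alphabetically would only change the sort key; the size mapping stays the same.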

Tag Network/Graph

A tag network is usually used to present the relationships between tags. Through nodes and edges, people can see the structure among tags. Nearword⁴ shows the synonyms of a word based on the WordNet dictionary, and people can use this visual tool to understand the different meanings of one word. Examples of Nearword are illustrated in Fig 2.5(a). Another work visualizes tags as complex network diagrams: given one specific tag, it shows the related tags from del.icio.us (see Fig 2.5(b)). These works present the relationships between tags, but the resulting network is hard to interpret when it is huge.
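Showing the related tags for one given tag amounts to extracting that tag's neighborhood (ego network) from co-occurrence counts. In the sketch below, we keep only the top-k neighbors as one simple way to keep such a graph readable. This pruning rule, the function name `related_tags`, and the sample bookmarks are assumptions; they do not describe how Nearword or the del.icio.us graph tool actually work.

```python
from collections import Counter


def related_tags(bookmarks, tag, k=3):
    """Top-k tags that co-occur with `tag` across bookmarks, with their counts."""
    counts = Counter()
    for tags in bookmarks:
        tags = set(tags)
        if tag in tags:
            counts.update(tags - {tag})
    # Sort by count (descending), then alphabetically so ties are broken deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical bookmark tag sets.
BOOKMARKS = [
    {"design", "css", "web"},
    {"design", "web", "inspiration"},
    {"design", "typography"},
    {"web", "javascript"},
]

print(related_tags(BOOKMARKS, "design"))  # [('web', 2), ('css', 1), ('inspiration', 1)]
```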

4http://www.intsysr.com/nearword.htm


(a) Current tag cloud on del.icio.us

(b) Tag cloud with similarity clustering

Figure 2.4: Improving the tag cloud with similarity clustering

(a) Nearword visualization for "design" (b) Graph del.icio.us related tags

Figure 2.5: Tag Network/Graph examples


2.5.2 Tag Orbital

TagOrbitals [9] is a tag visualization designed by Bernard Kerr. In addition to tags and their inter-relationships, he included summary information about the tagged objects in the visualization (see Fig 2.6). The idea of TagOrbitals is based on the Bohr model of the atom: each primary tag is displayed with a series of concentric circles, like "orbitals" (see Fig 2.7(a)). The circle size is determined by tag weight. Each orbital level indicates the number of other tags used on each bookmarked item (see Fig 2.7(b)): the first level shows all tags that co-occur with the primary tag, the second level shows any set of two tags that co-occur with the primary tag, and so on.
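The grouping behind the orbital levels can be sketched under one reading of the description above: level k collects the bookmarks on which the primary tag co-occurs with exactly k other tags. This interpretation, the function name `orbital_levels`, and the sample bookmarks are our assumptions; Kerr's implementation may differ, and circle sizes (tag weights) are not modeled.

```python
from collections import defaultdict


def orbital_levels(bookmarks, primary):
    """Group the bookmarks tagged `primary` by how many OTHER tags each one carries.

    Level k collects the distinct sets of k tags that appear alongside `primary`
    on a single bookmark.
    """
    levels = defaultdict(set)
    for tags in bookmarks:
        tags = set(tags)
        if primary in tags:
            others = frozenset(tags - {primary})
            if others:  # a bookmark tagged only with `primary` adds no orbital
                levels[len(others)].add(others)
    return dict(levels)


# Hypothetical bookmark tag sets.
BOOKMARKS = [
    {"design", "css"},
    {"design", "web"},
    {"design", "web", "ux"},
    {"design"},
    {"art", "color"},
]

for level, tag_sets in sorted(orbital_levels(BOOKMARKS, "design").items()):
    print(level, sorted(sorted(s) for s in tag_sets))
# 1 [['css'], ['web']]
# 2 [['ux', 'web']]
```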

Figure 2.6: Tag Orbital


(a) Related tags for "design" (b) URL titles for each tag

Figure 2.7: TagOrbitals for the tag "design"

Chapter 3 Tag-based Profile with Semantic
