Chapter 2 Related Works
2.1 Semantic Web
At the First International World Wide Web Conference in 1994, Tim Berners-Lee first raised the concept of the semantic web [13-14]. There, Berners-Lee pointed out the need for semantics in the Web. The Web is a set of nodes and links. To a user, this has become an exciting world, but it contains very little machine-readable information. It was therefore necessary to add semantic meaning to the Web. Adding semantics to the Web involves two things: allowing documents that carry information in machine-readable form, and allowing links to be created with relationship values. Only with this extra level of semantics can computer power be used to exploit the information to a greater extent than human reading alone allows.
Later, Berners-Lee published “Semantic Web Road Map” on the Internet [15], which was the first time the semantic web was proposed officially. The core idea of the semantic web is that, by adding metadata to documents on the Internet, these documents can not only be understood by human beings but also be reasoned over and processed by computers. He proposed the Resource Description Framework (RDF) as the metadata format [16]. RDF is a standard model for data interchange on the Web. It has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
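The data-merging property described above can be illustrated with a minimal sketch. It models RDF statements as plain (subject, predicate, object) tuples in Python instead of using a real RDF library, and the two sample graphs, the `ex:` identifiers, and the helper names `merge_graphs` and `objects` are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: RDF statements modelled as
# (subject, predicate, object) tuples. No RDF library is used and
# every identifier below is made up for this sketch.

def merge_graphs(*graphs):
    """Merge any number of triple sets; duplicate statements collapse."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def objects(graph, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Two sources describe related resources with different vocabularies.
library_graph = {
    ("ex:doc1", "dc:title", "Semantic Web Road Map"),
    ("ex:doc1", "dc:creator", "ex:timbl"),
}
people_graph = {
    ("ex:timbl", "foaf:name", "Tim Berners-Lee"),
    ("ex:doc1", "dc:creator", "ex:timbl"),  # same statement as in library_graph
}

merged = merge_graphs(library_graph, people_graph)

# A question neither source can answer alone: the name of the document's creator.
creator = objects(merged, "ex:doc1", "dc:creator")[0]
creator_name = objects(merged, creator, "foaf:name")
print(len(merged), creator_name)
```

Because every statement has the same three-part shape, merging reduces to a set union, and neither source has to change its vocabulary for the combined graph to be queried.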
In 2001, Berners-Lee published “The Semantic Web” in Scientific American [17], which expounded the meaning and future of the semantic web. It described the evolution of the Web into one that includes data and information for computers to manipulate. In this article, he introduced ontologies into the semantic web. With ontologies, computers are better able to handle the lexical and semantic meaning of Web content, which completed the vision of the semantic web. An ontology is used to reason about the properties of a domain, and may be used to describe the domain.
Ontology originates in ancient Greek philosophy, where it is a branch of metaphysics. It mainly concerns categories of being and their relations. Today, however, the term is most often applied in computer science. In 1993, Gruber gave a strict definition of ontology [18]: “An ontology is an explicit specification of a conceptualization.” In computer science, building an ontology means identifying the classes and objects of a given domain, which represent its concepts and entities, and describing their properties, restrictions, disjointness statements, and relations.
Gruber also proposed five criteria for designing ontologies:
1. Clarity: An ontology should effectively communicate the intended meaning of defined terms. Definitions should be objective.
2. Coherence: An ontology should be coherent: that is, it should sanction inferences that are consistent with the definitions. At the least, the defining axioms should be logically consistent.
3. Extendibility: An ontology should be designed to anticipate the uses of the shared vocabulary. It should offer a conceptual foundation for a range of anticipated tasks, and the representation should be crafted so that one can extend and specialize the ontology monotonically.
4. Minimal encoding bias: The conceptualization should be specified at the knowledge level without depending on a particular symbol-level encoding.
5. Minimal ontological commitment: An ontology should require the minimal ontological commitment sufficient to support the intended knowledge sharing activities.
In 2006, Shadbolt et al., together with Berners-Lee, published “The Semantic Web Revisited” [19], which reviewed the development of the semantic web and introduced tools, techniques, and insights about it. The article raised two issues: data integration and virtual uptake. Data integration is being achieved in large part through the adoption of common conceptualizations referred to as ontologies. In the past years, ontologies have been implemented in biology, medicine, genomics, and related fields. Data was originally exposed through HTTP, HTML, and XML, but uptake requires increasing the amount of data exposed in RDF.
In this thesis, the main recommendation algorithm is a semantic content-based recommendation. We focus on semantic expansion, which is introduced in the following section.
2.1.1 Semantic Expansion
Semantic expansion combines two parts: query expansion (QE) and the semantic web. In 1983, both Smeaton et al. [20] and Yu et al. [21] used statistical relations to expand query vectors; these relations are easily generated from the documents at hand. However, Peat et al. found that there are limitations to the effectiveness one can expect from such systems [22]. In 1997, Pollitt introduced HIBROWSE, a system that performs query expansion by interactively combining terms from different facets to refine the query [23]. In 2003, Yee et al. introduced the Flamenco hierarchical browsing interface, which allows users to add or remove facets while browsing a web image database and dynamically generates previews of query results [24]. Another approach is thesaurus-based QE, which employs different thesaurus relationships [25].
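Thesaurus-based QE as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any cited system: the `THESAURUS` contents, the relationship labels (`synonym`, `broader`, `narrower`), and the function name `expand_query` are all hypothetical.

```python
# Hypothetical thesaurus: each term maps relationship types to related terms.
# The entries and relationship labels are assumptions made for this sketch.
THESAURUS = {
    "car": {"synonym": ["automobile"], "narrower": ["sedan"], "broader": ["vehicle"]},
    "film": {"synonym": ["movie"]},
}


def expand_query(terms, thesaurus, relations=("synonym",)):
    """Append related terms for the chosen relationship types.

    Original terms come first; added terms keep discovery order and
    are never duplicated.
    """
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        for relation in relations:
            for related in thesaurus.get(term, {}).get(relation, []):
                if related not in seen:
                    seen.add(related)
                    expanded.append(related)
    return expanded


synonyms_only = expand_query(["car", "film"], THESAURUS)
with_broader = expand_query(["car", "film"], THESAURUS, relations=("synonym", "broader"))
print(synonyms_only)
print(with_broader)
```

Choosing which relationship types to follow is the main design decision: synonyms tend to preserve the meaning of the query, while broader or narrower terms widen or narrow its scope.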
2.1.2 Ontology and Spreading Activation Model
In 2008, Gao et al. proposed an approach based on ontology and the spreading activation model [26]. The recommender system compares the data collected from a user with similar data collected from others and calculates a list of recommended items for the user. Combining the user ontology with the spreading activation model enhances the ability to discover the user’s potential interests.
The spreading activation model was proposed in 1975 by Collins et al. to simulate human comprehension through semantic memory [27]. Their paper reviewed the original spreading-activation theory developed by M. R. Quillian and tried to correct some common misunderstandings about it [28]. It extended the theory in several respects, showed how the extended theory dealt with the experimental findings of the time, and compared it to the model of Smith, Shoben, and Rips [29].
The spreading activation model describes how long-term memory is organized in the human brain. Crestani et al. used the spreading activation model in information retrieval to expand the search vocabulary and to complement the retrieved document sets [30]. They built a prototype Web search system that exploits the differences between the documents usually managed by IR systems and those on the Web.
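The general idea of spreading activation over a network of related concepts can be sketched as follows. This is a simplified illustration, not the formulation used by Collins et al., Crestani et al., or any other cited system; the `decay`, `threshold`, and `max_steps` parameters, the rule that each node fires at most once, and the sample concept graph are all assumptions made for the sketch.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Propagate activation from seed nodes over weighted edges.

    graph: dict mapping node -> {neighbour: edge weight in [0, 1]}
    seeds: dict mapping node -> initial activation
    A node fires at most once, and only if its activation reaches `threshold`.
    """
    activation = defaultdict(float, seeds)
    fired = set()
    frontier = [node for node, value in seeds.items() if value >= threshold]
    for _ in range(max_steps):
        next_frontier = []
        for node in frontier:
            if node in fired:
                continue
            fired.add(node)
            for neighbour, weight in graph.get(node, {}).items():
                # Each hop passes on a decayed, weighted share of the activation.
                activation[neighbour] += activation[node] * weight * decay
                if neighbour not in fired and activation[neighbour] >= threshold:
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return dict(activation)


# Hypothetical concept network; the terms and weights are made up.
concepts = {
    "jazz": {"music": 0.8, "saxophone": 0.6},
    "music": {"concert": 0.5},
    "saxophone": {},
    "concert": {},
}
result = spread_activation(concepts, {"jazz": 1.0})

# Rank the concepts reached from the seed by their final activation.
ranked = sorted((n for n in result if n != "jazz"), key=result.get, reverse=True)
print(ranked)
```

In a retrieval or recommendation setting, the ranked non-seed concepts would be candidates for expanding a query or a user profile; the decay factor and threshold control how far from the seed the expansion reaches.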
In 2005, Aswath et al. presented an automated, high-precision information retrieval solution that boosts item findability by bridging the semantic gap between item information and popular keyword search phrases [31]. A two-level spreading activation network activates, and hence identifies, strong positive and negative phrases related to the matches of a given keyword search phrase, which in turn activates other potentially relevant products in addition to those that are exact keyword matches for the search term itself. Next, an SVM classifier is trained on these strong positive and negative matches of a search phrase to separate the rest of the matches from mismatches.
In 2008, Weng et al. combined ontology and the spreading activation model to develop a research paper recommendation system [32]. They argued that this combination can raise the performance of the recommendation system and address the shortcomings of existing recommendation systems. The study used ontology to construct user profiles and used the user profile ontology as the basis for reasoning about users' interests. Furthermore, it used the spreading activation model to search for other influential users in the community network and studied their interests in order to recommend related information.
Also in 2008, Cantador et al. published a thesis that combined the above methodologies with context awareness to recommend news [12, 33]. They built News@hand, a system that combines content features and collaborative information to make news suggestions. Item and user profiles are represented in terms of concepts appearing in domain ontologies. The semantic relations among these concepts are exploited to enrich these representations and are incorporated into the recommendation process. In addition, they introduced context awareness into the recommendation method, which lets the system sense the user's environment and expand the query, improving the recommendation results.