
Chapter 2. Literature review

2.1 Semantic web

The Semantic Web was invented to support a distributed Web at the level of the data rather than the presentation.

Traditionally, one web page could point to another page. Global references, also called Uniform Resource Identifiers (URIs), can be used to have one data item point to another. With Semantic Web technology, the Web infrastructure can provide a data model for distributing information about a single entity, and it publishes a distributable, machine-readable description of the data instead of only a human-readable presentation. The Semantic Web infrastructure uses a data model called the Resource Description Framework (RDF) to represent its distributed web of data [McBride, 02][Carroll, 04].
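
As a concrete illustration of data items pointing to one another through URIs, the minimal sketch below builds such a description with the Python rdflib library; the library choice and the example.org identifiers are assumptions made for illustration only.

    # A minimal sketch using the rdflib library (install with: pip install rdflib).
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/people/")      # hypothetical namespace
    DOC = Namespace("http://example.org/documents/")  # hypothetical namespace

    g = Graph()
    g.bind("ex", EX)
    g.bind("doc", DOC)

    # One data item (a person) points to another data item (a document) via global URIs,
    # rather than one HTML page merely linking to another page.
    g.add((EX.alice, RDF.type, EX.Person))
    g.add((EX.alice, RDFS.label, Literal("Alice")))
    g.add((EX.alice, EX.authorOf, DOC.report42))

    # A distributable, machine-readable description of the data (Turtle serialization).
    print(g.serialize(format="turtle"))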

In the early 1990s, web resources were constructed immediately and quickly after Tim Berners-Lee developed the World Wide Web (hereafter called the WWW); this is also known as the first-generation WWW. As the resources grew, some scholars proposed that we need machines that can understand the resources on the Web. Hence, Tim Berners-Lee later proposed another idea, the "Semantic Web", which is also called the second-generation WWW. Tim Berners-Lee defined this Semantic Web as "a web that may be understood by machines", and it is also a collective of information. The goal of the Semantic Web is to achieve machine understanding, and understanding meaning is considerably close to reasoning over context.

In the architecture of the Semantic Web, a layer of metadata is constructed on top of the WWW in order to describe the resources of the Web, and the portals of the Semantic Web, such as querying, browsing, and service composition, among other things, are established on this metadata layer.

The resources of the WWW, such as HTML documents, pictures, or animations, are mainly used by humans, and only humans can understand the connotations of these resources. With present techniques, these resources are not easily understood by computers. One task computers can do is to visually present these format-specific resource files to humans, who then interpret the results; for example, the pictures and text in HTML documents are presented through browsers.

To accomplish the purpose of the Semantic Web, one adopted approach utilizes the knowledge (including vocabularies and relationships) of different domains as defined by ontologies; because these ontologies are XML-based, the web resources are easily accessed. An ontology embedded in the Semantic Web may be applied to express web information so that two functions may be accomplished: taxonomy and reasoning. Taxonomy is a method for distinguishing different classes of information, and it may also be viewed as an expression of hierarchy, while reasoning combines the relationships between classes and hierarchy levels and may also discover implicit knowledge; a sketch of such a taxonomy in machine-readable form is given below.
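
As an illustration of how such a taxonomy can be expressed in machine-readable form, the sketch below writes a small class hierarchy with RDFS using the Python rdflib library; the vehicle vocabulary is hypothetical.

    # Sketch of a taxonomy (class hierarchy) expressed with RDFS via rdflib.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/vehicles/")  # hypothetical vocabulary

    g = Graph()
    g.bind("ex", EX)

    # Each subclass sits one layer below its superclass in the hierarchy.
    g.add((EX.Sedan, RDFS.subClassOf, EX.Car))
    g.add((EX.Car, RDFS.subClassOf, EX.Vehicle))
    g.add((EX.Truck, RDFS.subClassOf, EX.Vehicle))

    # Walking the subclass links exposes the layered structure of the taxonomy.
    for sub, _, sup in g.triples((None, RDFS.subClassOf, None)):
        print(f"{g.qname(sub)} is a kind of {g.qname(sup)}")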

Machine readable

How does a computer read semantics? A computer first utilizes the Resource Description Framework (RDF) and Uniform Resource Identifiers (URIs) linked to the related web-page resources. The HTTP addresses everybody uses are one application of URIs. Besides metadata, more and more people have started using RDF to describe the knowledge content of web pages, and this forms a large framework that makes it possible to search for a specific resource over the network. If everybody uses this method to describe their knowledge resource content, then others can find the desired resources, utilize ontologies to define key terms through hypertext links, and perform logical reasoning.
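
The sketch below shows one way a program can follow this idea in practice: it fetches an RDF document published about a site's pages and looks up one resource by its URI. The rdflib library and the example.org URL are assumptions made for illustration.

    # Sketch: read RDF metadata published for a site, then look up one resource by URI.
    from rdflib import Graph, URIRef
    from rdflib.namespace import RDFS

    g = Graph()
    # The document address is hypothetical; any RDF reachable over HTTP would work the same way.
    g.parse("http://example.org/site-metadata.ttl", format="turtle")

    # The HTTP address of a page doubles as the URI identifying that resource.
    page = URIRef("http://example.org/pages/42")
    for label in g.objects(page, RDFS.label):
        print("This page describes:", label)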

Broadly speaking, the concept behind the semantic network is to use a description language to describe anything that exists on the network and to allow computers to "understand" what it is. For example, an object can be identified as a part of a car body or as a person. If these objects can be identified, users may acquire links into an enormous web data system from computers. Owing to the high-speed processing ability of computers, users may acquire enormous amounts of data, with the result that the data obtained may be much richer than the results of a single human's brainstorming. Therefore, scientists may apply this technique to develop new artificial intelligence (AI).

A thinking machine should understand not only operations and logical rules; much background knowledge is also involved. Previously, this knowledge had to be entered into machines by humans, and specific formats and methods were needed for accessing the data. Now that the Web is fully developed, machines may also acquire information and apply it via the Web, and this is closely related to the development of the Semantic Web.

The present Internet is still human-oriented: tens of thousands of web pages, texts, pictures, images, and other resources are presented and recorded in human-readable formats. For a machine to interpret these things is not always easy with existing AI techniques. So we cannot directly ask questions of a search engine; on the contrary, we have to search fuzzily for related information using keywords. Because machines do not understand the contents of web pages, the search keywords have to be processed with statistics and scoring, and the computed results are then ranked.

In old web pages, miscellaneous tags such as font, b, and br were still needed in web documents purely for beautification in the browser; to machines, however, they are meaningless and create barriers to interpretation. The web-standards idea promoted by the W3C intentionally separates presentation from content, uses semantic markup to encapsulate the content, and applies CSS to control its presentation.

Furthermore, once enough description is added to "human-format" data so that it can be read by machines, we can say that the semantic idea exists. A layer of meaningful description added to the content of the Semantic Web thus allows machines to understand the various data structures and relationships within the content, so that machines may process these data. The Semantic Web uses XML, RDF, OWL, or other languages as its structure, and these give assistance in making data readable by machines. Moreover, these formats are not restricted to web pages; they can also be used to exchange information between machines and to understand it, as illustrated below.
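
As a small illustration of such machine-readable exchange formats, the sketch below serializes the same graph as RDF/XML, Turtle, and N-Triples with rdflib; the sensor vocabulary is hypothetical.

    # Sketch: one small graph serialized into several machine-readable exchange formats.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/")  # hypothetical namespace
    g = Graph()
    g.bind("ex", EX)
    g.add((EX.sensor1, RDFS.label, Literal("Temperature sensor")))

    for fmt in ("xml", "turtle", "nt"):  # RDF/XML, Turtle, N-Triples
        print(f"--- {fmt} ---")
        print(g.serialize(format=fmt))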

However, the development of the Semantic Web is still at a very early stage. In terms of web-page support, the conversion from old HTML to XHTML + CSS was already pretty hard work, not to mention RDF and OWL (which need even more support), so various changes will emerge during the evolution of semantics. In recent years, a new approach called microformats has emerged; it uses the XHTML format and can be embedded into existing web pages so that a site becomes readable by both humans and machines. Machines may access the data in the microformats on a web page and know their meaning. Because microformats are small and elegant and integrate into existing pages, they have gained a high profile around the world and are called the "lowercase semantic web".

In any case, the Semantic Web is an important tool for AI. Unlike the complex structures organized in the human brain, the structures used by AI can be built easily, and AI may understand a data structure and its meaning and handle it. Using a common data format agreed upon among computers, they may then understand each other, and all systems may collaborate to do more things.

Ontology

A literal interpretation of ontology is the knowledge of being: ontology is the knowledge that discusses things and investigates their essence. In computer science, an ontology is a set of domain-specific knowledge; its terminologies (vocabularies) have distinct definitions and descriptions, which not only describe particular ideas in the domain knowledge but also elucidate the relationships between concepts.

In the real world, each domain has a defined ontology, or an ontology-based knowledge base. The same terminologies (vocabularies) in different domains, times, and usages have different meanings. In a network search you may acquire a large amount of data, but the computer system does not know which domain a term belongs to, so the searcher has to define the real meaning of the term, its corresponding domain, and the relationships between terms.

Developing an ontology comprises four steps: define the classes in the ontology, define the hierarchical relationships between classes, define the attributes of the classes, and describe the restrictions on attribute values. After following the steps above, the specific entities corresponding to the domain ontology can be established; a minimal sketch of the four steps is given below.
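
The sketch below walks through the four steps with rdflib, using RDFS and OWL terms; the library vocabulary (Publication, Book, pageCount) is hypothetical and only serves to make the steps concrete.

    # Sketch of the four ontology-development steps, expressed with RDFS/OWL via rdflib.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS, OWL, XSD

    EX = Namespace("http://example.org/library/")  # hypothetical vocabulary
    g = Graph()
    g.bind("ex", EX)

    # Step 1: define the classes in the ontology.
    g.add((EX.Publication, RDF.type, OWL.Class))
    g.add((EX.Book, RDF.type, OWL.Class))

    # Step 2: define the hierarchical relationship between classes.
    g.add((EX.Book, RDFS.subClassOf, EX.Publication))

    # Step 3: define the attributes of the classes.
    g.add((EX.pageCount, RDF.type, OWL.DatatypeProperty))
    g.add((EX.pageCount, RDFS.domain, EX.Book))

    # Step 4: describe the restriction on attribute values (here: values are integers).
    g.add((EX.pageCount, RDFS.range, XSD.integer))

    print(g.serialize(format="turtle"))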

The architecture of currently used ontologies is an extension of XML, and it adopts two ontology languages enacted by the W3C: RDF (Resource Description Framework) and RDFS (RDF Schema).

Each web page and each resource must have its own defined ontology, i.e., an ontology-based knowledge base.

The same terms used in different fields, times, or usages may represent different meanings, so incorrect network searches often occur in this situation. The network does not know the domain of a term used in each web page, so the searcher has to define the real meaning of the term and the domain it belongs to. For any web page, the ontology tells you the definition of each term, its corresponding knowledge scope, and its architecture.

If every resource in a web page carries a declaration that tells each visiting computer the definition and architecture of the knowledge in the page, then all visiting computers can read the web page.

We have briefly mentioned that an ontology describes and defines the resource knowledge content and the information architecture of a web page. The idea of the Semantic Web is that RDF may be applied to ontologies or to documents generated through similar languages, and may clearly define conceptual relationships and the rules of reasoning logic. How do we describe complete knowledge? We should tell computers the essential meaning of the data we want to express; this is for computers rather than humans, so you have to tell computers about the partial concepts and all the concepts needed in the web page or resource. Moreover, how is logical reasoning between concepts carried out in computers? We first have to give the computer an ontology definition, and then the logical reasoning can proceed through this ontology, as sketched below.
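
The sketch below illustrates such reasoning with a SPARQL 1.1 property path in rdflib: once the ontology definition is loaded, the query follows subclass links transitively and derives facts that were never stated explicitly. The animal vocabulary is hypothetical.

    # Sketch: logical reasoning over an ontology definition via a SPARQL property path.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/animals/")  # hypothetical vocabulary
    g = Graph()
    g.bind("ex", EX)

    # Ontology definition (the concepts and their relationships).
    g.add((EX.Cat, RDFS.subClassOf, EX.Mammal))
    g.add((EX.Mammal, RDFS.subClassOf, EX.Animal))

    # Fact stated explicitly in a web resource.
    g.add((EX.felix, RDF.type, EX.Cat))

    # The query walks subClassOf links transitively, so it also derives the implicit
    # facts "felix is a Mammal" and "felix is an Animal".
    q = """
    SELECT ?cls WHERE {
      ex:felix rdf:type/rdfs:subClassOf* ?cls .
    }
    """
    for row in g.query(q, initNs={"ex": EX, "rdf": RDF, "rdfs": RDFS}):
        print("felix is a", g.qname(row.cls))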

Knowledge evolution

According to Tim Berners-Lee, the evolution of knowledge is most important. Besides the ontologies heavily used in web information, he thought the most important matter is the meaning that exists in the evolution of knowledge, and he also thought that, if the design is done properly, the Semantic Web can help human knowledge evolve.

In each knowledge system, we may use URIs to describe the relationships between concepts and semantics, and then the Semantic Web may help with communication between concepts and the integration of knowledge systems. Since each knowledge system has its own existing architecture, the original conflicts can thus be resolved. If I tell you about my knowledge system, then you know my semantics and the reasoning obtained from it, and good communication can be achieved once you have acquired my knowledge system.

Although the original design emphasizes that the ontology is provided to computers, the bigger goal is that it also serve as a systematized reorganization of human knowledge; this makes the ontology readable by humans and turns it into a bridge for the communication of human knowledge.

The key issue the Semantic Web must face is this: where do you acquire knowledge and its architecture, and how do you construct it? The reason the Semantic Web stipulates that each web resource (i.e., each web page) mark its own ontology in detail is that the starting point you will encounter is the variation and diversity among language vocabularies and knowledge systems. The same things in different languages, dialects, or domains have different names, and the same nouns in different language contexts, usages, or domains may have different meanings. An expression of a concept can only be precisely interpreted once the knowledge architecture behind the concept is known. This is the gap between information and knowledge that we need to stride across.

RDF data format

RDF (Resource Description Framework) is a general-purpose description language used to describe the resources of the World Wide Web and other related descriptive information. Through a simple and unified interface, you may use properties to describe any resource that has a URI (Uniform Resource Identifier) and the relationships between it and other resources. The basic element of the RDF model is the triple structure; its three major elements are the Subject, the Predicate, and the Object.
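
The sketch below makes the Subject/Predicate/Object structure explicit with rdflib; the example.org names are hypothetical.

    # Sketch: the triple (Subject, Predicate, Object) structure of the RDF model.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/")  # hypothetical namespace
    g = Graph()
    g.bind("ex", EX)

    # Subject, Predicate, Object: the object can be another resource or a literal value.
    g.add((EX.tim, EX.invented, EX.WorldWideWeb))
    g.add((EX.WorldWideWeb, RDFS.label, Literal("World Wide Web")))

    for subj, pred, obj in g:
        print("Subject:", subj, "| Predicate:", pred, "| Object:", obj)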

RDF itself has no way to describe what properties a resource should have, or the relationships between these properties and other resources. RDFS (RDF Schema) is the metadata language of RDF, and its content defines the basic vocabulary that RDF uses to describe resources.

The basic members of the RDF architecture are resources and literals, and the relationships between members may be represented by labelled directed edges; this looks like the directed graphs used in mathematics. A resource member may represent a resource on the WWW or an object that has no actual resource, while literals are used to provide factual data. A resource connected by an edge to another resource or to a literal describes a fact, equivalent to a sentence spoken in our daily lives.

The information layer established on RDF is a general relational data model that describes the relationships between resources or literals. These relationships are derived from the definitions in an ontology-based knowledge base. An ontology-based knowledge base collects the entities and concepts of an application field and classifies them into different classification systems. Furthermore, the characteristics of each class are also collected, and each characteristic describes the types it applies to, its relationships to other types, or the values it may take. In the Semantic Web, XML syntax is used to express the ontology-based knowledge base, and the standard language used is the Web Ontology Language (OWL, http://www.w3.org/TR/owl-ref/). When an RDF document is received, the meaning of each triple can be understood according to the content of the ontology-based knowledge base that the document refers to.
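
As a rough sketch of that last point, the example below combines a tiny hypothetical ontology with a received data triple and uses the declared domain of the property to interpret what the subject is; rdflib and the shop vocabulary are assumptions of the example.

    # Sketch: interpreting a received triple against the ontology it refers to.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, RDFS, OWL

    EX = Namespace("http://example.org/shop/")  # hypothetical vocabulary

    # Ontology-based knowledge base (would normally be a published OWL document).
    kb = Graph()
    kb.bind("ex", EX)
    kb.add((EX.Product, RDF.type, OWL.Class))
    kb.add((EX.price, RDF.type, OWL.DatatypeProperty))
    kb.add((EX.price, RDFS.domain, EX.Product))

    # Received RDF document (instance data) merged into the same graph.
    kb.add((EX.item17, EX.price, Literal(99)))

    # The ontology says the subject of ex:price is a Product, so ex:item17 can be
    # understood as a Product even though the received data never states it.
    for prop in kb.predicates(EX.item17, None):
        for cls in kb.objects(prop, RDFS.domain):
            print("item17 is understood to be a", kb.qname(cls))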

In contrast with a relational database, the ontology-based knowledge base is equivalent to the schema of the relational database, and the RDF instances generated on the basis of the knowledge base are equivalent to the table data. The powerful programmed services built on relational databases owe much to the indexing of the schema. Since the reasoning abilities provided by an ontology-based knowledge base are promoted to the conceptual layer, its content-based retrieval service is totally different from the one currently used on the WWW.

Tim Berners-Lee had two ideal dreams about the network. First, he hoped every person could share knowledge through the WWW; second, he hoped computers could understand human language, so that the future network would be a Semantic Web. The WWW, established through the URI (Uniform Resource Identifier), HTTP (Hypertext Transfer Protocol), and HTML (Hypertext Markup Language) proposed by Tim Berners-Lee, has led to revolutionary change.
