
Service-Oriented Search: A Ranking and Retrieval Model based on Ontology of Service Relation


Academic year: 2021


Hau-Wei Chang (1), Chiung-Wei Huang (1,2), and Hahn-Ming Lee (1,3)
(1) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology
(2) Department of Electronic Engineering, Ching Yun University
(3) Institute of Information Science, Academia Sinica, Taiwan
E-mail: hmlee@mail.ntust.edu.tw

ABSTRACT

Because general search engines may not work well for service-based search, this paper proposes a novel search approach, named Service-Oriented Search, that applies an ontology of services. The ontology, which captures the relationships among services, is constructed by extracting the workflow of the Web services and is used to guide the search for Web services. At the ranking stage, we develop a simple but robust strategy that measures the distance between the verb term and the service term in order to find the relevant and important Web pages that cover the E-service information of interest. The experimental results confirm that the proposed approach is not only feasible but also outperforms the search function of Taiwan's E-Government portal.

Keywords: Service-Oriented Search, ontology, service relation, E-Service, E-government.

1 : INTRODUCTION

The rapid growth of Web pages makes it difficult for users to find relevant information on the Internet. Meanwhile, many governments and enterprises work hard to provide useful E-services on the Internet [1][2][3][4][5], so that users can apply for those services at home or anywhere else whenever they need them. However, when a user wants to lodge or file taxes on the Internet (we call this the "E-taxing" service here), most general search engines return Web pages that contain the term "E-taxing" but not the information about how to apply for the "E-taxing" service. Users have to check each link to find their target page.

Two major issues arise when current general search engines are used for service-based search. First, limits on making queries: users may have difficulty formulating an appropriate query because they are unfamiliar with the service domain. Also, the keyword-matching method usually returns too many irrelevant search results [6][7]. Second, lack of domain knowledge on relations: in the service domain there is valuable information behind a service, namely its service relations. When applying for a service, users might need to apply for other related services. For example, users need to download the dedicated software before they conduct E-taxing. In addition, because a service often includes many procedures, users need to complete those procedures in turn to accomplish the service.

Thus, we propose a new search method, named Service-Oriented Search. It first constructs the service ontology by extracting the workflow of the service under the guidance of service experts. Next, a novel ranking approach based on measuring the distance between the verb term and the service term is used to find the relevant search results. Then, according to the ontology relations, the Web pages in the search results are listed in a tree-form category, which helps users discover the Web pages of interest quickly. In short, we employ service relations to guide service-based search for Internet E-services. In addition, a prototype system, named the Service-Oriented Search portal, is built to verify our ideas, and experiments are conducted to evaluate the performance of the proposed method.

2 : SYSTEM ARCHITECTURE

[Figure 1. Architecture of the proposed system. Labels in the figure: Developers, Semi-automatic tool, Service Relation Constructor, Web Page Collector, Spider, DB, Service-Oriented Ranker, Search Result Ranker, User Interface, Users.]

Figure 1 shows the architecture of the proposed system. It has three major components: (1) the Service Relation Constructor, (2) the Web Page Collector, and (3) the Service-Oriented Ranker. The Service Relation Constructor is guided by the developers (experts) to construct service relations and save the ontology in frame structures. The Web Page Collector fetches the appropriate Web pages from the Internet. The Service-Oriented Ranker then ranks the retrieved pages according to the service relations to provide better recommendations of E-service Web pages. We introduce each component in detail below.

2.1 : SERVICE RELATION CONSTRUCTOR

[Figure 2. Architecture of the Service Relation Constructor. Labels in the figure: Developers, Developing Agent, CKIP Agent, CKIP Parser, Verb_Extraction Agent, Service_Extraction Agent, Service_Description DB, Term DB, Verb_Thesaurus DB, Service_Thesaurus DB, Service_Relation DB.]

The architecture of the Service Relation Constructor is shown in Figure 2. Its major agents are the Developing Agent, the CKIP Agent, the Verb_Extraction Agent, and the Service_Extraction Agent. The Developing Agent provides an interface for service developers to paste in the service workflow or procedures in text format and saves them into the Service_Description DB. The CKIP Agent parses the service workflow text into terms by using the CKIP parser [8], and the parsed terms are stored in the Term DB. The Verb_Extraction Agent provides an interface for service developers to choose verb terms that are related to services. The Service_Extraction Agent allows service developers to select the needed service terms from the Term DB. Finally, the frame of service relation is constructed and stored in the Service_Relation DB. The template of the service relation ontology is shown in Figure 3.

As mentioned previously, relationships exist between services in most situations. For example, when a user wants to apply for E-taxing, he or she should download the E-taxing software first. Therefore, we can say that a "download" relationship exists between "E-taxing" and "E-taxing software". The frame of service relation about a service is a concrete form for representing the relationships between that service and other services. Figure 4 illustrates the frame of service relation about "apply for E-taxing".

[Figure 3. The template of service relation ontology. Labels in the figure: Verb A, Service A, Verb ab, Service B, Verb an, Service N.]

[Figure 4. A frame of service relation about "apply for E-taxing".]
申請 (Apply) → 網路繳稅 (E-taxing)
    申請 (Apply) → GCA憑證 (GCA Certification)
    申請 (Apply) → 電子錢包 (E-wallet)
    下載 (Download) → 網路申報軟體 (E-taxing Software)

• Developing Agent
To construct a service ontology, the Developing Agent provides an interface for service developers to fill in the service information, i.e., service title, service thesaurus, workflow, etc., as shown in Figure 5. After the developers have inserted all the necessary information, the Developing Agent stores the service thesaurus in the Service_Thesaurus DB and the description of the service in the Service_Description DB.

[Figure 5. Insert the workflow of a service. Panels: (1) inserting the action, service title, service thesaurus, and number of steps; (2) inserting the rules and description. Example shown: "Apply E-taxing", with the steps "E-taxing: apply CA and E-wallet", "E-taxing software download and setup", "Data storing and check", "Identification check", and "Record printing".]

• CKIP Agent
The CKIP Agent extracts the service workflow from the Service_Description DB and parses it into terms by invoking the CKIP (Chinese word segmentation) parser [8]. After parsing, the CKIP Agent stores the parsed terms in the Term DB.

• Verb_Extraction Agent
The Verb_Extraction Agent provides an interface for service developers to select verbs and classify them into predefined verb classes. The predefined verb classes are "申請類 (Apply class)", "作廢類 (Cancel class)", "檢核類 (Check class)", and "存入類 (Store class)". After investigation, we found that the verb class is very important for a service because it implies what kinds of action the service needs to take. Figure 6 shows the interface of the Verb_Extraction Agent.

[Figure 6. Interface for developers to choose terms as verb thesaurus. Candidate verbs shown: Register, Apply, Download, Cancel, Check; verb classes shown: Apply, Check, Store.]

• Service_Extraction Agent
The Service_Extraction Agent has three main components: Term Combination, Candidate Preprocessing, and Candidate Selection. Figure 7 depicts its architecture.

[Figure 7. Architecture of the Service_Extraction Agent. Labels in the figure: Term DB, Verb_Thesaurus DB, Term Combination, Candidate Preprocessing, Candidate Selection, Developers, Service_Relation DB.]

The Term Combination component finds terms that may be part of a service title and combines them appropriately into a service title. Because a service title usually appears near certain verbs, as mentioned previously, the component fetches verbs from the Verb_Thesaurus DB and checks whether they occur in the text. If so, the terms that appear around those verbs are regarded as candidates for the service title. However, because the CKIP parser is a general-purpose parser, the parsed terms do not necessarily meet the requirements of Web service terms. Some service terms, e.g., "國民身份證" (citizen ID), might be parsed into two terms, "國民" (citizen) and "身份證" (ID). To deal with this problem, we also develop a strategy for combining such terms appropriately.

Next, the Candidate Preprocessing component filters noise out of the candidate terms. It uses two filtering strategies:

• Removing single-word terms: we assume that a single word is meaningless because it is difficult for service developers to find information in such a short term.
• Removing candidates that contain symbols: candidate terms may contain symbols after term combination. Such candidates are meaningless in Chinese.

Finally, the Candidate Selection component provides an interface for service developers to select candidate terms. If a developer considers "網路申報軟體 (E-taxing software)" to be related to the "申請網路繳稅 (apply for E-taxing)" service, he or she can click the checkbox in the Service field and enter the thesaurus terms in the Thesaurus field. The Candidate Selection component then stores the relation in the Service_Relation DB and saves the thesaurus in the Service_Thesaurus DB.

2.2 : WEB PAGE COLLECTOR

The main task of the Spider is to crawl Web pages from the E-government portal of Taiwan [5]. For example, it crawls Web pages about E-taxing by querying the E-government portal with the service
thesaurus on E-taxing. The thesaurus of E-taxing is extracted from the Service_Thesaurus DB. After crawling, the Spider saves the Web pages in the Metasearch_Result DB.

[Figure 8. Interface for developers to choose candidates. Developers can choose a candidate as a related service and insert its thesaurus. Candidates shown: GCA certification, E-wallet, E-taxing software.]

2.3 : SERVICE-ORIENTED RANKER

The main task of the Service-Oriented Ranker is to rank the extracted Web pages. Based on our observations, the ranking strategy makes two assumptions:

• If the title or content of a Web page contains both the service term and verb terms related to the service, the page obtains a higher ranking score.
• The shorter the distance between the verb term and the service term in the title or content of a Web page, the higher the ranking score of the page.

For example, if the verb term is adjacent to the service term, as in "申請網路繳稅 (apply for E-taxing)", in the title or content of a Web page, we consider the page meaningful to users who want to apply for E-taxing. In practice, however, the verb term and the service term are not necessarily adjacent, as in "申請服務:1.網路報稅 (apply service: 1. E-taxing)". Therefore, we rank the Web pages by the distance between the verb terms and the service terms: the shorter that distance in the title or content of a Web page, the higher the ranking score the page gets.

The ranking score is calculated as:

\[ Rank\_score_m(verb, service) = W_1 \times C\_dist_m + W_2 \times T\_dist_m + W_3 \times T\_ser_m \tag{1} \]

where m denotes the Web page, verb denotes the verb term, service denotes the service term, and W_1, W_2, W_3 denote the weights.

C_dist_m measures the distance between the thesaurus of the verb term (v) and the thesaurus of the service term (s) in the content of a Web page, as defined in equation (2). T_dist_m calculates the distance between the thesaurus of the verb term (v) and the thesaurus of the service term (s) in the title of a Web page, as defined in equation (3). T_ser_m judges whether the service term occurs in the title of a Web page, as defined in equation (4).

\[ C\_dist_m = \sum_{i=1}^{v\_num} \left[ \frac{1}{\min_{j=1}^{s\_num} \left( \left| v_i - s_j \right| \right)} \right] \tag{2} \]

\[ T\_dist_m = \frac{relate\_len(t_m, VS)}{length(t_m)} \times \sum_{k=1}^{v\_num^*} \left[ \frac{1}{\min_{l=1}^{s\_num^*} \left( \left| v^*_k - s^*_l \right| \right)} \right] \tag{3} \]

where VS denotes the union of V (the set of v) and S (the set of s), and relate_len() denotes the length of the terms of VS that appear in t_m.

\[ T\_ser_m = \frac{relate\_len(t_m, S)}{length(t_m)} \tag{4} \]

An example of the ranking calculation in the Service-Oriented Ranker is shown in Figure 9.

3 : EXPERIMENTAL RESULTS

To evaluate the proposed approach, we conduct experiments focusing on the E-taxing service of Taiwan. The E-government portal [5] is the E-service portal of the Taiwan government. It helps users find the Web pages related to the E-services of interest. In addition, a prototype system, named the Service-Oriented Search (SOS) portal, is constructed to verify our proposed approach. Figure 10 shows a snapshot of our SOS portal.

For comparison, we list the search results of our SOS portal and those of the E-government portal for the "申請網路繳稅 (apply for E-taxing)" service. Table 1 shows the top 5 URLs crawled from the E-government portal. The "Service Node" in Table 1 corresponds to the "Service Node" in Figure 11. That is, the top 5 URLs in node 1 are the top 5 URLs crawled from the E-government portal with the query term "網路繳稅 (E-taxing)". Table 2 shows the top 5 URLs of our SOS portal. That is, the top 5 URLs in node 1 are the top 5 URLs for the service "申請網路繳稅 (apply for E-taxing)" in our system.

Figure 12 shows the precision of the top 5 URLs that contain related information about E-taxing. Figure 13 presents the precision of the top 5 URLs that contain related software about E-taxing. In Figure 12, for example, the precision score of node 1 is one (5/5) because all five of the top 5 URLs contain the information about how to apply for E-taxing. Figure 12 also shows that the precision score of our system is higher than that of the E-government portal. It is therefore reasonable to conclude that the top 5 URLs in our system contain more related information and related software about E-taxing than those in the E-government portal.

Also in Figure 12, the precision score in node 7 of the SOS portal is zero. This is because all of the top 5 URLs contain only the "二維報稅軟體" (2D E-taxing software) itself and do not include information about the 2D E-taxing software. When a user wants to find something about the 2D E-taxing software in a service-based search, the best search results might be the URLs that contain the software. In Figure 12 and Figure 13, the precision score in node 3 of each system is always low. This is because the "電子錢包 (E-wallet)" service is not provided by the government. Therefore, we cannot collect enough URLs about "電子錢包 (E-wallet)" from the E-government portal.

[Figure 10. A snapshot of the SOS portal.]

Table 1. The number of URLs that contain related information (in Chinese)
Service Node | SOS portal | E-government portal
(1) 網路繳稅 (E-taxing) | 5 | 1
(2) GCA 憑證 (GCA Certification) | 4 | 2
(3) 電子錢包 (E-wallet) | 2 | 0
(4) 網路申報軟體 (E-taxing Software) | 3 | 2
(5) 自然人憑證 (Natural Person Certification) | 4 | 3
(6) 公司行號憑證 (Corporation Certification) | 2 | 1
(7) 二維條碼報稅軟體 (2D Code E-taxing Software) | 0 | 0
(8) IRC 報稅軟體 (IRC E-taxing Software) | 5 | 0

[Figure 11. Frame of service relation about "apply for E-taxing".]
Level 1: Node 1, 申請 (Apply) → 網路繳稅 (E-taxing)
Level 2: Node 2, 申請 (Apply) → GCA憑證 (GCA Certification); Node 3, 申請 (Apply) → 電子錢包 (E-wallet); Node 4, 下載 (Download) → 網路申報軟體 (E-taxing Software)
Level 3: Node 5, 申請 (Apply) → 自然人憑證 (Natural Person Certification); Node 6, 申請 (Apply) → 公司行號憑證 (Corporation Certification); Node 7, 下載 (Download) → 二維條碼報稅軟體 (2D Code E-taxing Software); Node 8, 下載 (Download) → IRC報稅軟體 (IRC E-taxing Software)

[Figure 9. An example of the ranking calculation in the Service-Oriented Ranker.]
verb = 申請 (Apply); thesaurus of verb = {three terms, not legible in the source}
service = 網路繳稅 (E-taxing); thesaurus of service = {three terms, not legible in the source}
t_m = 申辦網路繳稅 (Apply E-taxing)
c_m = 民眾申辦網路繳稅簡介如下:申辦網路報稅 (An introduction to E-taxing application: E-taxing)
POS(c_m, verb) = {3.5, 14.5}
POS(c_m, service) = {6.5, 17.5}
POS(t_m, verb) = {1.5}
POS(t_m, service) = {4.5}
where c_m denotes the content of the Web page, t_m denotes the title of the Web page, and POS() denotes the set of positions at which the terms occur in c_m or t_m.
Part 1: C_dist_m = 1/min(3, 14) + 1/min(8, 3) = 2/3
Part 2: T_dist_m = (6/6) × 1/min(3) = 1/3
Part 3: T_ser_m = 4/6
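The worked calculation in Figure 9 can be reproduced with a short Python sketch of the ranking score in equations (1) to (4). This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a term's position is taken to be the centre of its character span (which matches the POS values in Figure 9), the two thesaurus sets contain only the terms that visibly occur in the example text, and the weights W1, W2, W3 default to 1.0 because the paper does not report their values.

```python
def term_positions(text, terms):
    """Centre position (1-indexed, in characters) of every occurrence of any term."""
    positions = []
    for term in terms:
        start = text.find(term)
        while start != -1:
            # A term covering characters start+1 .. start+len(term) (1-indexed)
            # has its centre at start + (len(term) + 1) / 2.
            positions.append(start + (len(term) + 1) / 2)
            start = text.find(term, start + 1)
    return sorted(positions)


def inverse_distance_sum(verb_pos, service_pos):
    """Sum over verb positions of 1 / (distance to the nearest service term).

    Assumes a verb term and a service term never share the same centre.
    """
    if not verb_pos or not service_pos:
        return 0.0
    return sum(1.0 / min(abs(v - s) for s in service_pos) for v in verb_pos)


def relate_len(text, terms):
    """Number of characters of text covered by occurrences of the terms."""
    covered = [False] * len(text)
    for term in terms:
        start = text.find(term)
        while start != -1:
            for i in range(start, start + len(term)):
                covered[i] = True
            start = text.find(term, start + 1)
    return sum(covered)


def rank_components(title, content, verbs, services):
    """Return (C_dist, T_dist, T_ser) for one page; verbs and services are sets."""
    c_dist = inverse_distance_sum(term_positions(content, verbs),
                                  term_positions(content, services))
    if not title:
        return c_dist, 0.0, 0.0
    t_dist = (relate_len(title, verbs | services) / len(title)
              * inverse_distance_sum(term_positions(title, verbs),
                                     term_positions(title, services)))
    t_ser = relate_len(title, services) / len(title)
    return c_dist, t_dist, t_ser


def rank_score(title, content, verbs, services, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three components; the default weights are an assumption."""
    w1, w2, w3 = weights
    c_dist, t_dist, t_ser = rank_components(title, content, verbs, services)
    return w1 * c_dist + w2 * t_dist + w3 * t_ser


# Assumed thesaurus terms: only those visible in the example text of Figure 9.
VERBS = {"申請", "申辦"}
SERVICES = {"網路繳稅", "網路報稅"}
TITLE = "申辦網路繳稅"
CONTENT = "民眾申辦網路繳稅簡介如下:申辦網路報稅"

c_dist, t_dist, t_ser = rank_components(TITLE, CONTENT, VERBS, SERVICES)
print(c_dist, t_dist, t_ser)
```

Under these assumptions the three components come out as 2/3, 1/3, and 4/6, the same values as Parts 1 to 3 of Figure 9. Pages with no verb term or no service term receive a distance component of zero, which is one possible reading of the first ranking assumption.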

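The precision values discussed for Figure 12 appear to follow directly from the counts in Table 1: each count is divided by the five top-ranked URLs. The sketch below works through that arithmetic. It assumes that Figure 12 plots exactly count/5 for each node, which the text states only for node 1 (5/5); the names `RELATED_COUNTS` and `precision_at_k` are illustrative.

```python
TOP_K = 5  # both portals are evaluated on their top 5 URLs

# (SOS portal, E-government portal) counts per service node, copied from Table 1.
RELATED_COUNTS = {
    1: (5, 1),
    2: (4, 2),
    3: (2, 0),
    4: (3, 2),
    5: (4, 3),
    6: (2, 1),
    7: (0, 0),
    8: (5, 0),
}


def precision_at_k(related, k=TOP_K):
    """Fraction of the top-k URLs that contain related information."""
    return related / k


precision = {
    node: (precision_at_k(sos), precision_at_k(egov))
    for node, (sos, egov) in RELATED_COUNTS.items()
}

for node, (sos, egov) in precision.items():
    print(f"node {node}: SOS portal = {sos:.1f}, E-government portal = {egov:.1f}")
```

Read this way, the SOS portal matches or exceeds the E-government portal at every node, and node 7 is zero for both systems, in line with the discussion in Section 3.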
Table 2. Top 5 URLs for two service nodes in the Service-Oriented Search portal
Service Node | Order | Web Site Name | URL
(1) 網路繳稅 | 1 | 網路報稅密碼申請 (E-taxing Password) | http://tax.nat.gov.tw/ca5main.htm
(1) 網路繳稅 | 2 | 財政部財稅資料中心 (Ministry of Finance) | http://web.mofdpc.gov.tw/page_1_3_3_5.htm
(1) 網路繳稅 | 3 | 網路繳稅 (E-taxing) | http://www.kctax.gov.tw/02/02_04_02.htm
(1) 網路繳稅 | 4 | 網路報繳稅 (E-taxing) | http://tax.nat.gov.tw/
(1) 網路繳稅 | 5 | 網路報繳稅 (E-taxing) | http://www.kctax.gov.tw/02/02_04.htm
(2) GCA 憑證 | 1 | 政府憑證管理中心 (GCA) | http://www.pki.gov.tw/
(2) GCA 憑證 | 2 | GCA | http://gca.nat.gov.tw/
(2) GCA 憑證 | 3 | 電子公路監理網 (Electronic Motor Vehicle System) | http://www.mvdis.gov.tw/news/main_news.htm
(2) GCA 憑證 | 4 | 網路報繳稅 (E-taxing) | http://tax.nat.gov.tw/blr/helpd.html
(2) GCA 憑證 | 5 | GCA-檔案下載 (Software Download) | http://gca.nat.gov.tw/repository.htm

[Figure 12. The precision score of URLs containing related information about E-taxing. Bar chart: precision (axis from 0 to 1.2) for nodes 1 to 8; series: SOS portal, E-government portal.]

[Figure 13. The precision score of URLs containing related software about E-taxing. Bar chart: precision (axis from 0 to 1.2) for nodes 1 to 8; series: SOS portal, E-government portal.]

4 : CONCLUSION

We have proposed a novel search approach, named Service-Oriented Search, which applies an ontology of service relations to guide Web search. With the help of service experts, the frames of service relation, which capture the knowledge of services, are constructed. The similarity of Web pages is then measured by a simple but robust strategy: we calculate the distance between the verb term and the service term to distill service-oriented Web pages. Furthermore, according to the ontology relations, the search results are listed in a tree-form category, which helps users discover the Web pages of interest quickly. The experimental results confirm that our method is not only feasible but also outperforms the search function of Taiwan's E-Government portal. Finally, our Service-Oriented Search portal can be improved in the following respects:

• Constructing more frames of service relation. Currently, we have constructed only the frame of service relation for "apply for E-taxing". Other service relations need to be built for further investigation.
• Broadening the crawling scope. In this paper, our crawling strategy focuses on Web pages about government services. We may therefore broaden the crawl to enrich the service scope.
• Applying the service relation to the ranking method. Experiments reveal that applying service relations in service-oriented search works well. We would like to further apply the service ontology to the ranking of search results in the near future.

REFERENCES

[1] Benchmarking E-government: A Global Perspective, http://unpan1.un.org/intradoc/groups/public/documents/un/unpan008626.pdf
[2] E-Government for All, http://cmc.edc.org/library/egov4all.html
[3] Global E-Government Survey, http://www.insidepolitics.org/egovt01int.html
[4] E-Government Strategy, http://www.whitehouse.gov/omb/inforeg/egovstrategy.pdf
[5] E-government Portal of Taiwan, http://www.gov.tw/
[6] S. Chakrabarti, M. Berg, B. Dom, "Distributed Hypertext Resource Discovery Through Examples," The VLDB Journal, pp. 375-386, 1999.
[7] Q. Yang, H.F. Wang, J.R. Wen, G. Zhang, Y. Lu, K.F. Lee, H.J. Zang, "Toward a Next Generation Search Engine," Proceedings of the 6th Pacific Rim Artificial Intelligence Conference, 2000.
[8] CKIP, http://godel.iis.sinica.edu.tw/CKIP/
[9] B. Yuwono, D.L. Lee, "Search and Ranking Algorithms for Locating Resources on the World Wide Web," Proceedings of the 12th International Conference on Data Engineering, pp. 164-171, 1996.
