4.5 Agent technologies for information retrieval by applications

Over roughly the last fifteen years, a great deal of research [4][9][10][11][12][13][14][15][16][17][19][20][21][22][23][25][26][28][39][42] has been devoted to information retrieval.

Some of this work [5][26][27][29][39][42] uses agent and mobile agent technologies to retrieve information on the Internet or over a network. Whether the agents are stationary, mobile, or part of a multi-agent system, the user can delegate to them the task of retrieving information autonomously over the network. The main difference is that a mobile agent can migrate across many hosts. In many cases a mobile agent performs better than a stationary agent. Moreover, mobile agents do not depend on a stable network connection.

When a stationary agent is used to retrieve information over the network, it must access the remote information through a stable network connection. In addition, the environment in which the stationary agent runs must have enough computing power.

For most mobile devices, especially mobile phones, executing mobile agents on the device itself is impractical. The solution is to send the mobile agents to remote servers. The mobile device, especially one with limited computing power, then serves as the interface to the mobile agents. That is, the user designs, manages, and operates the mobile agents from the mobile device.

The mobile agents actually run at the remote sites rather than on the mobile device. Because most of the work can be done at the server, the traditional approaches to information retrieval on the Internet can all be applied in our framework.

Using an agent as the information retriever is important. However, most search engines and multi-search engines are developed only for WWW users, not for application programs that need to exploit data from the web. They also lack a uniform interface for accommodating new and more powerful search engines in the future, so most multi-search engines have limited extensibility. We have proposed a uniform interface, the Internet Search Service (ISS) [19], which follows the COSS of OMG's CORBA [24], to solve the problem described above. An experimental ISS-based multi-search engine called Octopus has been built. With the ISS, Octopus can accommodate new search engines easily and lets application programs exploit data from the Internet.

Many of the results returned by search engines may not be useful to the user even when they are ranked highly. Search engines that offer personalized search are therefore strongly needed by experts. Some well-known search services, such as SavvySearch and MyYahoo, already support this function.

The main purpose of personalized search is to offer the most suitable query results to the user.

In this section, we outline the design and implementation of personalized search support in Octopus. Octopus uses an absolutely irrelevant filtering approach to support personalized search. Moreover, to balance system load against user requirements, the filtering mechanism is divided into three levels: URL, Description/Context, and Content. In addition, a feedback mechanism cooperates with the filtering mechanism to provide personalized search in a search engine.

This kind of search service benefits WWW users rather than application programs. Therefore, this function is independent of the ISS. Supporting it only requires redesigning the architecture of Octopus. The original advantages of the ISS are preserved for providing other functionality, and none of the interfaces are modified. The major contribution of this research is an approach to supporting a personalized search service based on the ISS without changing the interface of Octopus.

The ISS is designed in the style of CORBA's COSS. Its major goal is to provide a uniform interface to most search engines. For details of the ISS and the Octopus scenario, please refer to [19]. In this section, we describe the personalized search support in Octopus.
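As a rough illustration of what a uniform interface gives a multi-search engine, the following Python sketch defines an abstract search-engine interface and an aggregator that queries every registered engine. The names `SearchEngine`, `SearchResult`, `StaticEngine`, and `MultiSearch` are hypothetical and are not taken from the ISS specification in [19]; the actual ISS follows CORBA's COSS style and is not reproduced here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    description: str = ""


class SearchEngine(ABC):
    """Hypothetical uniform interface; every engine wrapper implements it."""

    @abstractmethod
    def search(self, keyword: str) -> list[SearchResult]:
        ...


class StaticEngine(SearchEngine):
    """Stand-in engine backed by a fixed list instead of a real service."""

    def __init__(self, results: list[SearchResult]):
        self._results = results

    def search(self, keyword: str) -> list[SearchResult]:
        kw = keyword.lower()
        return [r for r in self._results if kw in r.title.lower()]


class MultiSearch:
    """Queries all registered engines and drops duplicate URLs."""

    def __init__(self) -> None:
        self._engines: list[SearchEngine] = []

    def register(self, engine: SearchEngine) -> None:
        # A new engine only has to implement SearchEngine; callers are unchanged.
        self._engines.append(engine)

    def search(self, keyword: str) -> list[SearchResult]:
        seen: set[str] = set()
        merged: list[SearchResult] = []
        for engine in self._engines:
            for result in engine.search(keyword):
                if result.url not in seen:
                    seen.add(result.url)
                    merged.append(result)
        return merged
```

Because an application program talks only to `MultiSearch.search`, adding another engine is a single `register` call and requires no change on the caller's side.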

To keep the advantages of the ISS, personal functionality is added by introducing a personal information-filtering agent into Octopus rather than by modifying the architecture or interface of the ISS. Figure 4.9 shows the preliminary design of Octopus with personalized search. The major difference between this architecture and the original Octopus is the added personal information-filtering agent, which filters results according to the user's preferences. This design is intended to reduce search overhead when similar queries are issued repeatedly.

The feedback mechanism cooperates with the filtering mechanism to provide personalized search. The filtering mechanism finds adequate results, while the feedback mechanism lets the user indicate his or her preferences. In this approach, the two mechanisms adopt an implicit feedback approach and an absolutely irrelevant filtering approach, respectively.

Implicit rather than explicit feedback is used so that it fits the filtering mechanism. The absolutely irrelevant filtering approach is based on how users habitually search through a large number of URLs and descriptions. In general, users first visit the URLs they judge most suitable and skip the others, which suggests that the skipped ones are irrelevant. A visited web page can be taken as being of interest to the user to some extent. Analyzing the completely irrelevant URLs or descriptions may reveal more about what the user does not want than a relevance-based approach reveals about what the user wants, and the outcome serves as the basis for filtering. This is the spirit of the absolutely irrelevant filtering approach.

Figure 4.9: The preliminary architecture of supporting personalized search.

We can use the classical Boolean model of Information Retrieval [1] to explain the concept. If $K_r$ is the set of relevant terms and $K_{\bar{r}}$ is the set of non-relevant terms, then $K = K_r \cup K_{\bar{r}}$ is the set of all terms contained in the returned results or documents, and $K - K_{\bar{r}}$ is the set of more relevant terms. Thus, using the absolutely irrelevant filtering approach to filter out the non-relevant terms leaves the more relevant terms.
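The set relationship above can be made concrete with a small Python sketch. The term sets are invented for illustration, and `more_relevant_terms` is a hypothetical helper name, not part of Octopus.

```python
def more_relevant_terms(all_terms: set[str], non_relevant: set[str]) -> set[str]:
    """Return the terms that remain after removing the non-relevant ones."""
    return all_terms - non_relevant


# Illustrative term sets (assumed, not taken from the source).
k_relevant = {"agent", "mobile", "retrieval"}
k_non_relevant = {"recipe", "holiday"}

# All terms in the returned results are the union of both sets.
k_all = k_relevant | k_non_relevant

remaining = more_relevant_terms(k_all, k_non_relevant)
```

In this toy case the two sets are disjoint, so removing the non-relevant terms recovers the relevant set exactly. In practice the non-relevant set is only estimated from the pages the user skipped, so the difference is an approximation.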

In this design, the personal information-filtering agent analyzes the features of the non-relevant URLs or web pages and stores them in a knowledge base for future search requests. The knowledge base holds the personal filtering information. When the Search Engine Agent replies to a user request, the Information-Filtering Agent filters the results according to the personal filtering information.

The filtering mechanism is the core of personalized search support. To balance system load against user requirements, it is divided into three levels: URL, Description/Context, and Content.

1. URL: This is the simplest filtering level. It records in the personal database the URLs in the result list that were presented to the user but not visited. These non-visited URLs are treated as absolutely irrelevant and serve as the filtering base for future search requests. Because this level is the simplest, it has the lowest overhead.

2. Description/Context: Almost every result returned by a search engine consists of a URL and a description. At this level, the filtering mechanism analyzes the vocabulary, excluding stop terms, in the descriptions of all non-visited web sites and applies the index model [1] to create an index for each term. When the index of a term exceeds the predetermined cutoff threshold, the filtering mechanism adds the term to the list of filtering terms and stores the analysis results in the filtering base. When a user issues a search request at this filtering level, the Information-Filtering Agent uses the filtering base to filter the query results and discard the irrelevant ones.

3. Content: The same technique as in the second level is applied, except that the analyzed target is the full content of the web site. Because the volume of analyzed data is the largest of the three levels, the overhead is also the largest.

All filtering levels rely on the implicit feedback mechanism, which feeds back the selected web sites implicitly. A detailed description is given in the next subsection.
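The first two filtering levels can be sketched as follows. This is a minimal illustration under stated assumptions, not the Octopus implementation: the index of a term is assumed here to be the fraction of non-visited texts that contain it (the source only refers to the index model of [1]), and the stop-term list, function names, and cutoff value are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-term list (assumption; a real list would be larger).
STOP_TERMS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text: str) -> set[str]:
    """Return the lower-cased word set of a text, without stop terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_TERMS}


def build_filter_terms(non_visited_texts: list[str], cutoff: float) -> set[str]:
    """Return the terms whose index exceeds the cutoff threshold.

    Assumed index: share of non-visited texts in which the term occurs.
    """
    if not non_visited_texts:
        return set()
    counts: Counter[str] = Counter()
    for text in non_visited_texts:
        counts.update(tokenize(text))
    total = len(non_visited_texts)
    return {term for term, n in counts.items() if n / total > cutoff}


def filter_results(results, blocked_urls, filter_terms, level):
    """Filter (url, description) pairs at level 1 (URL) or level 2 (description)."""
    kept = []
    for url, description in results:
        if url in blocked_urls:
            continue  # level 1: URL was shown before and never visited
        if level >= 2 and tokenize(description) & filter_terms:
            continue  # level 2: description contains a filtering term
        kept.append((url, description))
    return kept
```

Under the same assumptions, the third level would call `build_filter_terms` on full page contents instead of descriptions, which is why its overhead is the largest.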

Figure 4.10 shows the detailed architecture for supporting personalized search in Octopus. Built on the ISS, the system adds several components to support this function, such as the User Profile, Filtering Database, Feedback Mechanism, and Result Processing Mechanism. The system scenario is described below.

The system first checks the user identifier against the User Profile. Once the user passes the check, the system generates a query page on which the user posts the query string, and then waits for the query request. The Result Filter then looks for the query result in the Result Cache. If the expected information is missing, the Query Page Generator submits the request to the ISS's mediator to search for new information and obtains the result through the Result Aggregator.

The Result Filter then filters the results based on the personal profile and the filtering database and passes the filtered results to the user. Once the user receives the results, the Feedback Mechanism is activated to monitor the user's feedback. When the user visits a web page from the Result Display page, the request actually goes to a CGI program on the server, which logs the visit and redirects the user to the target web site.
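The query path in this scenario (cache lookup, fallback to the mediator, then filtering) might be sketched as below. The class `PersonalizedSearch`, the callable `mediator`, and the URL-only filter are hypothetical simplifications; the user-identifier check and the Result Aggregator are left out.

```python
from typing import Callable

Result = tuple[str, str]  # (url, description)


class PersonalizedSearch:
    """Cache-first query path with a per-user URL filter (illustrative only)."""

    def __init__(self, mediator: Callable[[str], list[Result]], blocked_urls: set[str]):
        self._mediator = mediator          # stands in for the ISS mediator
        self._blocked_urls = blocked_urls  # stands in for the filtering database
        self._cache: dict[str, list[Result]] = {}

    def query(self, keyword: str) -> list[Result]:
        results = self._cache.get(keyword)
        if results is None:
            # Cache miss: ask the mediator and remember the unfiltered answer.
            results = self._mediator(keyword)
            self._cache[keyword] = results
        # Filter on every request so later changes to the filter take effect.
        return [r for r in results if r[0] not in self._blocked_urls]
```

Caching the unfiltered results is a design choice made for this sketch: a repeated query avoids a second round trip to the search engines while still reflecting the user's latest filtering information.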

The recorded information indicates only which URLs are relevant. The irrelevant information is therefore filtered once the user has completed a review session, that is, after visiting the relevant web pages and pressing the "Next Page" or "Back" button.
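A minimal sketch of this implicit feedback loop, assuming a per-page session object: visits are logged as they happen, and the non-visited URLs are derived only when the session ends. The class name `ReviewSession` and its methods are hypothetical; the source implements the logging step as a CGI program.

```python
class ReviewSession:
    """Tracks which of the displayed URLs the user actually visited."""

    def __init__(self, shown_urls: list[str]):
        self._shown = list(shown_urls)
        self._visited: set[str] = set()

    def visit(self, url: str) -> str:
        """Log the click and return the URL to redirect the user to."""
        if url not in self._shown:
            raise ValueError(f"URL was not part of this result page: {url}")
        self._visited.add(url)
        return url

    def close(self) -> list[str]:
        """End the session (e.g. on "Next Page" or "Back").

        Returns the displayed URLs that were never visited, in display order.
        """
        return [u for u in self._shown if u not in self._visited]
```

The list returned by `close` is what would be handed to the filtering database as absolutely irrelevant URLs. Waiting until the session ends avoids treating a URL as irrelevant while the user is still working through the page.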

Figure 4.10: Personal information filtering agent.

In this approach, we have described the policy for supporting personalized search based on the ISS, and we have implemented these functions in Octopus. To keep the advantages of the ISS, a personal information-filtering agent is added to Octopus instead of modifying the architecture or interface of the ISS. Octopus uses an absolutely irrelevant filtering approach to support personalized search. Many advantages of using the ISS to build a multi-search engine were presented in [19], and the design reveals further ones. First, personal functionality is added through a personal information-filtering agent rather than by modifying the architecture or interface of the ISS. We believe this design is better suited to letting other applications exploit useful data. Second, because the interface of the ISS is based on distributed object-oriented techniques and the modules of personalized search are implemented as replaceable components, it is easy to replace these components when a new and more suitable algorithm is proposed. Third, each user with a specific domain has an individual profile in Octopus, which may keep Octopus from returning unsuitable results to users. Finally, because the filtering mechanism is divided into three levels, it can balance system load against user requirements.

In document: A Study on Email-based Mobile Agent Runtime Environment (pages 84-89)