1.1 Current development
Traditional telephony services have been transformed by developments in mobile and Internet technologies. Internet communication services such as MSN and Skype are now widely used over both wired and wireless networks. The advantages of using the Internet as a communication platform are lower charges and richer service functionality. However, the PLMN (Public Land Mobile Network) still plays an important role in daily life, so most developers of these communication systems have worked to bring their systems to mobile devices for user convenience, or even to integrate them with the PSTN (Public Switched Telephone Network).
In recent years, peer-to-peer (P2P) technology has been one of the most popular technologies employed in Internet applications. More and more applications use a peer-to-peer overlay network for information multicasting, object searching, and load balancing, all of which a P2P architecture can provide efficiently. P2P file sharing systems and real-time video streaming services are the most popular applications on the Internet and have consumed most of the Internet bandwidth in recent years. However, P2P systems still have shortcomings, and security is an important one. Because a P2P system has no centralized server, how to trust other nodes remains an unsolved problem. Another security problem is user/device authentication: with untrusted nodes and no trusted server, authentication is almost impossible to achieve, and this bottleneck limits the use of P2P in commercial applications. More and more research is directed toward robust and reliable P2P architectures, but the most popular
Generally speaking, a peer-to-peer system is used for storing data and for supporting efficient publishing and searching mechanisms. Most current peer-to-peer research focuses on improving the efficiency of routing and searching under various conditions. Different assumptions about the network environment, such as churn handling and location awareness, lead to different approaches to optimizing the system. Searching is one of the most important functions provided by peer-to-peer systems. In contrast to centralized servers, peer-to-peer systems store data on the peer nodes themselves. The overhead of network communication replaces the overhead of database access on a central server, and becomes a critical bottleneck under complex network conditions. Furthermore, because some nodes are unstable, data synchronization problems make data in a peer-to-peer system unreliable. Although storing multiple replicas of the data on more nodes can mitigate this disadvantage at the cost of a tolerable overhead, searching in a peer-to-peer network is usually considered a best-effort function. A related problem is load balancing. Because hot keywords account for the major part of the search load, the nodes responsible for those keywords can suffer a heavy performance overhead. Most researchers have attempted to break up runs of consecutive hot keywords, but the problem of a single hot keyword has not yet been completely resolved.
Information retrieval technology has a wide influence on networked applications, and semantic keyword searching is a foundational part of information retrieval.
Although semantic searching is difficult to implement in a peer-to-peer network because the data is distributed, some research addresses how to support more complex searches in peer-to-peer networks, such as numerical queries and and/or queries.
As peer-to-peer technology grows in popularity, research on searching algorithms is likely to become a new trend in networking.
1.2 Motivation
Communication services today, such as telephony, instant messaging, email, and VoIP, use a specific user or device ID, such as a telephone number, an e-mail address, or a SIP URI, to specify the called party. In the beginning, the naming of user/device IDs was restricted by device capability in order to simplify implementation. However, with more powerful user devices, there are fewer constraints on naming, and IDs that are more meaningful and representative can be used. The trend is toward user/device IDs that are meaningful, representative, distinct, and memorable.
Although a specific ID can uniquely identify a user, it would be very useful to be able to initiate a communication with one or more callees without knowing their specific IDs. For example, Billy graduated from NCTU 20 years ago and now wants to hold a reunion, but the contact information he has for some classmates is no longer valid. Because specific IDs suffer from stale records and from the trouble of keeping those records up to date, a more intuitive way of addressing a callee can be useful.
One way to indicate the callee(s) of a communication is to specify the callee's attributes, such as name, age, and the school he or she attends. Each user is associated with a set of attributes. From another point of view, a set of user attributes is a powerful kind of user ID that is memorable, meaningful, and representative. Although attributes often lack distinguishability and convenience, communicating with this kind of ID provides an alternative, more semantic way to specify the callee.
To set up a communication using the specified callee attributes, we need to perform multi-attribute data matching between the call request and the users' profiles. An intuitive idea is that the caller specifies some attributes as the ID, and every callee who has those attributes receives the call request. A client-server architecture with a database is a direct and simple way to implement this functionality. However, researchers have extended peer-to-peer technology extensively to support multi-attribute queries. It provides an alternative to the client-server architecture with better scalability and without a single point of failure. It is therefore a better choice for communication systems that need stronger searching functions to adopt peer-to-peer as the backbone platform.
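The matching idea above can be sketched in a few lines of Python. This is only a minimal, centralized illustration of exact multi-attribute matching, not the mechanism proposed in this work; the function name `match_callees`, the profile layout, and the sample users are assumptions made for the example.

```python
from typing import Dict, List

Profile = Dict[str, str]


def match_callees(request: Profile, profiles: Dict[str, Profile]) -> List[str]:
    """Return the IDs of users whose profile contains every requested attribute."""
    return sorted(
        user_id
        for user_id, profile in profiles.items()
        if all(profile.get(key) == value for key, value in request.items())
    )


# Hypothetical profiles, for illustration only.
PROFILES = {
    "alice": {"school": "NCTU", "class_year": "1990", "city": "Hsinchu"},
    "bob": {"school": "NCTU", "class_year": "1990", "city": "Taipei"},
    "carol": {"school": "NTU", "class_year": "1990", "city": "Taipei"},
}

# The caller names attributes instead of a specific ID.
reunion_call = {"school": "NCTU", "class_year": "1990"}
print(match_callees(reunion_call, PROFILES))  # ['alice', 'bob']
```

Every user whose profile satisfies all requested attributes receives the call request, so a request with fewer attributes reaches more callees.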
Even though a peer-to-peer architecture is a suitable basis for next-generation communication systems, some bottlenecks impede this trend. The first problem is performance: a single link in the overlay network may correspond to an unexpectedly long path in the physical network, so routing in the overlay may take more time than in systems that use direct connections. Furthermore, in recently proposed methods for multi-attribute matching, the routing overhead increases linearly with the number of attributes. The lack of an efficient publish/query mechanism for peer-to-peer networks is thus a bottleneck for new communication systems built on them.
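To make the linear cost concrete, the sketch below shows one naive way to answer a multi-attribute query on a DHT: each attribute/value pair is published under its own key, and a query performs one lookup per attribute and intersects the results. This is an assumed baseline for illustration, not one of the published methods and not the mechanism proposed in this work; `ToyDHT`, `key_id`, and the node names are hypothetical, and each call to `_responsible_node` stands in for one overlay routing operation.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Optional, Set


def key_id(text: str) -> int:
    """Map a string to a 32-bit identifier on the ring (truncated SHA-1)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:4], "big")


class ToyDHT:
    """Toy consistent-hashing ring that counts lookups instead of routing."""

    def __init__(self, node_names: List[str]) -> None:
        self.ring = sorted(key_id(name) for name in node_names)
        self.store: Dict[int, Dict[str, Set[str]]] = {n: {} for n in self.ring}
        self.lookups = 0

    def _responsible_node(self, key: str) -> int:
        # The first node clockwise from the key's identifier stores the key.
        self.lookups += 1
        index = bisect_left(self.ring, key_id(key)) % len(self.ring)
        return self.ring[index]

    def publish(self, user_id: str, attributes: Dict[str, str]) -> None:
        for name, value in attributes.items():
            key = f"{name}={value}"
            self.store[self._responsible_node(key)].setdefault(key, set()).add(user_id)

    def query(self, attributes: Dict[str, str]) -> Set[str]:
        result: Optional[Set[str]] = None
        for name, value in attributes.items():
            key = f"{name}={value}"
            users = self.store[self._responsible_node(key)].get(key, set())
            result = set(users) if result is None else result & users
        return result or set()


dht = ToyDHT([f"node-{i}" for i in range(8)])
dht.publish("alice", {"school": "NCTU", "class_year": "1990", "city": "Hsinchu"})
dht.publish("bob", {"school": "NCTU", "class_year": "1990", "city": "Taipei"})

dht.lookups = 0
found = dht.query({"school": "NCTU", "class_year": "1990"})
print(sorted(found), dht.lookups)  # ['alice', 'bob'] 2
```

In this baseline a query over k attributes always costs k lookups, and each lookup in a real overlay is itself a multi-hop route, which is the overhead the paragraph above describes.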
Another often overlooked point is how to match the caller's and the callee's intentions to talk to each other. Most systems emphasize the caller's demands, which causes serious problems such as junk mail and advertisements and in turn affects whether users choose to use the system. An advanced communication system should include a filtering mechanism for the callee, so as to reduce unnecessary messages and transmission costs.
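A minimal sketch of such two-sided matching follows, assuming that each user carries an optional callee-side filter over the caller's attributes. The `User` class, the `accepts` field, and the `deliver_to` function are hypothetical names chosen for this illustration and do not describe the design presented in later chapters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Attributes = Dict[str, Union[str, int]]
Predicate = Callable[[Attributes], bool]


@dataclass
class User:
    user_id: str
    attributes: Attributes
    # Callee-side filter over the caller's attributes; accepts everyone by default.
    accepts: Predicate = lambda caller: True


def deliver_to(caller: User, wanted: Predicate, users: List[User]) -> List[str]:
    """Deliver only when the caller wants the callee AND the callee accepts the caller."""
    return [
        u.user_id
        for u in users
        if u.user_id != caller.user_id
        and wanted(u.attributes)
        and u.accepts(caller.attributes)
    ]


billy = User("billy", {"school": "NCTU", "class_year": 1990})
advertiser = User("adco", {"category": "advertiser"})
alice = User(
    "alice",
    {"school": "NCTU", "class_year": 1990, "age": 42},
    accepts=lambda caller: caller.get("school") == "NCTU",  # classmates only
)
bob = User("bob", {"school": "NCTU", "class_year": 1990, "age": 43})
users = [billy, advertiser, alice, bob]


def classmates(a: Attributes) -> bool:
    return a.get("school") == "NCTU" and a.get("class_year") == 1990


def over_forty(a: Attributes) -> bool:
    # A numerical condition on an attribute rather than an exact match.
    age = a.get("age")
    return isinstance(age, int) and age >= 40


print(deliver_to(billy, classmates, users))       # ['alice', 'bob']
print(deliver_to(advertiser, over_forty, users))  # ['bob']
```

Alice's filter rejects the advertiser's request before it is delivered, while Billy's reunion call still reaches her, so the unwanted message costs her nothing.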
1.3 Objective
Our main goal is to build a communication system with the following features:
1. Support communication using either a specific ID or non-specific ID attributes.
2. Flexible attributes for callers and callees, including numerical attributes.
3. Match the desires of both the caller and the callee, and filter unwanted call requests.
4. Efficient routing in the peer-to-peer network and tolerable overhead for maintaining other users' queries and publications.
5. Protect all users' privacy and prevent malicious gathering of user information.
6. Flexible off-line handling mechanism in every state of the call flow.
In order to implement such a communication system, we adopted a structured peer-to-peer architecture, that is, a DHT (distributed hash table), as the platform and proposed a novel publish/query mechanism to meet the above requirements. The details are described in later chapters.
1.4 Summary
The remainder of this paper is organized as follows. Chapter 2 reviews current peer-to-peer research related to our system. Chapter 3 describes our system design in detail. Chapter 4 presents the actual implementation, and Chapter 5 discusses the performance analysis. Finally, conclusions and future work are given in Chapter 6.