Background - A Blog System Integrating OpenID and Peer-to-Peer Technology

2.1.1 Peer-to-Peer (P2P) network

Traditionally, data and computation were centralized on a small number of servers, and all clients depended directly on them. This approach is called a server-based network, or the client/server model. In contrast, a Peer-to-Peer (P2P) network distributes the data and the computing load over the computers (peers) on the network [2]. Successful P2P applications such as BitTorrent and Skype show the advantages of P2P networks and demonstrate that the architecture is practical.

The main characteristics of the P2P systems are the ability to pool together and harness large amounts of resources, self-organization, load balancing, adaptation and fault-tolerance [6].

“There are two classes of P2P overlay networks: Structured and Unstructured” [7]. “An unstructured P2P network can be easily constructed as a new peer that wants to join the network can copy existing links of another node and then form its own links over time. The main disadvantage with unstructured P2P networks is that the queries may not always be resolved. If a peer is looking for rare data shared by only a few other peers, then it is highly unlikely that search will be successful” [2].

A structured P2P network usually uses a distributed hash table (DHT) to resolve this problem. Such a network uses a hash function to assign each endpoint a hash value and, according to a specific protocol, determines which endpoint is responsible for which content. Many DHT designs for P2P networks have been published, such as Pastry [3], Chord [4], and CAN [5].
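The idea of hashing endpoints and content into one identifier space can be illustrated with a minimal sketch. It assumes a Chord-like successor rule (a key belongs to the first node whose identifier is greater than or equal to the key's identifier, wrapping around the ring); the 16-bit identifier space, the peer names, and the `Ring` class are illustrative assumptions, and Pastry and CAN assign keys differently.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # deliberately small identifier space, for illustration only


def hash_id(name: str) -> int:
    """Map a string to an identifier in [0, 2**ID_BITS) using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class Ring:
    """Sorted ring of node identifiers (hypothetical helper class)."""

    def __init__(self, node_names):
        self.nodes = sorted((hash_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def responsible_node(self, key: str) -> str:
        # First node whose identifier is >= the key identifier,
        # wrapping around to the smallest identifier past the end of the ring.
        index = bisect_left(self.ids, hash_id(key)) % len(self.nodes)
        return self.nodes[index][1]


ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
owner = ring.responsible_node("blog-post-42")
print(owner)
```

In this sketch every peer that knows the ring computes the same owner for a given key, which is what lets a structured network resolve a query without flooding. A real DHT holds only partial routing state on each peer and reaches the owner in several hops.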

A P2P network is built as an overlay network on top of the Internet. “An overlay network is a computer network which is built on top of another network.” “Overlay networks can be constructed in order to permit routing messages to destinations not specified by an IP address.”[8] By using a virtual communication protocol, a P2P network is therefore not bound to the addressing of the underlying Internet protocol. For example, a structured P2P network maintains a DHT and routes messages to logical addresses rather than IP addresses, so the network can operate entirely by its own protocol.

A disadvantage of P2P networks is that users must issue a query for everything they want and then pick the desired data out of many results. Paul [9] gives an example from file sharing: if Bob wants an author's latest publication, he must query by the author's name and then look through the results for the publication he is interested in before downloading it. To resolve this search problem, researchers have integrated Pub/Sub with P2P systems to provide a better mechanism.

2.1.2 Peer-to-Peer Publish/Subscribe

Publish/Subscribe (Pub/Sub) is a popular form of message-oriented middleware (MOM). Its main roles are the publisher and the subscriber. The publisher is the information provider. The subscriber defines the events or event types it is interested in and subscribes to them. Whenever a publisher publishes an event, the Pub/Sub system delivers it to the subscribers of that event [6]. Pub/Sub systems therefore have the following three functions [9]:

1. Advertising: publishers publish events to the Pub/Sub system.

2. Subscribing: subscribers subscribe to events or event types.

3. Notifying: after publishers publish new events, the Pub/Sub system sends them to their subscribers.
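The three functions above can be sketched as a small in-memory broker. This is only an illustration under simplifying assumptions: a single process, topics as plain strings, and subscribers as callbacks. The `TopicBroker` class and its method names are hypothetical and do not correspond to any particular Pub/Sub product.

```python
class TopicBroker:
    """In-memory stand-in for a Pub/Sub system (hypothetical class)."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        # Subscribing: a subscriber registers interest in an event type.
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        # Advertising: a publisher hands a new event to the Pub/Sub system.
        return self._notify(topic, event)

    def _notify(self, topic, event):
        # Notifying: the system forwards the event to every subscriber.
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


broker = TopicBroker()
inbox = []
broker.subscribe("alice-blog", inbox.append)
delivered = broker.publish("alice-blog", "new post: hello")
print(delivered, inbox)
```

The publisher never references a subscriber directly; it only names a topic, and the broker decides who receives the event.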

There are two kinds of Pub/Sub systems: topic-based and content-based. In a topic-based system, subscribers subscribe to a topic and receive every event published on that topic. A content-based system describes events by specific attributes and values, and subscriptions are matched against them; it is technically more difficult to build than a topic-based system.
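To make the contrast concrete, the following sketch shows one possible form of content-based matching. It assumes that an event is a dictionary of attribute values and that a subscription is a dictionary of per-attribute predicates; both representations, and the `matches` function, are assumptions made for illustration.

```python
def matches(subscription, event):
    """Return True if every attribute constraint holds for the event."""
    return all(
        name in event and predicate(event[name])
        for name, predicate in subscription.items()
    )


# Hypothetical event and subscriptions.
event = {"author": "bob", "tag": "p2p", "length": 1200}
sub_long_p2p = {"tag": lambda v: v == "p2p", "length": lambda v: v > 1000}
sub_alice = {"author": lambda v: v == "alice"}

print(matches(sub_long_p2p, event))
print(matches(sub_alice, event))
```

A topic-based system only compares one topic name, whereas here every subscription must be evaluated against the attributes of every event, which is one reason content-based systems are harder to build and to scale.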

One advantage of Pub/Sub systems is loose coupling: publishers and subscribers do not need to know each other for the system to work. Another advantage is that Pub/Sub systems are more scalable than the traditional client/server model.

When P2P and Pub/Sub are combined, the Pub/Sub system provides services at the upper layer, and the P2P system is responsible for message routing at the lower layer. The Pub/Sub layer gives the P2P system strong information-filtering ability and reduces bandwidth consumption. In recent years there have been many studies on combining Pub/Sub with P2P. Scribe and Hermes are representative early P2P Pub/Sub systems; both are based on Pastry. Most later studies are based on Pastry or Chord.
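The layering can be sketched as follows. The sketch is loosely inspired by the rendezvous idea used in DHT-based Pub/Sub, where the topic name is hashed and the peer with the closest identifier keeps the subscriber list. It is not Scribe's actual API or algorithm: there is no multicast tree, routing is a single global lookup rather than multi-hop, and the `Overlay` and `PubSubLayer` classes are hypothetical.

```python
import hashlib

SPACE = 2 ** 16  # small circular identifier space (assumption)


def ident(name):
    """Hash a peer name or topic name into the identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % SPACE


def circular_distance(a, b):
    d = abs(a - b)
    return min(d, SPACE - d)


class Overlay:
    """Lower layer: finds the peer whose identifier is closest to a key."""

    def __init__(self, peer_names):
        self.peers = {name: ident(name) for name in peer_names}

    def route(self, key):
        key_id = ident(key)
        # Ties are broken by peer name so the result is deterministic.
        return min(
            self.peers,
            key=lambda p: (circular_distance(self.peers[p], key_id), p),
        )


class PubSubLayer:
    """Upper layer: a topic's subscriber list lives on its rendezvous peer."""

    def __init__(self, overlay):
        self.overlay = overlay
        self.state = {name: {} for name in overlay.peers}  # peer -> topic -> subscribers

    def subscribe(self, topic, subscriber):
        root = self.overlay.route(topic)
        self.state[root].setdefault(topic, []).append(subscriber)
        return root

    def publish(self, topic, event):
        root = self.overlay.route(topic)
        return [(s, event) for s in self.state[root].get(topic, [])]


overlay = Overlay(["peer-a", "peer-b", "peer-c", "peer-d"])
pubsub = PubSubLayer(overlay)
root = pubsub.subscribe("alice-blog", "carol")
deliveries = pubsub.publish("alice-blog", "new post")
print(root, deliveries)
```

Only the rendezvous peer holds state for the topic, so events reach interested peers without a network-wide query, which is the filtering and bandwidth benefit described above.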

There are many P2P implementations, but few are well maintained and of sufficient scale. The table below lists three representative implementations in common use [10] [11] [12].

Name        Latest Version         P2P           Language  Pub/Sub System
JXTA        JXSE 2.5 (2007/11/7)   unstructured  Java      propagation
Open Chord  1.0.5 (2008/4/11)      structured

Table 2-1 Three P2P Pub/Sub implementations

JXTA has a message-passing mechanism called JXTA wire. Its capability is similar to that of a Pub/Sub system, but it operates on a non-pure P2P network (it needs super peers). It is also a one-to-many message-passing mechanism, unlike Pub/Sub systems, which are many-to-many. In addition, its performance is poor because it relies on propagation. Open Chord is based on Chord. Although there is published work on Pub/Sub systems over Chord, it does not appear to have been implemented in Open Chord. By contrast, Scribe in FreePastry is a simple topic-based Pub/Sub system that meets our needs, so we adopt FreePastry to implement the P2P blog system.

2.1.3 OpenID Authentication

“OpenID is an open, decentralized, free framework for user-centric digital identity.”[13] It gives users a free and easy way to use a single digital identity across the Internet. It also eliminates the need for multiple usernames on different websites and simplifies the user's online experience.

The following are terms used in the OpenID protocol [14]:

1. Identifier: The URL or XRI chosen by the End User as their OpenID identifier.

2. End User: The person who wants to assert his or her identity to a site.

3. Relying Party: The site that wants to verify the end user's identifier. Sometimes it is simply called the site.

4. OpenID Provider: A service provider offering the service of registering OpenID URLs or XRIs and providing OpenID authentication (and possibly other identity services).

The current version is OpenID 2.0, released on December 5, 2007. The end user has an OpenID identifier and uses it to log in to a website that supports OpenID. Such a website usually has an input box with an OpenID logo for the user's identifier. The identifier is a URL that also identifies the authentication server. Until it is authenticated, the identifier is called the “claimed identifier”. The relying party discovers the user's OpenID provider from the URL and requests authentication. At this point the user's browser is redirected to the website of the OpenID provider, where the user logs in and approves the authentication request from the external site. The authentication server then notifies the external site that authentication succeeded, and the user's browser is redirected back to the site, where the user can use its services under the OpenID identity.
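The redirect from the relying party to the OpenID provider can be illustrated with a minimal sketch that builds an OpenID 2.0 authentication request URL. The `openid.*` parameter names and the namespace URI come from the OpenID Authentication 2.0 specification. Everything else is an assumption for illustration: the `build_auth_request` function is hypothetical, the example.com and example.org URLs are placeholders, discovery of the provider endpoint is assumed to have already happened, and association and verification of the provider's response are omitted.

```python
from urllib.parse import urlencode, urlparse, parse_qs

OPENID2_NS = "http://specs.openid.net/auth/2.0"


def build_auth_request(op_endpoint, claimed_id, return_to, realm):
    """Build the URL to which the relying party redirects the browser."""
    params = {
        "openid.ns": OPENID2_NS,
        "openid.mode": "checkid_setup",      # interactive authentication
        "openid.claimed_id": claimed_id,     # identifier not yet verified
        "openid.identity": claimed_id,       # assumes no separate OP-local identifier
        "openid.return_to": return_to,       # where the provider sends the user back
        "openid.realm": realm,               # the site the user is asked to trust
    }
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if urlparse(op_endpoint).query else "?"
    return op_endpoint + separator + urlencode(params)


url = build_auth_request(
    "https://op.example.com/auth",          # placeholder provider endpoint
    "https://alice.example.com/",           # placeholder claimed identifier
    "https://blog.example.org/return",      # placeholder return URL
    "https://blog.example.org/",
)
print(url)
```

After the user approves the request, the provider redirects the browser to the `return_to` URL with a signed assertion, which the relying party must verify before treating the claimed identifier as authenticated.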

As of July 2007, there were only approximately 120 million valid OpenID accounts, and approximately 4,500 sites had integrated OpenID consumer support [15]. However, some sites with large numbers of members have also begun to support OpenID [15][16]. For example, Yahoo users can use their Yahoo IDs as OpenIDs starting January 31, 2008. We can expect the number of OpenIDs to triple. OpenID authentication is becoming more and more widespread, so our system adopts it to eliminate the problems of identity authentication and management.

2.1.4 Blog System

Blog is an abbreviation of the term web log. A blog is a website on which one or more people write articles, which are commonly displayed in reverse chronological order [23]. Bloggers write in their own words without restriction. The articles are usually personal experiences or life logs, and the proportion of original articles is high. Content includes text, pictures, video, and hyperlinks. In addition, articles prompt exchanges among netizens, who communicate with each other through trackbacks and comments.

There are two ways to build a blog. The first is to build the blog system oneself. This involves writing blog programs or installing a blog software suite, and either renting virtual hosting space on the Internet or using a personal machine for storage. The second is to rely on blog service providers (BSPs).

The advantage of the first way is that the user controls all of the data. However, the user must have sufficient professional knowledge and must spend considerable effort and time maintaining software and hardware, so this approach is usually adopted by people with computing skills. By contrast, the second way brings blogging to the general public: anyone can be a blogger without dealing with system security, data backup, or machine updates. It is therefore the main way to build a blog.

Although building a blog through a BSP is convenient, there are potential problems. Every blogger writes articles and stores data on the blog, but the BSP, not the author, controls them. Some authors have had their writings damaged because of changes in BSP policies and have begun to resist BSPs [17][18][19].

More and more bloggers do not want their information held in the hands of a BSP. Some bloggers have begun to build blogs themselves, but most people still cannot leave their BSP and bear the loss silently because they lack the professional skills.

Furthermore, the number of blogs on the Internet is growing by multiples [20]. Each user wants to follow more and more blogs and cannot keep up with the load. RSS/Atom technology is therefore used to track a user's favorite blogs, so that the user no longer has to open each web page to look for new information.
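How a feed reader spots new articles without the user opening each page can be sketched as follows. The sketch assumes an RSS 2.0 feed whose items carry a `guid` or a `link`; Atom feeds use namespaced `entry` elements and are not handled here. The `new_items` function and the sample feed are hypothetical, and a real reader would fetch the feed over HTTP on a schedule.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml, seen_ids):
    """Return (id, title) pairs for items not seen before; update seen_ids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Use the guid when present, otherwise fall back to the link.
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id and item_id not in seen_ids:
            fresh.append((item_id, item.findtext("title", default="")))
            seen_ids.add(item_id)
    return fresh


# Hypothetical feed content standing in for a downloaded document.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>First post</title><guid>urn:example:1</guid></item>
<item><title>Second post</title><link>http://blog.example.org/2</link></item>
</channel></rss>"""

seen = set()
first_poll = new_items(SAMPLE_FEED, seen)
second_poll = new_items(SAMPLE_FEED, seen)
print(first_poll, second_poll)
```

Polling the same unchanged feed a second time yields nothing new, which is the behavior that spares the user from revisiting every blog by hand.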
