Chapter 2 Preliminaries and Related Work
2.2 Background of KAD
Each KAD node has a global identifier, referred to as its KAD ID, which is 128 bits long and randomly generated by a cryptographic hash function. The designers of KAD decided to consider a contact sufficiently close to the target if it shares at least the first 8 bits with it. The space of KAD IDs that satisfy this constraint is called the tolerance zone [17]. There are 2^8 = 256 zones in a KAD P2P network. We will briefly explain the lookup, publishing, and search procedures.
(a) Each object has some keywords
Keyword      Object set
Keyword 1    {Object 1, Object 2, Object 3}
Keyword 2    {Object 1, Object 3, Object 4}
Keyword 3    {Object 1, Object 2}
Keyword 4    {Object 3, Object 4}
Keyword 5    {Object 2}
(b) An inverted index
Figure 1. An inverted index example of four objects [15].
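The relation between the two panels of Figure 1 can be sketched in a few lines of Python. This is only an illustration: the object-to-keyword mapping below is inferred from the inverted index in panel (b), and the function name build_inverted_index is hypothetical.

```python
from collections import defaultdict

# Forward mapping (object -> keywords). Inferred from Figure 1(b);
# this is an assumption, since panel (a) is not reproduced in the text.
object_keywords = {
    "Object 1": ["Keyword 1", "Keyword 2", "Keyword 3"],
    "Object 2": ["Keyword 1", "Keyword 3", "Keyword 5"],
    "Object 3": ["Keyword 1", "Keyword 2", "Keyword 4"],
    "Object 4": ["Keyword 2", "Keyword 4"],
}


def build_inverted_index(forward):
    """Invert an object -> keywords mapping into keyword -> set of objects."""
    index = defaultdict(set)
    for obj, keywords in forward.items():
        for keyword in keywords:
            index[keyword].add(obj)
    return dict(index)


inverted_index = build_inverted_index(object_keywords)
for keyword in sorted(inverted_index):
    print(keyword, sorted(inverted_index[keyword]))
```

Each keyword maps to the set of objects that carry it, so a keyword query becomes a single dictionary access.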
2.2.1 Lookup procedure
When searching for an object, a peer needs to locate the target and explores the network in several steps, each of which finds peers that are closer to the target. Routing in KAD is based on prefix matching, and the distance between two nodes is the XOR-distance, defined as d(a, b) = a ⊕ b. It is calculated bitwise on the KAD IDs of the two nodes; e.g., the distance between a = 10011 and b = 01111 is d(a, b) = 10011 ⊕ 01111 = 11100.
Routing to a KAD ID is done iteratively. Figure 2 shows an example lookup procedure. In the first step, the searching peer takes the three closest possible contacts from its routing table. They have different XOR-distances to the target and are still not close enough to it. In the second step, the searching peer receives three responses, from which it obtains three closer possible contacts. Whenever a new possible contact lies in the tolerance zone, it is stored in a list called the candidate list. In the third step, two of these possible contacts are in the tolerance zone, so they are saved to the candidate list. In the fourth step, the searching peer again sends a request for closer peers to the three closest peers.
The lookup procedure terminates when the lookup responses contain only peers that are either already present in the candidate list or farther away from the target than the top three candidate peers [17]. At this point, no new request is sent and the candidate list becomes stable. A lookup contacts only O(log N) peers when there are N peers in the network.
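The XOR-distance, the tolerance-zone test, and the iterative lookup can be sketched as follows. This is a simplified illustration, not the KAD implementation: routing tables are modeled as a plain dictionary, requests are answered synchronously, and the function names, the toy network, and the 2-bit prefix used for the 5-bit toy IDs are all assumptions.

```python
def xor_distance(a, b):
    """XOR-distance d(a, b) = a XOR b between two IDs held as integers."""
    return a ^ b


def in_tolerance_zone(node_id, target, id_bits=128, prefix_bits=8):
    """True if node_id shares its first prefix_bits bits with target."""
    return xor_distance(node_id, target) >> (id_bits - prefix_bits) == 0


def iterative_lookup(start, target, routing_tables, alpha=3,
                     id_bits=128, prefix_bits=8):
    """Repeatedly ask the alpha closest known contacts for closer peers."""
    known = set(routing_tables[start])
    queried = {start}
    while True:
        closest = sorted(known, key=lambda n: xor_distance(n, target))[:alpha]
        to_query = [n for n in closest if n not in queried]
        if not to_query:
            # The responses brought nothing closer than the top contacts.
            break
        for peer in to_query:
            queried.add(peer)
            known.update(routing_tables.get(peer, []))
    candidates = [n for n in known
                  if in_tolerance_zone(n, target, id_bits, prefix_bits)]
    return sorted(candidates, key=lambda n: xor_distance(n, target))


# 10011 XOR 01111
print(format(xor_distance(0b10011, 0b01111), "05b"))  # prints 11100

# Hypothetical toy network with 5-bit IDs: peer -> contacts it returns.
toy_tables = {
    0b10111: [0b11000, 0b00110, 0b10001],
    0b11000: [0b10001],
    0b10001: [0b00110],
    0b00110: [0b01110, 0b00010],
    0b00010: [0b00110],
    0b01110: [0b01010, 0b01001],
    0b01010: [0b01001],
    0b01001: [0b01010],
}
candidate_list = iterative_lookup(0b10111, 0b01011, toy_tables,
                                  id_bits=5, prefix_bits=2)
print([format(n, "05b") for n in candidate_list])
```

Because a shared prefix means the leading bits of the XOR result are zero, the tolerance-zone test reduces to a single shift and comparison.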
2.2.2 Publish procedure
Publishing is an essential action when peers want to share objects. A peer publishes keyword keys and a source key to other peers. In Figure 3, the KAD ID of the peer is “10111.” An object produces two kinds of keys: a source key and keyword keys. The source key is computed by hashing the name of the object. Keyword keys are computed by hashing the keywords extracted from the name of the object [16]. The keywords of this object are “Modular” and “KAD.” In Figure 3, the source key is “01011” and the keyword keys of “Modular” and “KAD” are “00001” and “00100,” respectively.
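The key derivation described above can be sketched as follows. The hash function is an assumption: MD5 is used here only because it is available in Python's standard library and yields 128 bits, and the text does not name the function KAD uses. Lowercasing, whitespace tokenization, and the names kad_key and object_keys are likewise hypothetical.

```python
import hashlib


def kad_key(text):
    """Map a string to a 128-bit integer key (MD5 is a stand-in hash)."""
    digest = hashlib.md5(text.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def object_keys(name):
    """Return the source key and the keyword keys of an object name."""
    source_key = kad_key(name)  # hash of the whole object name
    keyword_keys = {word: kad_key(word) for word in name.split()}
    return source_key, keyword_keys


source_key, keyword_keys = object_keys("Modular KAD")
print(format(source_key, "032x"))
for word, key in keyword_keys.items():
    print(word, format(key, "032x"))
```

The resulting keys are 128 bits long, unlike the 5-bit values that the figures use for readability.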
Figure 2. An example iterative lookup procedure [16].
Figure 4 shows an example of the publishing steps for an index. Before publishing an index, the sending peer first sends a KAD_REQ to the receiving peer; KAD_REQ is used to find the receiving peer and to check whether it is alive. When the receiving peer receives the KAD_REQ, it sends a KAD_RES back. Once the connection between the two peers is established, the sending peer starts to publish keys to the receiving peer.
Figure 4. The KAD publish steps for an index [16].
[Figure 3 content: peer 10111 holds an object with source key 01011 and keyword keys 00001 and 00100.]
Figure 3. An example of an object to be published.
When a peer starts to publish keys, it publishes the source key and the keyword keys using a 2-level publishing scheme. Figure 5 shows an example of 2-level publishing. A peer “10111” wants to publish an object named “Modular KAD.” This object name yields two keywords, “Modular” and “KAD.” All relevant references to the original object are generated, namely the source key and the keyword keys. Next, the keyword keys “Modular 00001” and “KAD 00100” are published to the corresponding peers “00001” and “00100” to build indexes, all of which point to peer “01011,” the peer that matches the source key. Finally, the source key is published, with an index pointing to the publishing peer.
In KAD, a key is published not just on the single peer that is numerically closest to it, but on 11 different peers whose KAD IDs match at least the first 8 bits of the key. This zone around a key is called the tolerance zone or the keyspace [17].
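The 2-level publishing scheme and the replication inside the tolerance zone can be sketched as follows. This is a minimal in-memory model under stated assumptions: each peer's storage is a dictionary, peers are picked by XOR-distance within the zone, and the names publish and publish_object are hypothetical. The toy call uses the 5-bit IDs of the example with a 2-bit prefix and a single replica, rather than the defaults of 128 bits, 8 bits, and 11 replicas.

```python
from collections import defaultdict


def publish(key, value, stores, id_bits=128, prefix_bits=8, replicas=11):
    """Store value under key on up to `replicas` peers in the key's zone."""
    shift = id_bits - prefix_bits
    zone = [peer for peer in stores if (peer ^ key) >> shift == 0]
    chosen = sorted(zone, key=lambda peer: peer ^ key)[:replicas]
    for peer in chosen:
        stores[peer][key].add(value)
    return chosen


def publish_object(peer_id, source_key, keyword_keys, stores, **zone_args):
    """2-level publish: keyword keys first, then the source key."""
    # Level 1: every keyword key points to the source key.
    for keyword_key in keyword_keys:
        publish(keyword_key, source_key, stores, **zone_args)
    # Level 2: the source key points to the publishing peer.
    publish(source_key, peer_id, stores, **zone_args)


# Toy network: peer ID -> {key -> set of stored values}.
peer_ids = [0b00001, 0b00100, 0b01011, 0b10111, 0b11010]
stores = {peer: defaultdict(set) for peer in peer_ids}

publish_object(0b10111, 0b01011, [0b00001, 0b00100], stores,
               id_bits=5, prefix_bits=2, replicas=1)

for peer in peer_ids:
    print(format(peer, "05b"), {format(k, "05b"): v for k, v in stores[peer].items()})
```

In this toy run, peers 00001 and 00100 each hold a keyword entry pointing to 01011, and peer 01011 holds the source entry pointing to the publisher 10111.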
2.2.3 Search procedure
Like publishing, searching for files is a 2-level process: a keyword search followed by a source search. For a keyword search, the hash value of the first word of the user input is computed, and the remaining words are packed into a search tree. A query thus consists of the hash value of the first keyword and a search tree [16]. The query is routed to the peers whose KAD IDs are close to the hash value. Those peers return the matching results, which carry the source keys. For a source search, the user chooses a desired object from the returned results, and the source key of that object is used to find the peers that have the object. The returned results are added to the download queue of the object.
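The two search levels can be sketched over the indexes built in the publishing example. Several simplifications are assumed here: routing to the responsible peers is omitted, the indexes are local dictionaries, the search tree is replaced by a flat AND filter over the remaining words, and toy_hash stands in for the real hash function. All names are hypothetical.

```python
# Toy stand-in for the hash function, using the 5-bit keys of the example.
toy_hash = {"modular": 0b00001, "kad": 0b00100}

# Keyword index: keyword key -> entries carrying an object name and source key.
keyword_index = {
    0b00001: [{"name": "Modular KAD", "source_key": 0b01011}],
    0b00100: [{"name": "Modular KAD", "source_key": 0b01011}],
}

# Source index: source key -> peers that hold the object.
source_index = {0b01011: {0b10111}}


def keyword_search(query):
    """Level 1: look up the first word, filter by the remaining words."""
    words = query.lower().split()
    first, rest = words[0], words[1:]
    entries = keyword_index.get(toy_hash.get(first), [])
    return [entry for entry in entries
            if all(word in entry["name"].lower().split() for word in rest)]


def source_search(source_key):
    """Level 2: return the peers that published the given source key."""
    return sorted(source_index.get(source_key, set()))


results = keyword_search("Modular KAD")
chosen = results[0]                       # the user picks one result
download_queue = source_search(chosen["source_key"])
print([format(peer, "05b") for peer in download_queue])
```

Only the first keyword determines where the query is sent; the remaining words are evaluated on the results held there, which is why they travel with the query as a filter.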