GOAL - A Real-time Proof of Violation (POV) Scheme for Cloud Storage Environments

1. INTRODUCTION

1.3 GOAL

Although that was a new solution, we want to develop a better one. In our scheme, client devices never need to cache any hash values of files, yet they can still achieve Real-time POV. This makes the use of cloud storage more efficient and practical.

This paper is organized as follows. Chapter 2 introduces the Real-time POV scheme. Chapter 3 presents the protocols for writing and reading files under the scheme. Chapter 4 presents the details of the implementation and the experimental results. Chapters 5 and 6 present related work and the conclusion.

Chapter 2

A Novel Real-time POV Scheme

2.1 Hash tree

Before presenting our Real-time POV scheme, we first show an intuitive solution. A hash tree [15] is the intuitive choice for verifying the integrity of files transferred between computers over a network. It uses leaf nodes to store the hash values of data blocks and has a root hash (also called top hash or master hash) at the top. Usually, a cryptographic hash function such as SHA-1, Whirlpool, or Tiger is used for hashing. The root hash is obtained by calculating hash values from the leaf nodes up to the top: hashing the concatenated hash values of the child nodes produces the hash value of the parent node, and repeating this operation yields the root hash. Because of the properties of cryptographic hash functions, the root hash can represent the information of all the data blocks.
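
The bottom-up computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 is assumed as the hash function, the number of data blocks is assumed to be a power of two, and the names `h` and `merkle_root` are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest (the hash choice is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    """Root hash of a hash tree over `blocks` (count must be a power of two)."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("number of blocks must be a power of two")
    level = [h(b) for b in blocks]               # leaf nodes
    while len(level) > 1:
        # parent = hash(left child | right child)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)
tampered = merkle_root([b"block-0", b"block-X", b"block-2", b"block-3"])
print(root != tampered)   # changing one block changes the root hash
```

Changing a single block changes the root, which is the property the rest of the scheme relies on.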

This idea inspires us to use a hash tree to preserve the information of all the files stored in cloud storage. We can replace the data block in a leaf node with the hash value of a file, so that the root hash represents the status of all the files. Furthermore, the hash tree can be stored in cloud storage while only the root hash is saved on the client device. When a client wants to read a file from cloud storage, he can fetch the hash tree at the same time to audit the file. However, we found that this approach has some problems. First, a single root hash value cannot represent the version of the hash tree, because the tree changes over time, and cloud storage may forge it. Second, we have to ensure that the protocol for updating the hash tree follows the order of the write and read operations requested by multiple client devices, and that cloud storage cannot change this order during a transaction. Third, the files of an account in cloud storage may be shared by multiple client devices, so the root hash must be shared after each client device obtains a new one.

2.2 System architecture

In this chapter, we give an overview of the system architecture. The system involves a cloud storage, a synchronization server, and the client devices that a user employs to access his/her account in the cloud storage. Fig. 1 shows the relation among them.

[Figure: Syn Server, Client Devices, Cloud Storage; stored items: Files, FBHTree, Root Hash]

Fig. 1. System Architecture

Cloud storage in this system stores the files from client devices and maintains the hash values of the files in an FBHTree. The FBHTree (Full Binary Hash Tree) is a new structure proposed in this paper to increase the efficiency of using the hash values inside it. That is, this paper does not use the original hash tree structure described in chapter 2.1. The details of manipulating the FBHTree are discussed in chapter 2.3.

The synchronization server is built by the client devices in a local environment, or in a cloud operated by another service provider. It is responsible for concurrency control and for passing cryptographic proofs to client devices. In other words, every time a client device tries to send a read or write request to cloud storage, the synchronization server passes the proof to it and blocks the next client device until the present one returns a new proof. The proof is a three-tuple (γ, SN, Sig) generated by cloud storage: γ is the root hash of the FBHTree, SN is the sequence number of the read or write request, and Sig is a digital signature on γ and SN signed with the cloud storage's private key [15]. SN is 1 at the beginning and is incremented by 1 each time a client device finishes a transaction. This ensures the order of the root hashes; that is, it ensures the freshness of the files we read from cloud storage.

Because the three-tuple is shared through the synchronization server, client devices do not need to be online [16] all the time.
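
The role of the three-tuple can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the scheme's implementation: HMAC-SHA256 with a demo key replaces the cloud storage's public-key signature (a real deployment would use an asymmetric signature so that clients cannot forge proofs), and the names `Proof`, `sign`, `make_proof`, and `accept_next` are hypothetical.

```python
import hashlib
import hmac
from typing import NamedTuple

class Proof(NamedTuple):
    root_hash: bytes   # γ
    sn: int            # sequence number SN
    sig: bytes         # Sig (HMAC here, only as a placeholder for a signature)

def sign(key: bytes, root_hash: bytes, sn: int) -> bytes:
    """Placeholder for signing (γ, SN) with the cloud storage's private key."""
    message = root_hash + sn.to_bytes(8, "big")
    return hmac.new(key, message, hashlib.sha256).digest()

def make_proof(key: bytes, root_hash: bytes, sn: int) -> Proof:
    return Proof(root_hash, sn, sign(key, root_hash, sn))

def accept_next(key: bytes, current: Proof, candidate: Proof) -> bool:
    """Accept a new proof only if it verifies and its SN is current.sn + 1."""
    expected = sign(key, candidate.root_hash, candidate.sn)
    return hmac.compare_digest(candidate.sig, expected) and candidate.sn == current.sn + 1

key = b"demo-key"  # hypothetical key used only for this illustration
p1 = make_proof(key, hashlib.sha256(b"tree v1").digest(), 1)
p2 = make_proof(key, hashlib.sha256(b"tree v2").digest(), 2)
forged = Proof(p2.root_hash, 2, b"\x00" * 32)
print(accept_next(key, p1, p2), accept_next(key, p2, p1), accept_next(key, p1, forged))
```

The sequence number check is what rejects a replayed, older proof even when its signature is valid.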

To determine the leaf node in which the hash value of a file is stored, client devices and cloud storage agree on an index function Γ when an account is created. The main idea of the proposed scheme is that cloud storage no longer needs to send a complete hash tree to client devices for deriving the root hash; a slice of the FBHTree is enough. The results of the fourth experiment will show its efficiency.

2.3 FBHTree and Index function

[Figure: an FBHTree of tree height 4, showing the Root Node, Internal Nodes, Leaf Nodes with leaf node IDs 0 to 7, and a Pair List]

Fig. 2. Full Binary Hash Tree in height being 4

Pair value = hash¹ (file name) | hash (file name | hash (file))

Pair List = Pair value 1st -> Pair value 2nd -> … -> Pair value Nth

Leaf node = hash (Pair List)

Internal node, Root node = hash (left child node | right child node)

SN = Sequence Number

In this chapter, we introduce the structure of the FBHTree and how it works. The structure can be split into four parts: leaf nodes, internal nodes, a root node (γ), and a sequence number (SN). It is a full binary hash tree, so it has 2^N - 1 nodes in total and 2^(N-1) leaf nodes if the tree height is N. Fig. 2 shows an example of an FBHTree of height 4, which has 15 nodes in total and 8 leaf nodes. The following paragraphs describe each part in detail.

Leaf nodes are used to store the pair values of files, and each file has one location in which its pair value is stored. A pair value consists of the hash value of the file name and a hash over the file name and the file hash (see Fig. 2). Because an FBHTree has a fixed number of leaf nodes, we use an index function Γ to determine the location. The function is defined below.

¹ hash is a SHA-256 hash function

Γ (Pathname) = hash (Pathname) mod 2^(N-1)

In short, Γ returns a value from 0 to 2^(N-1) - 1 if the tree height of the FBHTree is N. The pathname is the input of this function. For example, Γ(/d1/d3/d5/f2) = 3 means that the pair value of the file f2 under the directories d1, d3, and d5 is mapped to the leaf node with leaf node ID 3.
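
A minimal Python sketch of the index function follows. It assumes that the SHA-256 digest is interpreted as a big-endian integer and that the pathname is UTF-8 encoded; the paper does not specify either detail, and the name `index_function` is hypothetical. The value 3 in the example above is illustrative, so the sketch does not claim to reproduce it.

```python
import hashlib

def index_function(pathname: str, height: int) -> int:
    """Γ(pathname) = SHA-256(pathname) mod 2^(height-1)."""
    digest = hashlib.sha256(pathname.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** (height - 1))

leaf_id = index_function("/d1/d3/d5/f2", 4)
print(0 <= leaf_id < 8)   # a tree of height 4 has 8 leaf nodes (IDs 0 to 7)
```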

With this function, collisions can occur between different files, so more than one pair value may be located in the same leaf node. We use a linked list to connect them into a list called the pair list. The hash value of a leaf node is then calculated by hashing all the pair values in the pair list.

Each internal node has a hash value calculated by hashing the concatenation of its left and right child nodes. Repeating the same operation from the bottom of the tree to the top yields the hash value of the root node γ, also called the root hash. Because of the characteristics of cryptographic hash functions, any modification of a value in the tree leads to a totally different root hash. The root hash can therefore represent the integrity of all the files in cloud storage.
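
The pair value, leaf node, and root hash definitions of Fig. 2 can be combined in a short Python sketch. Several details are assumptions of this sketch rather than statements of the paper: the pair list is hashed as the plain concatenation of its pair values, an empty pair list hashes the empty string, and the helper names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def pair_value(file_name: str, content: bytes) -> bytes:
    """hash(file name) | hash(file name | hash(file))."""
    name = file_name.encode("utf-8")
    return h(name) + h(name + h(content))

def leaf_hash(pair_list: list[bytes]) -> bytes:
    """Leaf node = hash(Pair List); concatenation order is an assumption."""
    return h(b"".join(pair_list))

def root_hash(pair_lists: list[list[bytes]]) -> bytes:
    """Root of an FBHTree whose i-th leaf holds pair_lists[i]."""
    level = [leaf_hash(pl) for pl in pair_lists]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Height 3 (4 leaf nodes); two files collide in leaf node 1.
lists_v1 = [[], [pair_value("/d1/f1", b"a"), pair_value("/d1/f2", b"b")], [], []]
lists_v2 = [[], [pair_value("/d1/f1", b"a"), pair_value("/d1/f2", b"B")], [], []]
print(root_hash(lists_v1) != root_hash(lists_v2))   # one changed file, new root
```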

When client devices register an account in cloud storage, they have to determine the number of files that will be stored, because it determines the structure of the FBHTree to be built and the size of the leaf nodes. In the first experiment of chapter 4, we study the height of the FBHTree in proportion to the number of files. After initializing the FBHTree, client devices get the first γ and SN with a digital signature on them.

The following chapter describes how cloud storage manipulates the FBHTree through a structure called a slice, and how client devices use the slice for auditing to achieve the Real-time POV scheme.

2.4 Slice of FBHTree

[Figure: an FBHTree of tree height 9 with the slice on leaf node ID 97 highlighted]

Fig. 3. Slice of FBHTree on leaf node ID 97

A slice of an FBHTree (or slice) is the most important structural unit of the FBHTree. It serves three functions. First, cloud storage uses it to update the FBHTree with the file from a client device. Second, the client device uses it to ensure that cloud storage updated the FBHTree correctly. Last, the client device uses it to audit a file and check its freshness.

Each leaf node ID i refers to a slice, denoted slice(i). Starting from the leaf node with ID i, the slice is the list of nodes obtained by tracing the route from the bottom of the tree to the top; each node on the route and its sibling are the elements of the list. It is a slim area of the FBHTree. In the third experiment of chapter 4, we study the memory usage of a slice relative to the FBHTree. Fig. 3 is an example of slice(97) from an FBHTree of height 9. It has two nodes in each level and one node at the root. Consequently, the slices of all leaf node IDs in an FBHTree have the same size.

We can derive the root hash of an FBHTree from any one of its slices, because a slice contains all the hash values needed on the path from a leaf node to the root node. If a client device has a correct root hash γ, he/she can use it to verify the slice. Furthermore, a verified slice means that correct pair values are stored in it. The following chapters describe how the slice serves the three functions.

2.5 Update Slice

[Figure: the nodes of a slice marked as New node after File ƒ is updated, leading to the New Root Hash and SN]

Fig. 4. The nodes in Slice need to update

When cloud storage receives an updated file from a client device, it follows a procedure to update the slice so that it maintains the latest status of the file f. The operation produces a new Root Hash γ and a new Sequence Number SN. The procedure has 4 steps.

Step 1 : In the beginning, cloud storage needs three elements from the client device: a Path Name, a new Hash Value of a file, and a Sequence Number. The Sequence Number notifies cloud storage of the order of the FBHTree in this transaction.

Step 2 : Cloud storage uses the Path Name and the index function to find the corresponding leaf node ID. It then uses the Path Name again to find the corresponding pair value in the pair list and updates it with the new Hash Value from the client device. This gives an updated pair value.

Step 3 : With the updated pair value, cloud storage updates the leaf node, the internal nodes, and the root node from the bottom of the slice to the top, which gives a new Root Hash. Fig. 4 shows an example: the marked nodes are the ones that need to be updated along the route in the slice. This is the 1st output produced in the operation.

Step 4 : Cloud storage increases the Sequence Number by one to get a New Sequence Number.
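
Steps 3 and 4 can be sketched in Python as follows. The sketch assumes that the tree is kept in an array addressed by 1-based tree node IDs (index 0 unused), that SHA-256 is the hash function, and that the new leaf hash from step 2 is already available; `build_tree` and `update_slice` are hypothetical names.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves: list[bytes]) -> list[bytes]:
    """Full binary hash tree in an array; tree[1] is the root, tree[0] unused."""
    n = len(leaves)                       # n = 2^(N-1) leaf nodes
    tree = [b""] * n + list(leaves)       # leaves occupy tree[n .. 2n-1]
    for x in range(n - 1, 0, -1):         # internal nodes, bottom-up
        tree[x] = h(tree[2 * x] + tree[2 * x + 1])
    return tree

def update_slice(tree: list[bytes], height: int, leaf_id: int,
                 new_leaf: bytes, sn: int) -> tuple[bytes, int]:
    """Rehash only the route from the leaf to the root, then increase SN."""
    x = leaf_id + 2 ** (height - 1)       # tree node ID of the leaf
    tree[x] = new_leaf
    x //= 2
    while x >= 1:
        tree[x] = h(tree[2 * x] + tree[2 * x + 1])
        x //= 2
    return tree[1], sn + 1                # new Root Hash, new Sequence Number

leaves = [h(bytes([i])) for i in range(8)]            # height 4, 8 leaf nodes
tree = build_tree(leaves)
new_leaf = h(b"updated pair list")
new_root, new_sn = update_slice(tree, 4, 3, new_leaf, 1)
print(new_sn)
```

Only the N nodes on the route are rehashed, instead of all 2^N - 1 nodes of the tree.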

2.6 Transfer and derive Slice for root hash

Every time a client device tries to WRITE or READ a file, he has to audit a slice of the FBHTree during the transaction. Auditing has a different meaning in the two modes. In WRITE mode, the client uses the slice to ensure that cloud storage updated the new hash value of the file correctly on the FBHTree. In READ mode, the client uses the slice to ensure that the file from cloud storage is the latest and correct version. At the beginning of either audit, the client device gets a slice from cloud storage, which extracts it and encapsulates it into a list as shown in Fig. 5.

Information of Pair List: Index of PV | Length of PV | PV_1 | PV_2 | … | PV_n

Information of nodes in slice: L9 | R9 | L8 | R8 | … | L2 | R2 | Root

PV = pair value; L9 = left node in level 9; R9 = right node in level 9

Fig. 5. Encapsulated Slice from FBHTree in height being 9

After the client device receives the slice, he restores it into a structure according to the leaf node ID given by the index function. The structure looks like Fig. 6.

2.7 Audit in WRITE Mode

After cloud storage has updated the slice (as described in chapter 2.5), the client device has to audit that cloud storage did it correctly. Before starting the audit, the client device has to collect some information: a Root Hash γ and a Sequence Number SN from the synchronization server, and the Old Slice, New Root Hash, and New Sequence Number from cloud storage. The audit procedure has 5 steps.

Step 1 : Derive the Root Hash from the Old Slice.

Step 2 : Compare the Root Hash from step 1 with the Root Hash from the synchronization server. If they are the same, we can be sure that the Old Slice comes from the FBHTree as it was before the new hash value of the file was applied to it.

Step 3 : Find the pair value in the leaf node by the tag that comes along with the Old Slice. The tag shows the position of the pair value that the file refers to. Then update the pair value with the Path Name and the New Hash Value. This gives an updated pair value.

Step 4 : Derive the Old Slice again with the updated pair value from step 3. This gives an Updated Root Hash.

Step 5 : Compare the Updated Root Hash from step 4 with the New Root Hash from cloud storage. If they are the same, we can declare that cloud storage updated the hash value of the file correctly on the FBHTree.
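
The five steps can be sketched in Python under a few assumptions: the slice is a list of (left, right) hashes per level followed by the root, the client also receives the pair list of the leaf node, the leaf hash is the hash of the concatenated pair values, and signature and sequence number checks are left out. All function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(pair_list: list[bytes]) -> bytes:
    return h(b"".join(pair_list))

def derive_root(slice_: list[bytes], x: int, leaf: bytes) -> bytes:
    """Rehash from `leaf` (tree node ID x) up to the root with sibling hashes."""
    gamma, pt = leaf, 0
    while x > 1:
        if x % 2 == 0:                          # current node is a left child
            gamma = h(gamma + slice_[pt + 1])
        else:                                   # current node is a right child
            gamma = h(slice_[pt] + gamma)
        pt, x = pt + 2, x // 2
    return gamma

def audit_write(trusted_root, old_slice, x, pair_list, tag, new_pair, new_root):
    # Steps 1-2: the old slice must lead to the root hash from the sync server.
    if derive_root(old_slice, x, leaf_hash(pair_list)) != trusted_root:
        return False
    # Step 3: replace the pair value at position `tag`.
    updated = pair_list[:tag] + [new_pair] + pair_list[tag + 1:]
    # Steps 4-5: the recomputed root must equal the one claimed by cloud storage.
    return derive_root(old_slice, x, leaf_hash(updated)) == new_root

# Height 3: leaf nodes have tree node IDs 4..7; we audit leaf ID 1 (x = 5).
pairs = [b"pv-a", b"pv-b"]
leaves = [h(b"l0"), leaf_hash(pairs), h(b"l2"), h(b"l3")]
node2, node3 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(node2 + node3)
old_slice = [leaves[0], leaves[1], node2, node3, root]
honest_root = h(h(leaves[0] + leaf_hash([b"pv-a", b"pv-b2"])) + node3)
print(audit_write(root, old_slice, 5, pairs, 1, b"pv-b2", honest_root))
```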

2.8 Audit in READ Mode

In READ mode, the client device audits the file when he receives it from cloud storage, to ensure that the file is the latest and correct version. Before starting the audit, the client device has to obtain some information: a Root Hash and a Sequence Number from the synchronization server, and a File and a Slice from cloud storage. The audit procedure has 4 steps.

Step 1 : Derive the Root Hash from the Slice, from the Leaf Node up to the root.

Step 2 : Compare the Root Hash and Sequence Number with the ones from the synchronization server. If the Root Hashes are the same and the Sequence Number increases by one, we can be sure that the Slice comes from the correct FBHTree.

Step 3 : Get a hash value by hashing the Path Name and the hash value of the File.

Step 4 : Find the pair value in the leaf node by the tag that comes along with the correct Slice. The tag shows the position of the pair value that the file refers to. Compare the hash value in the pair value with the one from step 3. If they are the same, we can declare that the File is the latest and correct version.
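
A Python sketch of the READ audit follows, under the same assumptions as before: SHA-256, a slice laid out as (left, right) hashes per level followed by the root, and the pair list of the leaf node delivered with the slice. The sequence number and signature checks of step 2 are omitted, and all names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def pair_value(path_name: str, content: bytes) -> bytes:
    """hash(file name) | hash(file name | hash(file))."""
    name = path_name.encode("utf-8")
    return h(name) + h(name + h(content))

def derive_root(slice_: list[bytes], x: int, leaf: bytes) -> bytes:
    gamma, pt = leaf, 0
    while x > 1:
        gamma = h(gamma + slice_[pt + 1]) if x % 2 == 0 else h(slice_[pt] + gamma)
        pt, x = pt + 2, x // 2
    return gamma

def audit_read(trusted_root, slice_, x, pair_list, tag, path_name, content):
    # Steps 1-2: the slice must lead to the root hash from the sync server.
    if derive_root(slice_, x, h(b"".join(pair_list))) != trusted_root:
        return False
    # Steps 3-4: the received file must match the pair value at position `tag`.
    return pair_list[tag] == pair_value(path_name, content)

# Height 3: leaf ID 1 has tree node ID 5 and holds two pair values.
pairs = [pair_value("/d1/f1", b"v1"), pair_value("/d1/f2", b"v2")]
leaves = [h(b"l0"), h(b"".join(pairs)), h(b"l2"), h(b"l3")]
node2, node3 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(node2 + node3)
slice_1 = [leaves[0], leaves[1], node2, node3, root]
print(audit_read(root, slice_1, 5, pairs, 1, "/d1/f2", b"v2"))
```

A stale or modified file fails the last comparison even though the slice itself is valid.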

2.9 Efficient operation of FBHTree

Because every operation on the hash values of files and the root hash in an FBHTree works on a slice, the time spent manipulating slices affects the efficiency of the system. In this chapter, we discuss in detail how to implement extracting a slice from an FBHTree and deriving the root hash from it. One way is to build a binary tree whose nodes point to each other with pointers. But it has two problems: one is the overhead of storing pointers; the other is the time spent traversing from a leaf node to the root node.

We propose to use a one-dimensional array to store an FBHTree. For an FBHTree of height N, the one-dimensional array has 2^N - 1 elements. Fig. 7 shows an FBHTree of height four and its one-dimensional array with 2^4 - 1 = 15 elements. Note that each node is given a tree node ID sequentially from the first level to the last and from the left child to the right. In addition, each leaf node ID, I, can be translated into a tree node ID, X, simply by X = I + 2^(N-1). The tree node ID is the array subscript in the one-dimensional array, and the leaf node ID corresponds to the return value of the index function Γ.

Fig. 7. Picture of one-dimension array

Algorithm 1 shows how to extract a slice from an FBHTree stored in a one-dimensional array Ψ. Because Ψ is a one-dimensional array, fetching a hash value does not require traversing pointers in a binary tree; it only needs direct accesses to the array to pick up elements.

Algorithm 1: Extract a slice from an FBHTree which is stored in a one-dimensional array. Assume that the height of the FBHTree is N.

Ψ: a one-dimensional array which stores an FBHTree

We use the FBHTree and array shown in Fig. 7 to illustrate Algorithm 1. Suppose we extract slice(3) from the FBHTree, with N = 4, I = 3, and Ψ = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11, h12, h13, h14, h15}. In step (2), we have slice(3) = {h10, h11, h4, h5, h2, h3}, collected while the values of X are 11, 5, and 2. Finally, it becomes slice(3) = {h10, h11, h4, h5, h2, h3, h1} after step (4).
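
The worked example can be reproduced with the following Python sketch. It is reconstructed from the example rather than from the steps of Algorithm 1, so the loop structure is an assumption; Ψ is modeled as a list addressed by 1-based tree node IDs with an unused element at index 0, and `extract_slice` is a hypothetical name.

```python
def extract_slice(psi: list, height: int, leaf_id: int) -> list:
    """Collect the sibling pair of every level from the leaf up, then the root."""
    x = leaf_id + 2 ** (height - 1)       # X = I + 2^(N-1)
    slice_ = []
    while x > 1:
        left = x - (x % 2)                # even tree node ID of the sibling pair
        slice_ += [psi[left], psi[left + 1]]
        x //= 2                           # move up to the parent
    slice_.append(psi[1])                 # root node
    return slice_

# Labels h1..h15 stand in for hash values; psi[0] is an unused placeholder.
psi = [None] + [f"h{i}" for i in range(1, 16)]
print(extract_slice(psi, 4, 3))
# → ['h10', 'h11', 'h4', 'h5', 'h2', 'h3', 'h1']
```

Each extraction touches 2(N - 1) + 1 array elements by direct indexing, with no pointer traversal.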

Algorithm 2 derives the root hash from a slice. The WHILE loop of step (5) iterates the concatenating and hashing operation from the bottom of the slice to the top. Its complexity is O(N), where N is the height of the FBHTree.

Algorithm 2: Derive the root hash from a slice. Assume that the height of the FBHTree is N.

…

(5) WHILE (…)
        IF (X is an even number) THEN
            γ = hash (γ | S[pt + 1])
        ELSE
            γ = hash (S[pt] | γ)
        END IF
        pt = pt + 2
        X = ⌊X ÷ 2⌋
    END WHILE

(6) IF (X is an even number) THEN
        γ = hash (γ | S[pt + 1])
    ELSE
        γ = hash (S[pt] | γ)
    END IF

We continue with our previous example, where N = 4, I = 3, and slice(3) = {h10, h11, h4, h5, h2, h3, h1}. According to Algorithm 2, we have γ = h11 in step (4), because 11 is an odd number. In step (5), the loop iterates the hash function and determines on which side of the parent node each child belongs, so we have γ = hash (h10|h11) and then γ = hash (h4|hash (h10|h11)), respectively. Finally, we have the root hash γ = hash (hash (h4|hash (h10|h11))|h3).
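
The derivation can be traced with a short Python sketch. It folds step (6) into the loop and takes the combining function as a parameter so that the symbolic example can be reproduced; both choices, and the names `derive_root`, `symbolic`, and `sha256_pair`, are assumptions of this sketch rather than the paper's Algorithm 2.

```python
import hashlib
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def derive_root(slice_: Sequence[T], x: int, combine: Callable[[T, T], T]) -> T:
    """Derive the root from a slice; x is the tree node ID of the leaf."""
    pt = 0
    gamma = slice_[pt] if x % 2 == 0 else slice_[pt + 1]   # the leaf itself
    while x > 1:
        if x % 2 == 0:                       # current node is a left child
            gamma = combine(gamma, slice_[pt + 1])
        else:                                # current node is a right child
            gamma = combine(slice_[pt], gamma)
        pt += 2
        x //= 2
    return gamma

def symbolic(left: str, right: str) -> str:
    """Builds the formula as text instead of hashing, for illustration."""
    return f"hash({left}|{right})"

def sha256_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

s3 = ["h10", "h11", "h4", "h5", "h2", "h3", "h1"]
print(derive_root(s3, 11, symbolic))
# → hash(hash(h4|hash(h10|h11))|h3)
```

With `sha256_pair` as the combining function, the result can be compared against the last element of the slice to verify it.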

Chapter 3

Protocol

This system has two operations for a client device to access the files in cloud storage: WRITE a file and READ a file. The following describes how the client device carries out the transaction with cloud storage and the synchronization server in the two protocols.

3.1 WRITE a file

[Figure: message sequence among Syn Server, Client, and Cloud Storage]

1. Request proofs (Get proofs)
2. Request to write (Path Name, File Hash)
3. Extract old slice
4. Update slice
5. Send back old slice, New root hash
6. Audit old slice
7. Audit new root hash
8. Update proofs
9. Write File

Fig. 8. Protocol to write a file

This chapter describes the protocol to WRITE a file. It takes 9 steps in total to finish the transaction and leave the latest attestation to the next client device. It is also a parallel writing protocol: a client device does not need to wait for another one to finish its whole transaction, but can start partway through. As a result, the system achieves a higher throughput.

Step 1 : The client obtains the Root Node and the Global Sequence Number from the synchronization server. The synchronization server locks out the next client at this time, to protect the order of writing data and of writing on the slice of the FBHTree.

Step 2 : Client sends the request of WRITE to cloud storage. The information
