Thesis Outline - Applying Partial Confidentiality Techniques to Protocol Design for Ciphertext Data Mining

The rest of this thesis is organized as follows. Chapter 2 describes the background on the data mining algorithms and the cryptosystem that form the basis of our proposed protocols. Chapter 3 introduces the proposed protocol for securely performing distributed association rule mining on private data. Chapter 4 presents the second protocol, for privacy-preserving decision tree learning. Chapter 5 gives concluding remarks and outlines directions for future work.

Chapter 2 Background

2.1 Homomorphic Encryption

Homomorphic encryption is a scheme that allows computations to be carried out on ciphertext: decrypting the result of a computation yields the same outcome as performing the operations on the plaintext. The concept of homomorphic encryption, or privacy homomorphism, was first proposed to the scientific community in 1978 by Ronald Rivest, Leonard Adleman and Michael Dertouzos. A semantically secure homomorphic encryption scheme was developed and proposed by Shafi Goldwasser and Silvio Micali in 1982. In 2009, Craig Gentry proved that a fully homomorphic encryption scheme is possible.

Rivest, Adleman and Dertouzos developed their theory based on the fact that existing security and encryption systems severely limit the ability to manipulate data after it is encrypted and turned into ciphertext. Without a homomorphic solution, “sending” and “receiving” are the only functions that can be performed on encrypted data. The biggest concern was the level of computing needed to process an encrypted request on the encrypted data. This manipulation may reduce the security level of the encryption scheme.

With the advent and rapid expansion of cloud computing, a feasible homomorphic encryption method is crucial. Otherwise, the risk of entrusting sensitive data to a cloud computing service provider is too high. If a service provider can access data in decrypted form, the data can be directly exposed to malicious users. [6] proved that homomorphic encryption is viable, though the computation time remains a concern.

In [6], the author outlined how to create an encryption scheme that allows data to be stored securely in a cloud environment while the owner uses the computational power of the cloud provider to manipulate the encrypted data. There are three main steps in [6]. First, an encryption scheme is constructed that is “bootstrappable”; in this step, a somewhat homomorphic encryption scheme can work with its own decryption circuit. Next, an almost-bootstrappable public key encryption scheme is built using ideal lattices. Finally, the scheme is simplified while maintaining the property of being bootstrappable.

Although [6] created a fully homomorphic encryption scheme, it remains impractical. Homomorphic encryption has evolved to be mostly secure against chosen-plaintext attacks, but securing it against chosen-ciphertext attacks remains a problem. In addition to the security issue, fully homomorphic schemes are so complex that their running time has precluded their use in many applications. Somewhat homomorphic encryption systems have been developed to address at least the time factor, using only the most efficient portions of a fully homomorphic encryption scheme.

In this thesis, we apply homomorphic encryption in a realistic setting, so efficiency must be taken into consideration. We use partially homomorphic encryption as our main encryption scheme and combine it with protocol design to realize the privacy-preserving data mining process. There are several efficient partially homomorphic cryptosystems:

2.1.1 Unpadded RSA

If the RSA public key is modulus m and exponent e, then the encryption of a message x is given by E(x) = x^e mod m. The homomorphic property is then

$$E(x_1) \cdot E(x_2) = x_1^e x_2^e \bmod m = (x_1 x_2)^e \bmod m = E(x_1 \cdot x_2).$$
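The multiplicative property can be checked numerically. The following is a minimal sketch, assuming toy parameters (p = 61, q = 53, e = 17) and the helper names `encrypt` and `decrypt`, all chosen here only for illustration; real RSA moduli are far larger.

```python
# Toy unpadded RSA. The primes, exponent, and function names are
# illustrative assumptions, not parameters taken from the thesis.
p, q = 61, 53
m = p * q                           # modulus m
e = 17                              # public exponent, coprime to (p-1)(q-1)
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def encrypt(x):
    """E(x) = x^e mod m."""
    return pow(x, e, m)

def decrypt(c):
    """Recover x from c = x^e mod m."""
    return pow(c, d, m)

x1, x2 = 12, 7
product_ct = (encrypt(x1) * encrypt(x2)) % m

# Multiplying ciphertexts gives the ciphertext of the product.
assert product_ct == encrypt(x1 * x2)
assert decrypt(product_ct) == x1 * x2
```

The product x1 · x2 must stay below m for the decrypted value to equal the integer product rather than its residue modulo m.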

2.1.2 ElGamal

In a group $G$, if the public key is $(G, q, g, h)$, where $h = g^x$ and $x$ is the secret key, then the encryption of a message $m$ is $E(m) = (g^r, m \cdot h^r)$ for some random $r \in \{0, 1, \ldots, q - 1\}$. The homomorphic property is then

$$E(x_1) \cdot E(x_2) = (g^{r_1}, x_1 \cdot h^{r_1})(g^{r_2}, x_2 \cdot h^{r_2}) = (g^{r_1 + r_2}, (x_1 \cdot x_2) h^{r_1 + r_2}).$$
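A small numeric sketch of this component-wise multiplication, assuming a toy group (the order-11 subgroup of the integers modulo 23, generated by 4) and a fixed secret key; the names `sk`, `encrypt`, `decrypt`, and `multiply` are hypothetical and introduced only for this example.

```python
import random

# Toy ElGamal parameters (illustrative assumptions only).
p, q, g = 23, 11, 4      # g generates the subgroup of order q in Z_p^*
sk = 6                   # secret key (x in the text)
h = pow(g, sk, p)        # public key component h = g^x

def encrypt(msg, r=None):
    """E(m) = (g^r, m * h^r) with random r in {0, ..., q-1}."""
    if r is None:
        r = random.randrange(q)
    return (pow(g, r, p), (msg * pow(h, r, p)) % p)

def decrypt(ct):
    """m = c2 * (c1^x)^(-1) mod p."""
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, p), -1, p)) % p

def multiply(ct_a, ct_b):
    """Component-wise product of two ciphertexts."""
    return ((ct_a[0] * ct_b[0]) % p, (ct_a[1] * ct_b[1]) % p)

x1, x2 = 3, 5
ct = multiply(encrypt(x1), encrypt(x2))
assert decrypt(ct) == (x1 * x2) % p
```

With fixed random values the identity is visible directly: `multiply(encrypt(3, r=2), encrypt(5, r=3))` equals `encrypt(15, r=5)`.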

2.1.3 Goldwasser-Micali

In the Goldwasser-Micali cryptosystem, if the public key is the modulus $m$ and a quadratic non-residue $x$, then the encryption of a bit $b$ is $E(b) = x^b r^2 \bmod m$ for some random $r \in \{0, 1, \ldots, m - 1\}$. The homomorphic property is then

$$E(b_1) \cdot E(b_2) = x^{b_1} r_1^2 \, x^{b_2} r_2^2 = x^{b_1 + b_2} (r_1 r_2)^2 = E(b_1 \oplus b_2).$$
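The XOR behaviour can be sketched as follows, assuming a toy modulus m = 7 · 11 and the non-residue x = 6 (a non-residue modulo both primes). Decryption by Euler's criterion modulo p is the standard approach but is not spelled out in the text above, and the sketch samples r coprime to m; both are assumptions of this example.

```python
import math
import random

# Toy Goldwasser-Micali parameters (illustrative assumptions only).
p, q = 7, 11
m = p * q
x = 6    # quadratic non-residue modulo both p and q

def encrypt(b):
    """E(b) = x^b * r^2 mod m, with r sampled coprime to m."""
    while True:
        r = random.randrange(1, m)
        if math.gcd(r, m) == 1:
            return (pow(x, b, m) * pow(r, 2, m)) % m

def decrypt(c):
    """Bit is 0 iff c is a square mod p (Euler's criterion)."""
    return 0 if pow(c, (p - 1) // 2, p) == 1 else 1

# The product of two ciphertexts decrypts to the XOR of the bits.
for b1 in (0, 1):
    for b2 in (0, 1):
        assert decrypt((encrypt(b1) * encrypt(b2)) % m) == b1 ^ b2
```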

2.1.4 Benaloh

If the public key is the modulus $m$ and the base $g$ with a block size of $c$, then the encryption of a message $x$ is $E(x) = g^x r^c \bmod m$ for some random $r \in \{0, 1, \ldots, m - 1\}$. The homomorphic property is then

$$E(x_1) \cdot E(x_2) = (g^{x_1} r_1^c)(g^{x_2} r_2^c) = g^{x_1 + x_2} (r_1 r_2)^c = E(x_1 + x_2 \bmod c).$$
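A rough numeric sketch of addition modulo the block size follows. The key-generation conditions and the exhaustive-search decryption are the usual ones for the Benaloh scheme but are not given in the text above, so they are assumptions here, as are the toy parameters (c = 5, p = 11, q = 7, g = 2) and the function names.

```python
import math
import random

# Toy Benaloh parameters (illustrative assumptions only).
c = 5                      # block size (prime); c divides p - 1
p, q = 11, 7               # gcd(c, (p-1)/c) = 1 and gcd(c, q-1) = 1
m = p * q
phi = (p - 1) * (q - 1)
g = 2
assert pow(g, phi // c, m) != 1   # required so that decryption is unique

def encrypt(x):
    """E(x) = g^x * r^c mod m, with r sampled coprime to m."""
    while True:
        r = random.randrange(1, m)
        if math.gcd(r, m) == 1:
            return (pow(g, x, m) * pow(r, c, m)) % m

def decrypt(ct):
    """Raise to phi/c to remove r, then search the c possible messages."""
    a = pow(ct, phi // c, m)
    base = pow(g, phi // c, m)
    for x in range(c):
        if pow(base, x, m) == a:
            return x
    raise ValueError("not a valid ciphertext")

x1, x2 = 3, 4
assert decrypt((encrypt(x1) * encrypt(x2)) % m) == (x1 + x2) % c
```

The exhaustive search is only reasonable because the block size is tiny in this sketch.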

2.1.5 Paillier Cryptosystem

The Paillier cryptosystem [7] is a public key encryption scheme based on modular arithmetic, created by Pascal Paillier. The Paillier cryptosystem is additively homomorphic, as follows:

$$E_k(x) \times E_k(y) = E_k(x + y).$$

Encryption

To encrypt a message using the Paillier cryptosystem, a public key must be established first.

To construct the public key, one must choose two large primes, $p$ and $q$, and calculate their product, $n = p \cdot q$. Then a semi-random, nonzero integer $g \in \mathbb{Z}_{n^2}^*$ must be selected so that the order of $g$ in $\mathbb{Z}_{n^2}^*$ is a multiple of $n$. Thus, the public key is $(n, g)$.

The steps of encryption are as follows:

1. Create a message, $m$, with $m \in \mathbb{Z}_n$.

2. Choose a random, nonzero integer, $r \in \mathbb{Z}_n^*$.

3. Compute $c \equiv g^m \cdot r^n \bmod n^2$.
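As a short worked check of the additive property stated earlier, consider two ciphertexts produced by the encryption steps for messages $m_1$ and $m_2$ with independent random values $r_1$ and $r_2$ (these subscripted names are introduced here only for illustration):

```latex
\begin{aligned}
c_1 \cdot c_2 &\equiv (g^{m_1} r_1^{n})(g^{m_2} r_2^{n}) \pmod{n^2} \\
              &\equiv g^{m_1 + m_2} (r_1 r_2)^{n} \pmod{n^2}.
\end{aligned}
```

The product has the same form as a ciphertext with exponent $m_1 + m_2$ and random value $r_1 r_2$, so it decrypts to $(m_1 + m_2) \bmod n$.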

Decryption

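1. Define $L(u) = (u - 1)/n$.

2. Calculate $k = L(g^{\lambda(n)} \bmod n^2)$, where $\lambda(n) = \mathrm{lcm}(p - 1, q - 1)$.

3. Compute $\mu \equiv k^{-1} \bmod n$.

4. Recover $m \equiv L(c^{\lambda(n)} \bmod n^2) \cdot \mu \bmod n$.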
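The encryption and decryption steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the primes are toy values, g = n + 1 is a common concrete choice rather than the semi-random g described above, λ(n) is taken as lcm(p − 1, q − 1), the inverse μ is reduced modulo n, and all function names are hypothetical. It is not the 1024-bit implementation used in this thesis.

```python
import math
import random

# Toy Paillier parameters (illustrative assumptions only).
p, q = 47, 59
n = p * q
n2 = n * n
g = n + 1                                             # assumed choice of g
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)     # lcm(p-1, q-1)

def L(u):
    """L(u) = (u - 1) / n, exact for u congruent to 1 mod n."""
    return (u - 1) // n

k = L(pow(g, lam, n2))
mu = pow(k, -1, n)            # modular inverse of k (Python 3.8+)

def encrypt(msg):
    """c = g^m * r^n mod n^2, with r sampled from Z_n^*."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            return (pow(g, msg, n2) * pow(r, n, n2)) % n2

def decrypt(ct):
    """m = L(c^lambda mod n^2) * mu mod n."""
    return (L(pow(ct, lam, n2)) * mu) % n

a, b = 1234, 567
# Multiplying ciphertexts modulo n^2 adds the plaintexts modulo n.
assert decrypt((encrypt(a) * encrypt(b)) % n2) == (a + b) % n
```

Summing several encrypted values works the same way: multiply all ciphertexts modulo n² and decrypt once, provided the true sum stays below n.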

Our proposed protocols use an additively homomorphic scheme to securely sum the encrypted results, so we adopt the Paillier cryptosystem as our encryption scheme. Table 2.1 shows the key sizes recommended by NIST for security; we implement our system with a 1024-bit key size.

Table 2.1: NIST Recommended Key Size (column: Symmetric Key Size)

Association rule mining is a process that helps find high-confidence rules in a large amount of data. The problem can be defined as follows:

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items. Let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions, where each $t_i \subseteq I$. Given an itemset $X \subseteq I$, a transaction $t_i$ contains $X$ if and only if $X \subseteq t_i$. An association rule is an implication of the form $X \Rightarrow Y$ where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$. The rule has support $s$ in the transaction database $D$ if $s\%$ of the transactions in $D$ contain $X \cup Y$. The association rule holds in the transaction database $D$ with confidence $c$ if
