A Study on Multi-Message Private Information Retrieval using Homomorphic Encryption - NCCU Academic Hub


Department of Computer Science, National Chengchi University
Master's Thesis

A Study on Multi-Message Private Information Retrieval using Homomorphic Encryption

Student: Hsiang-Chen Hsu
Advisor: Dr. Raylin Tso

A thesis submitted to the Department of Computer Science, National Chengchi University, in partial fulfillment of the requirements for the degree of Master in Computer Science.

July 2020 (Republic of China year 109)

DOI:10.6814/NCCU202001380

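Acknowledgements

First, I sincerely thank my advisor, Professor Raylin Tso (左瑞麟), for two years of care, guidance, and encouragement, and Professor 王紹睿 and Professor 曾一凡 for their guidance on my studies and thesis. Their help was invaluable and enabled me to complete my master's studies and research. I also thank the oral defense committee members, Professor 王智弘, Professor 楊明豪, Professor 羅嘉寧, and Professor 曾一凡, for taking the time to review and advise on this thesis and for their valuable comments, which helped me improve it.

I also thank my labmates 聿劭, 泓遜, and 守晴, and my seniors 仁傑 and 子源. The discussions and collaboration with everyone in the laboratory made the research of my master's years possible.

Finally, I thank my family for their support and encouragement during my graduate studies and for their timely help, which allowed me to complete my master's degree.

Hsiang-Chen Hsu
2020/08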

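Abstract (translated from the Chinese)

Private information retrieval (PIR) protects the privacy of users when they retrieve data from a database. With PIR, the database administrator cannot learn which item the user retrieved. Since the work of Chor et al. and of Kushilevitz and Ostrovsky, PIR has been studied extensively over the past twenty years (especially single-database PIR), but most constructions allow the user to retrieve only one item at a time, which leads to high communication cost. To solve this problem, this thesis designs an improved multi-message PIR construction based on homomorphic encryption that lets the user retrieve multiple n-bit items with a single query, improving the efficiency of PIR. In addition, we analyze the construction, give proofs of correctness and security, and analyze the communication complexity.

Keywords: private information retrieval, communication complexity, homomorphic encryption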

Abstract

Private information retrieval (PIR) is a privacy protection that allows users to retrieve information from a database without revealing any information about the retrieved data to the server. Since the pioneering work of Chor et al. and of Kushilevitz and Ostrovsky, PIR has been extensively studied (especially in the single-database setting) over the past two decades. However, most protocols allow users to retrieve only one data item at a time, which leads to high communication costs. To solve this issue, this work proposes a multi-value private information retrieval protocol using group homomorphic encryption, which allows users to retrieve multiple values at a time. We compare our work with that of Ostrovsky and Skeith and show that retrieving multiple data items at a time can significantly reduce communication costs. Furthermore, we analyze the structure, provide a rigorous proof that the protocol is secure if the underlying group homomorphic encryption is secure, and discuss the communication complexity.

Keywords: private information retrieval, communication complexity, homomorphic encryption

Contents

Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
  1.1 Background
  1.2 Contribution
  1.3 Organization
2 Definitions and Preliminaries
  2.1 Abelian Group
  2.2 Group Homomorphic Encryption
  2.3 Public Key Encryption
3 Private Information Retrieval
  3.1 Overview of Private Information Retrieval
  3.2 Single-Server cPIR
  3.3 Oblivious Transfer
  3.4 Relationship Between Private Information Retrieval and Oblivious Transfer
  3.5 Applications
4 Group Homomorphic Encryption based PIR Protocols
  4.1 Group Homomorphic Encryption based PIR Protocols
  4.2 Example of Homomorphic Encryption based PIR Protocols
  4.3 Two-Dimensional Group Homomorphic Encryption based PIR
  4.4 Example of Two-Dimensional Group Homomorphic Encryption based PIR
5 Our Proposed Protocols
  5.1 Two-value Group Homomorphic Encryption based PIR Protocol
  5.2 Multi-value Group Homomorphic Encryption based PIR Protocol
  5.3 Example
6 Security Proof
7 Discussion and Analysis
  7.1 The Maximum Number of Retrieved Data
  7.2 Communication Cost
8 Experiment and Result
  8.1 Paillier Cryptosystem
  8.2 Result
9 Conclusion
Reference

1. Introduction

1.1 Background

In today's society, cloud services are attracting attention due to the rapid development of data networks. Cloud storage, in particular, is more and more widely used in daily life. However, there is no guarantee that the service provider will not act maliciously on the uploaded data. The privacy and security of cloud storage therefore calls for in-depth discussion, and can be divided into data privacy and data query privacy. The main purpose of data privacy is to ensure that service providers cannot obtain any information about the uploaded data. Data query privacy, on the other hand, is not concerned with the content of the data; it only ensures that the service provider cannot learn which data the user has downloaded. Current studies mainly focus on how to use existing cryptographic constructions as building blocks to protect data privacy, such as [35, 46, 29], and pay less attention to protecting the privacy of data queries.

The most trivial way to protect the privacy of data queries is to download an entire copy of the database, but this approach is too inefficient. To solve this issue, in 1995 Chor et al. [12] proposed a secure protocol called private information retrieval (PIR), which allows users to retrieve data from a server database without letting the server obtain any information about the retrieved data. In that work, the authors first show that, to achieve information-theoretic privacy with a single n-bit database, the communication cost is at least n bits. To keep the communication cost below n bits, they construct the protocol from multiple non-communicating copies of the database, so that the communication cost is lower than that of downloading the entire database.
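To illustrate why replicating the database helps, the following is a minimal sketch of a textbook two-server XOR scheme. It is not the sublinear construction of [12]: each query here is still n bits long, so the sketch only shows how two non-communicating copies can hide the index i. The function names (`make_queries`, `answer`, `reconstruct`) and the sample database are hypothetical.

```python
import secrets

def make_queries(n, i):
    """User: build one n-bit query per server for the 0-based index i."""
    s1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random subset
    s2 = s1.copy()
    s2[i] ^= 1  # the second subset differs from the first only at position i
    return s1, s2

def answer(database, query):
    """Server: XOR of the database bits selected by the query."""
    acc = 0
    for bit, selected in zip(database, query):
        acc ^= bit & selected
    return acc

def reconstruct(a1, a2):
    """User: the two answers differ exactly by the bit at position i."""
    return a1 ^ a2

database = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-bit database
q1, q2 = make_queries(len(database), 3)
recovered = reconstruct(answer(database, q1), answer(database, q2))
```

Each server sees a uniformly random bit vector on its own, so neither learns i unless the two servers compare their queries.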
A PIR protocol allows the user to retrieve the i-th bit of an n-bit database without revealing the value of i to the database server. A simple solution is for the user to download the entire database, but this incurs a huge communication cost and is therefore not an efficient way to achieve the goal. A good PIR protocol not only hides the index but also has a low communication cost.

The solution of Chor et al. [12] protects the privacy of data queries, but relying on multiple databases is not efficient enough. In 1997, Kushilevitz and Ostrovsky [19] proposed a PIR protocol with a single database, called "computational private information retrieval" (cPIR), so that data need not be copied to other databases as in [12]. More specifically, cPIR is a three-step interaction between a user and a server. The user first generates a query Q for the data to retrieve and sends it to the server. After receiving the query, the server computes a return value R from Q and the data in the database, and returns R to the user. Finally, the user computes the desired data from R without revealing it to the server.

Since the pioneering work of [12] and [19], PIR has been extensively studied (especially in the single-database setting) over the past two decades. In 2007, Ostrovsky and Skeith [40] proposed a new PIR protocol built from secure group homomorphic encryption, and Yerukhimovich [3] further analyzed the protocol in 2015. Also in 2007, the first lattice-based PIR was proposed by Aguilar-Melchor and Gaborit [9]. The security of that protocol rests on a lattice hardness assumption called "the differential hidden lattice problem"; in other words, the protocol is designed to resist quantum attacks. In 2010, Gertner [48] first proposed a new concept called symmetrically-private information retrieval (sPIR), in which both data privacy and user privacy are guaranteed. That is, each time the sPIR protocol is invoked, the user learns only one physical bit and nothing else about the data.
More recently, in 2014, Dong and Chen [11] proposed a PIR protocol with lower communication cost. The protocol uses tree-based compression and fully homomorphic encryption [15] as building blocks, which reduces the communication cost to O(log log n). In 2019, Heidarzadeh and Anoosheh [27] proposed the IPIR-SI scheme, which considers a multi-user variant of PIR and allows multiple users to each privately retrieve a distinct message from a server with the help of a trusted agent. Although the protocols mentioned above do a good job of reducing communication cost, they can only transfer one data item at a time. Therefore, whether there is a PIR protocol suitable for the simultaneous transmission of multiple values is still an open question.

1.2 Contribution

In this work, inspired by [40], we first propose a PIR protocol using group homomorphic encryption that allows users to retrieve two values in one query, and then generalize it to an extended version in which the user can retrieve multiple values at a time. In particular, by increasing the database dimension, our protocol can further increase the amount of retrieved data without increasing the communication cost too much. Compared with the original protocol of [40], our proposed protocol reduces the communication cost by a factor of r when users retrieve r values. In addition, we provide a rigorous proof that if the underlying group homomorphic encryption is IND-CPA secure, no attacker can learn which data the user retrieved.

1.3 Organization

The remainder of this thesis is organized as follows. Section 2 introduces preliminary knowledge: abelian groups, group homomorphic encryption, and secure public key encryption. Section 3 gives an overview of PIR, including its development and its different variants, and also introduces oblivious transfer and its relationship to PIR. Section 4 introduces the group homomorphic encryption based PIR protocol and the two-dimensional PIR protocol, with an example of each. Section 5 presents the main idea of our protocol. Section 6 gives rigorous proofs that our protocol is secure. Section 7 discusses the limit on the maximum number of data items that can be retrieved at a time and analyzes the communication cost of our protocol. Section 8 presents an experiment and its result. Section 9 concludes this thesis.

2. Definitions and Preliminaries

For simplicity and readability, we use the notations shown in Table 2.1 throughout the paper.

Table 2.1: Notations

Notation                  Description
$U$                       The user who wants to query data
$S$                       The server (database) that stores the data
$N$                       The maximum value stored in the server
$\mathbb{Z}$              The set of integers
$G$                       An abelian group
$G_1$                     Abelian group used as the plaintext set
$G_2$                     Abelian group used as the ciphertext set
$\cdot$                   The group operation over $G$
$*$                       The group operation over $G_1$
$\odot$                   The group operation over $G_2$
$\mathrm{ID}_G$           The identity of $G$
$\mathrm{ID}_{G_1}$       The identity of $G_1$
$\mathrm{ID}_{G_2}$       The identity of $G_2$
$\mathrm{ord}(p)$         The order of the element $p$
$p^{-1}$                  The inverse of the element $p$
$\mathrm{negl}(\lambda)$  A function $f$ is negligible in $\lambda$ if $f(\lambda) = o(\lambda^{-c})$ for every fixed constant $c$

2.1 Abelian Group

An abelian group is a non-empty set with a binary operation that satisfies five properties: closure, identity, inverse, associativity, and commutativity. Let $G$ be a non-empty set with a binary operation $\cdot$ that maps any two elements $a$ and $b$ to another element, written $a \cdot b$. $G$ is a group under $\cdot$ if it satisfies the first four properties below, and an abelian group if it also satisfies the fifth:

1. Closure: $\forall a, b \in G,\ a \cdot b \in G$.
   For all elements $a$ and $b$ in $G$, $a \cdot b$ is still in $G$.

2. Identity: There exists an element $\mathrm{ID}_G \in G$ such that $\forall a \in G,\ \mathrm{ID}_G \cdot a = a \cdot \mathrm{ID}_G = a$.
   Combining any element of $G$ with $\mathrm{ID}_G$ leaves that element unchanged.

3. Inverse: $\forall a \in G,\ \exists b = a^{-1} \in G,\ a \cdot b = b \cdot a = \mathrm{ID}_G$.
   Every $a$ in $G$ has an inverse element $b = a^{-1}$ such that $a \cdot b = b \cdot a$ is the identity element.

4. Associativity: $\forall a, b, c \in G,\ (a \cdot b) \cdot c = a \cdot (b \cdot c)$.
   For all elements $a$, $b$, and $c$ in $G$, $(a \cdot b) \cdot c$ equals $a \cdot (b \cdot c)$.

5. Commutativity: $\forall a, b \in G,\ a \cdot b = b \cdot a$.
   For all elements $a$ and $b$ in $G$, $a \cdot b$ equals $b \cdot a$.

If $G$ satisfies the above five properties, we call $G$ an abelian group.

2.2 Group Homomorphic Encryption

Homomorphic encryption is an encryption algorithm with homomorphic properties [38, 41]. That is, specific algebraic operations can be computed on the ciphertexts, and the decrypted result is the same as the result of the corresponding algebraic operation on the plaintexts. We further recall the properties of group homomorphic encryption from [3].

Definition 1. Let $G_1$, $G_2$ be two abelian groups and $pk$ be the user's public key. We say that a public key encryption PKE = (KeyGen, Enc, Dec) is a group homomorphic encryption if the following equation is satisfied:

$$\mathrm{Dec}(\mathrm{Enc}(m_1, pk) \odot \mathrm{Enc}(m_2, pk)) = m_1 * m_2$$

where $G_1$ is the plaintext set and $G_2$ is the ciphertext set. In addition, $*$ is the group operation over $G_1$ and $\odot$ is the group operation over $G_2$.
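As a concrete instance of Definition 1, the sketch below uses a toy version of the Paillier cryptosystem (the scheme named in Section 8.1), where $G_1$ is the integers modulo $n$ under addition and $G_2$ is the invertible residues modulo $n^2$ under multiplication. The tiny primes, the choice $g = n + 1$, and the function names are assumptions made for illustration only; they are far too small to be secure and are not taken from the thesis's implementation.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key generation with generator g = n + 1 (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)

def enc(m, n):
    """Encrypt m in Z_n as (1+n)^m * r^n mod n^2 with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c, n, sk):
    """Decrypt with L(u) = (u - 1) / n applied to c^lam mod n^2."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

n, sk = keygen()
# The ciphertext operation is multiplication mod n^2; the plaintext
# operation is addition mod n.
c_sum = (enc(123, n) * enc(456, n)) % (n * n)
recovered_sum = dec(c_sum, n, sk)
```

Here multiplying two ciphertexts plays the role of $\odot$ and adding the plaintexts modulo $n$ plays the role of $*$, so `recovered_sum` should equal 123 + 456.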

For brevity, we omit $pk$ and $sk$ from the encryption and decryption algorithms in the following sections.

2.3 Public Key Encryption

In this section, we recall the definition and security requirements of public key encryption from [26]. A secure public key encryption PKE consists of the following three probabilistic polynomial time (PPT) algorithms:

• KeyGen($1^\lambda$) → ($pk$, $sk$): The key generation algorithm takes the security parameter $\lambda$ as input and outputs a public/private key pair ($pk$, $sk$).

• Enc($m$, $pk$): The encryption algorithm takes a plaintext message $m$ and a public key $pk$ as input, and outputs a ciphertext $ct$.

• Dec($ct$, $sk$): The decryption algorithm takes a ciphertext $ct$ and a private key $sk$ as input, and outputs a plaintext message $m$.

Definition 2 (Correctness). We say that a public key encryption is correct if for any ($pk$, $sk$) ← KeyGen($1^\lambda$) and any message $m$, we have

$$\Pr[\mathrm{Dec}(\mathrm{Enc}(m, pk), sk) = m] = 1.$$

The security notion of indistinguishability under chosen plaintext attack (IND-CPA) for public key encryption is defined by the following game played between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$.

Security Game: IND-CPA

• KeyGen: The challenger runs ($pk$, $sk$) ← KeyGen($1^\lambda$). The challenger then sends $pk$ to the adversary and keeps $sk$ secret.

• Query: In this phase, the adversary can adaptively ask the challenger for the ciphertext of any message he/she chooses.

• Challenge: The adversary chooses two messages $m_0$, $m_1$ of the same length and sends them to the challenger. The challenger then chooses a random bit $b \in \{0, 1\}$ and generates a ciphertext $ct^* \leftarrow \mathrm{Enc}(m_b, pk)$. Finally, the challenger sends $ct^*$ to the adversary.

• Guess: The adversary returns a bit $b^*$. If $b^* = b$, we say the adversary wins the game.

Definition 3. We say that a public key encryption is secure if every PPT adversary wins the above game with only a negligible advantage, that is, $|\Pr[b^* = b] - 1/2| \le \mathrm{negl}(\lambda)$.
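The game can be sketched as a small test harness. The example below deliberately plugs in a scheme that is not IND-CPA secure, unpadded textbook RSA with toy parameters, to show what a winning adversary looks like: because encryption is deterministic, the adversary simply re-encrypts $m_0$ under $pk$ and compares. The key sizes, function names, and adversary are assumptions for illustration. In the public-key setting the Query phase needs no oracle, since the adversary can encrypt with $pk$ on its own.

```python
import secrets

def keygen():
    """Toy textbook RSA keys (insecure sizes, no padding)."""
    p, q, e = 1009, 1013, 17
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)

def enc(m, pk):
    """Deterministic encryption: the same message always gives the same ciphertext."""
    n, e = pk
    return pow(m, e, n)

def adversary_choose(pk):
    """Challenge phase: pick two distinct messages of the same length."""
    return 2, 3

def adversary_guess(pk, m0, m1, challenge):
    """Guess phase: re-encrypt m0 and compare with the challenge ciphertext."""
    return 0 if enc(m0, pk) == challenge else 1

def ind_cpa_game():
    """Run one round of the game; return True if the adversary wins."""
    pk, _sk = keygen()
    m0, m1 = adversary_choose(pk)
    b = secrets.randbelow(2)                  # challenger's secret bit
    challenge = enc((m0, m1)[b], pk)
    return adversary_guess(pk, m0, m1, challenge) == b

wins = sum(ind_cpa_game() for _ in range(200))
```

Against this deterministic scheme the adversary wins every round, an advantage of 1/2 rather than a negligible one. A randomized scheme is needed for Definition 3 to have any chance of holding.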

3. Private Information Retrieval

Figure 3.1: Three steps of private information retrieval

3.1 Overview of Private Information Retrieval

Nowadays, cloud services are more and more important in the development of data networks, and cloud storage in particular is widely used. However, we do not know whether the service provider will act maliciously on the uploaded data. Therefore, the privacy and security of cloud storage has become an important issue, which can be divided into two parts: data privacy and data query privacy. PIR focuses on data query privacy, ensuring that the service provider cannot learn which data the user has retrieved.

In 1995, Chor et al. [12] proposed a secure protocol called private information retrieval (PIR), which allows users to retrieve data from a server database without letting the server obtain any information about the retrieved data. A PIR protocol allows a user to retrieve the i-th bit of an n-bit database without revealing the value of i to the database server. A good PIR protocol is expected to have a communication complexity considerably lower than that of downloading the database. The authors first show that, to achieve information-theoretic privacy with a single n-bit database, the communication cost is at least n bits. They then construct the protocol from multiple non-communicating copies of the database to keep the communication cost below n bits, which is more efficient than downloading the entire database. However, this PIR protocol requires multiple copies of the database. The more copies there are, the more privacy issues must be considered, and hardware costs also become a concern.

Although Chor et al. [12] provide a good solution for protecting the privacy of data queries, replicating the database raises the problems above. In 1997, Kushilevitz and Ostrovsky [19] first proposed a PIR protocol with a single database, called "computational private information retrieval" (cPIR), in which data need not be copied to several databases as in [12]. cPIR is a three-step interaction between a user and a server. The user first generates a query value Q, which is built from the index of the data that the user wants to retrieve, and sends it to the server. After the server receives the query value, it computes the return value R using Q and the entire data in the database, and then returns R to the user.
Finally, the user can compute the desired data from R without revealing to the server which data he/she wants to retrieve.

Since the pioneering work of [12] and [19], this area has attracted more and more research attention. PIR protocols can be classified into two broad categories:

1. information-theoretic private information retrieval
2. computational private information retrieval

Information-theoretic PIR uses multiple servers to achieve information-theoretic privacy, such as [4, 13, 17]. cPIR uses a single server together with a cryptographic algorithm, so that privacy is achieved with only one server at the price of some computation, such as [7, 10, 25, 28, 30, 31]. There are also PIR protocols based on trusted hardware, which assume that hardware placed in the server responds to the user's query without revealing any query information to the server, such as [32, 43, 47]. cPIR itself comes in several variants, which we discuss later.

In 2010, Gertner [48] first proposed a new concept called symmetrically-private information retrieval (sPIR), in which both data privacy and user privacy are guaranteed. That is, each time the sPIR protocol is invoked, the user learns only one physical bit and nothing else about the data. In 2007, the first lattice-based PIR was proposed by Aguilar-Melchor and Gaborit [9]. The security of that protocol rests on a lattice hardness assumption called "the differential hidden lattice problem", which is what gives it the ability to resist quantum attacks. There is also research on sPIR based on blind quantum computing [45]; that protocol reduces not only the honest user's computational burden of the communication but also the cost of the quantum hardware devices in a practical implementation. In 2014, Dong and Chen [11] proposed a PIR protocol with lower communication cost. The protocol uses tree-based compression and fully homomorphic encryption [15] as building blocks, which reduces the communication cost to O(log log n). In 2016, C. Aguilar-Melchor, J. Barrier, and L. Fousse [1] proposed a cPIR called "XPIR", which is a fast cPIR implementation. In 2019, Heidarzadeh and Anoosheh [27] proposed the IPIR-SI scheme, which considers a multi-user variant of PIR and allows multiple users to each privately retrieve a distinct message from a server with the help of a trusted agent.

One topic within cPIR is "Homomorphic Encryption-Based Private Information Retrieval". Current single-database PIR protocols achieve low communication cost but require the database to spend an enormous amount of computational power.
The lattice-based PIR proposed by Aguilar-Melchor and Gaborit [9] has a computational cost of a few thousand bit-operations per bit in the database, and the generation of the query matrices by the user is not efficient at all. In 2008, Aguilar-Melchor and Gaborit [2] proposed another work that addresses the problem with homomorphic encryption. Homomorphic encryption techniques are often a straightforward way to construct a variety of privacy-preserving protocols. In 2009, Gentry [21, 22, 23, 24] constructed the first fully homomorphic encryption scheme using lattice-based cryptography. Fully homomorphic encryption allows one to compute arbitrary functions over encrypted data without the decryption key. In 2013, Yi, Kaosar, Paulet, and Bertino [49] proposed single-database PIR protocols from fully homomorphic encryption. In 2007, Ostrovsky and Skeith [40] proposed a new method that uses a secure group homomorphic encryption scheme to construct the PIR protocol, and Yerukhimovich [3] further analyzed the protocol in 2015. They apply only group homomorphic encryption to single-server PIR, which reduces the computation on the server and makes it easy for the user to generate queries. This is the work that we improve upon.

Having surveyed the different kinds of PIR, we now focus on single-server cPIR and on group-based PIR using homomorphic encryption.

3.2 Single-Server cPIR

The cPIR protocol has two parties: a server with an n-bit database $X = x_1 x_2 x_3 \cdots x_n$ and a user with a database index. When the user wants to get the i-th data $x_i$, where $i \in [1, n]$, he/she generates a query and sends it to the server. The server combines all the data in the database with the query and returns a special value. The user then extracts the data he/she wants from this value. The detailed protocol can be divided into the following three steps:

1. Query Generation (QG): The user generates a query value array Q to query the data $x_i$ in the database. The server cannot obtain from Q any information about which data the user is querying. The user sends the array Q to the server.

2. Response Generation (RG): After receiving the query value array Q sent by the user, the server takes the values stored in the entire database and the query value array Q as input to a specific operation and outputs the response value R. The server then sends R back to the user.

3. Response Retrieval (RR): Finally, the user retrieves the value of $x_i$ by computing on R.

3.3 Oblivious Transfer

In this section, we introduce a concept similar to PIR, called "Oblivious Transfer" [50].
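Before the details of oblivious transfer, the three cPIR steps of Section 3.2 (QG, RG, RR) can be sketched on a bit database. The instantiation below is an assumption chosen for brevity, not the thesis's construction: it uses Goldwasser-Micali style encryption of single bits, where multiplying ciphertexts XORs the plaintexts, with toy primes that offer no real security. All function names are hypothetical, and the query has one ciphertext per database bit, so the sketch shows the interface rather than low communication cost.

```python
import math
import secrets

P, Q = 1019, 1031   # user's secret toy primes, both congruent to 3 mod 4
N = P * Q           # public modulus, known to the server
Y = N - 1           # -1 is a quadratic non-residue modulo both P and Q

def enc_bit(bit):
    """Encrypt one bit as Y^bit * r^2 mod N with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(Y, bit, N) * r * r) % N

def dec_bit(c):
    """User only: c encrypts 0 exactly when c is a square modulo P."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1

def query_generation(n_bits, index):
    """QG: an encryption of 1 at the wanted position, of 0 everywhere else."""
    return [enc_bit(1 if j == index else 0) for j in range(n_bits)]

def response_generation(database, query):
    """RG: multiply the query ciphertexts at positions where the bit is 1."""
    response = 1
    for bit, q in zip(database, query):
        if bit:
            response = (response * q) % N
    return response

def response_retrieval(response):
    """RR: decrypting the product yields the selected database bit."""
    return dec_bit(response)

database = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical 8-bit database
recovered = [
    response_retrieval(
        response_generation(database, query_generation(len(database), i)))
    for i in range(len(database))
]
```

The server touches every database position in RG, which is what prevents it from learning the index: it processes all n ciphertexts the same way and cannot tell which one encrypts 1.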

The concept of oblivious transfer was first presented by Rabin [39] in 1981. It was later extended to provide a deterministic result [18] by introducing the 1-out-of-2 oblivious transfer, written $OT_1^2$. These two protocols were proven to be essentially equivalent [14]. $OT_1^2$ can then be extended to the 1-out-of-n oblivious transfer $OT_1^n$ [5], which was further extended in [44], whose main contribution is a zero-knowledge proof.

Oblivious transfer is an interactive protocol between Alice and Bob. We first focus on the simplest case, the 1-out-of-2 oblivious transfer $OT_1^2$. In this case, Bob has two bits $b_0$, $b_1$, and Alice has a selection bit $c \in \{0, 1\}$. At the conclusion of the protocol, Alice should learn $b_c$ and nothing about $b_{c'}$, where $c' = 1 - c$. On the other side, Bob should learn absolutely nothing by the end of the protocol. This means that Alice obtains only the information she chose, and Bob cannot know which of $b_0$, $b_1$ Alice selected.

We now extend the 1-out-of-2 oblivious transfer protocol $OT_1^2$ to the 1-out-of-n oblivious transfer protocol $OT_1^n$, where Bob has n bit strings $b_1, b_2, \cdots, b_n$ and Alice has an index $i$ of her choice, where $1 \le i \le n$. As above, Alice should learn $b_i$ and nothing about the other strings. On the other side, Bob should learn absolutely nothing by the end of the protocol. Alice obtains only the information she chose, and Bob cannot know which of $b_1, b_2, \cdots, b_n$ Alice selected. This general construction is known as a 1-out-of-n scheme $OT_1^n$. Clearly, we can achieve a k-out-of-n protocol $OT_k^n$ by invoking an $OT_1^n$ protocol k times.

An oblivious transfer protocol is defined in two phases: an "Initialization Phase" and a "Transfer Phase".
In the initialization phase, key commitments are transferred from the server to the user, which takes O(n) work, where n is the number of elements on the server side. In the transfer phase, these commitments are relied upon to make each query.

3.4 Relationship Between Private Information Retrieval and Oblivious Transfer

The concept of PIR is similar to the concept of oblivious transfer, and the two are closely related. Oblivious transfer can be considered a stronger version of PIR. There is also a line of research called "Symmetric Private Information Retrieval" (sPIR) [33, 42, 20], which is the same concept as oblivious transfer.
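To make the 1-out-of-2 functionality of Section 3.3 concrete, and to show the two-sided privacy that separates oblivious transfer from plain PIR, here is a minimal sketch of a well-known textbook RSA-based $OT_1^2$ for honest-but-curious parties. It is not a protocol from this thesis. The toy key size, the function names, and the use of small integers in place of single bits are assumptions for illustration.

```python
import secrets

def sender_setup():
    """Bob: toy RSA key pair plus two random values x0, x1 (insecure sizes)."""
    p, q, e = 1009, 1013, 17
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    x0, x1 = secrets.randbelow(n), secrets.randbelow(n)
    return (n, e, x0, x1), d       # public part, private exponent

def receiver_choose(pub, c):
    """Alice: blind the chosen x_c with k^e for a random secret k."""
    n, e, x0, x1 = pub
    k = secrets.randbelow(n)
    v = ((x0, x1)[c] + pow(k, e, n)) % n
    return v, k                    # v goes to Bob, k stays with Alice

def sender_respond(pub, d, v, m0, m1):
    """Bob: derive one key per message; only one of them equals Alice's k."""
    n, _e, x0, x1 = pub
    k0 = pow((v - x0) % n, d, n)
    k1 = pow((v - x1) % n, d, n)
    return (m0 + k0) % n, (m1 + k1) % n

def receiver_recover(pub, c, k, masked):
    """Alice: unmask the chosen message; the other stays hidden under an unknown key."""
    return (masked[c] - k) % pub[0]

m0, m1 = 11, 22                    # Bob's two messages (hypothetical values)
pub, d = sender_setup()
results = []
for c in (0, 1):
    v, k = receiver_choose(pub, c)
    masked = sender_respond(pub, d, v, m0, m1)
    results.append(receiver_recover(pub, c, k, masked))
```

Bob sees only `v`, which is consistent with either choice of `c`, while Alice can remove the mask from only one of the two messages. A PIR protocol would give the first guarantee but not the second.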

An oblivious transfer protocol can serve as a PIR protocol if it has a communication-efficient implementation. Conversely, if a PIR protocol limits the information that the user obtains, it can serve as an oblivious transfer protocol. Naor and Pinkas [34] have shown how to turn any PIR protocol into an oblivious transfer protocol $OT_1^n$ with one invocation of a single-database PIR protocol and a logarithmic number of invocations of an oblivious transfer protocol $OT_1^2$. DiCrescenzo, Malkin, and Ostrovsky [16] have shown that any oblivious transfer protocol can be constructed entirely from invocations of a PIR protocol.

Single-server PIR is close to the notion of oblivious transfer. The main differences between the two concepts are:

1. PIR protocols pay great attention to communication cost. If the communication complexity is too high, a PIR protocol becomes less efficient than downloading the entire database. Oblivious transfer protocols, on the other hand, place no requirement on communication complexity.

2. Oblivious transfer protocols provide privacy to both the user and the server: the server cannot learn which information the user retrieved, and the user cannot obtain any information beyond what he/she wants to retrieve. PIR protocols provide privacy only to the user side. The server cannot learn which information the user has retrieved, but the protocol does not care whether the user can obtain other information from the server.

Although PIR and oblivious transfer are very similar, they focus on different goals. Because their purposes differ, their applications differ as well.

3.5 Applications

To motivate the applications of PIR, let us first picture a scenario. Online social networks and web services are highly developed today.
Many websites collect user behavioral data in order to show the most suitable advertising and search results. Suppose you are a consumer who uses a shopping website to buy basketball shoes, suits, and basketball cards. One day you get a front-row ticket to an NBA game and think this is a rare opportunity to watch a game from so close. So you decide to buy a professional camera to take pictures, and you search for cameras on the website. The algorithm behind the website detects that you have never searched for a camera before and concludes that you are not familiar with cameras. It then recommends a more expensive camera in the search results, and you end up spending more money because of your earlier search history. The same situation can happen on a reservation website: if you search for a room in a place you have never searched before, the website may recommend a more expensive room in that place. PIR can solve this problem.

PIR also has applications [1, 6, 36] in the following problem domains:

1. E-commerce: As mentioned above, in e-commerce the supplier can adjust the product selection or even the price to make more profit, and can also sell search records to advertising companies. Using PIR, consumers could privately retrieve results, and the user's advertising client could privately retrieve advertisements based on the user's online profile, with all data cached locally. Content providers that display advertisements to users can be charged in a privacy-preserving manner, so that the advertising network does not learn the interests of users.

2. Real-time stock market: An investor in the stock market needs to follow the latest market information and price changes at all times. If the server manager can see all the decisions that a particular investor makes, then he can exploit that investor's information. The investor might therefore prefer a platform that does not reveal any information about the stocks he looks up. Investors can then make decisions more safely, without their decisions being affected by leaked information.

3. Biodatabase: With the advancement of technology, more and more biometric technologies are entering our lives, such as DNA identification and fingerprint identification. Suppose a pharmaceutical organization purchases specific genomic sequence information from a public DNA database [8]. It needs this information to develop new medicines, so the query may be a trade secret. If a competing organization learns which genomic sequence information was retrieved, it could follow the clues to figure out which new medicine is being studied, and might release new products earlier. PIR can prevent this situation.

4. Patent databases: The application process for a new invention requires the inventor to search the patent database to ensure that previous patents do not overlap substantially with the invention. The inventor wants to perform the search without the search terms being kept in the query log of the patent database. Current patent database systems allow curious or malicious database administrators to infer the user's interests directly from the query log or in real time during query execution.

In all the applications above, users do not want to reveal the sensitive information in their queries to the server side or to the public. PIR can solve these problems. Because PIR pays great attention to communication cost, it can also address the privacy of real-time queries. PIR will become more important in a variety of areas.

4. Group Homomorphic Encryption based PIR Protocols

4.1 Group Homomorphic Encryption based PIR Protocols

In this chapter, we recall the group homomorphic encryption based PIR protocols that were inductively introduced in [40]. Here, we let $\mathrm{HE}_{\mathrm{PKE}} = (\mathrm{KeyGen}, \mathrm{Enc}, \mathrm{Dec})$ be an IND-CPA secure group homomorphic encryption scheme.

One-Dimensional Group Homomorphic Encryption based PIR

In the one-dimensional setting, S holds $n$ data elements $X = \{x_i\}_{i=1}^{n}$, and U wants to learn the value $x_{i^*}$, where $1 \le i^* \le n$. Throughout, $N$ denotes the maximum value stored in the database. The one-dimensional group homomorphic encryption based PIR protocol is as follows:

1. Query Generation (QG):
U first chooses an element $p \in G_1$ that satisfies $\operatorname{ord}(p) > N$, and then generates a query array $Q = \{q_i\}_{i=1}^{n}$, where
\[
q_i =
\begin{cases}
\mathrm{Enc}(p), & \text{if } i = i^*; \\
\mathrm{Enc}(\mathrm{ID}_{G_1}), & \text{otherwise,}
\end{cases}
\qquad q_i \in G_2 .
\]
Finally, U sends $Q$ to S.

2. Response Generation (RG):
After receiving $Q$, S computes
\[
R = \sum_{i=1}^{n} x_i \odot q_i
\]
and sends it back.

3. Response Retrieval (RR):
U computes
\[
\begin{aligned}
\mathrm{Dec}(R) * p^{-1}
&= \mathrm{Dec}\Big(\sum_{i=1}^{n} x_i \odot q_i\Big) * p^{-1} \\
&= \Big(x_{i^*} * p + \sum_{i=1,\, i \ne i^*}^{n} x_i * \mathrm{ID}_{G_1}\Big) * p^{-1} \\
&= x_{i^*} * p * p^{-1} \\
&= x_{i^*} .
\end{aligned}
\]

4.2 Example of Homomorphic Encryption based PIR Protocols

In this section, we give a toy example to show how the group homomorphic encryption based PIR protocol works. We assume S has a one-dimensional database that stores the following data:
\[
DB = \begin{pmatrix} 1 \\ 4 \\ 3 \\ 1 \end{pmatrix}.
\]
We take the maximum value $N$ to be 5. We then choose $m = 200$ and set $G_1, G_2 = \langle \mathbb{Z}_{200}, + \rangle$. In particular, the identity of $G_1$ is 0 (i.e., $\mathrm{ID}_{G_1} = 0$). Now, suppose U wants to retrieve the value of the second element. U interacts with S as follows.

1. Query Generation (QG):
U first picks $p = 2 \in G_1$, and then generates the query array $Q = [\mathrm{Enc}(0), \mathrm{Enc}(2), \mathrm{Enc}(0), \mathrm{Enc}(0)]$. U sends $Q$ to S.

2. Response Generation (RG):
S computes $R$ as below.
\[
\begin{aligned}
R &= \big(\mathrm{Enc}(0) \cdot 1 + \mathrm{Enc}(2) \cdot 4 + \mathrm{Enc}(0) \cdot 3 + \mathrm{Enc}(0) \cdot 1\big) \bmod m \\
  &= \mathrm{Enc}(8) \bmod m .
\end{aligned}
\]
S then sends $R$ back to U.

3. Response Retrieval (RR):
U first computes $\mathrm{Dec}(R) = 8$ and then obtains the second element by computing
\[
8 \times 2^{-1} = 4 .
\]

4.3 Two-Dimensional Group Homomorphic Encryption based PIR

The one-dimensional scheme can be extended to a two-dimensional scheme. In the two-dimensional setting, similarly to the one-dimensional setting, S holds $n \times n$ data elements $X = \{x_{i,j}\}_{i,j=1}^{n}$, and U wants to learn the $n$ values $\{x_{i^*,j}\}_{j=1}^{n}$, where $1 \le i^* \le n$. The two-dimensional group homomorphic encryption based PIR protocol is as follows:

1. Query Generation (QG):
U first chooses an element $p \in G_1$ that satisfies $\operatorname{ord}(p) > N$, and then generates a query array $Q = \{q_i\}_{i=1}^{n}$, where
\[
q_i =
\begin{cases}
\mathrm{Enc}(p), & \text{if } i = i^*; \\
\mathrm{Enc}(\mathrm{ID}_{G_1}), & \text{otherwise,}
\end{cases}
\qquad q_i \in G_2 .
\]
Finally, U sends $Q$ to S.

2. Response Generation (RG):
After receiving $Q$, S computes
\[
R_j = \sum_{i=1}^{n} x_{i,j} \odot q_i \bmod m \quad \text{for } j = 1, \cdots, n
\]
and sends them back.

3. Response Retrieval (RR):
For $j = 1, \cdots, n$, U computes
\[
\begin{aligned}
\mathrm{Dec}(R_j) * p^{-1}
&= \mathrm{Dec}\Big(\sum_{i=1}^{n} x_{i,j} \odot q_i\Big) * p^{-1} \\
&= \Big(x_{i^*,j} * p + \sum_{i=1,\, i \ne i^*}^{n} x_{i,j} * \mathrm{ID}_{G_1}\Big) * p^{-1} \\
&= x_{i^*,j} * p * p^{-1} \\
&= x_{i^*,j} .
\end{aligned}
\]

Although we present the equation $x_{i^*,j} * p * p^{-1} = x_{i^*,j}$ here, how U actually obtains $x_{i^*,j}$ depends on the groups $G_1, G_2$ used in the scheme. If $G_1, G_2$ are multiplicative groups, U can compute $x_{i^*,j} = \log_p(x_{i^*,j} * p)$. If $G_1, G_2$ are additive groups, with the help of the generator of the group, U can easily compute $x_{i^*,j}$ by division. Since an additive group is more efficient than a multiplicative group, in practice we use $G_1, G_2 = \langle \mathbb{Z}_m, + \rangle$ as the implementation architecture for fast calculation of $x_{i^*,j}$, where $m$ is a large number.

4.4 Example of Two-Dimensional Group Homomorphic Encryption based PIR

In this section, we give a toy example to show how the two-dimensional group homomorphic encryption based PIR protocol works. We assume S has a two-dimensional database that stores the following data:
\[
DB = \begin{pmatrix}
1 & 2 & 3 & 4 \\
4 & 2 & 1 & 3 \\
3 & 1 & 2 & 1 \\
1 & 4 & 4 & 3
\end{pmatrix}.
\]
We take the maximum value $N$ to be 5. We then choose $m = 200$ and set $G_1, G_2 = \langle \mathbb{Z}_{200}, + \rangle$. In particular, the identity of $G_1$ is 0 (i.e., $\mathrm{ID}_{G_1} = 0$). Now, suppose U wants to retrieve the values of the second row. U interacts with S as follows.

1. Query Generation (QG):
U first picks $p = 2 \in G_1$, and then generates the query array $Q = [\mathrm{Enc}(0), \mathrm{Enc}(2), \mathrm{Enc}(0), \mathrm{Enc}(0)]$. U sends $Q$ to S.

2. Response Generation (RG):
S computes $R$ as below.
\[
\begin{aligned}
R &= \begin{pmatrix}
\mathrm{Enc}(0) \cdot 1 + \mathrm{Enc}(2) \cdot 4 + \mathrm{Enc}(0) \cdot 3 + \mathrm{Enc}(0) \cdot 1 \\
\mathrm{Enc}(0) \cdot 2 + \mathrm{Enc}(2) \cdot 2 + \mathrm{Enc}(0) \cdot 1 + \mathrm{Enc}(0) \cdot 4 \\
\mathrm{Enc}(0) \cdot 3 + \mathrm{Enc}(2) \cdot 1 + \mathrm{Enc}(0) \cdot 2 + \mathrm{Enc}(0) \cdot 4 \\
\mathrm{Enc}(0) \cdot 4 + \mathrm{Enc}(2) \cdot 3 + \mathrm{Enc}(0) \cdot 1 + \mathrm{Enc}(0) \cdot 3
\end{pmatrix}^{T} \bmod m \\
&= (\mathrm{Enc}(8), \mathrm{Enc}(4), \mathrm{Enc}(2), \mathrm{Enc}(6)) \bmod m .
\end{aligned}
\]
S then sends $R$ back to U.

3. Response Retrieval (RR):
U first computes $\mathrm{Dec}(R) = (8, 4, 2, 6)$ and then obtains the data of the second row by computing
\[
(8, 4, 2, 6) \times 2^{-1} = (4, 2, 1, 3) .
\]

5. Our Proposed Protocols

In this chapter, inspired by [40], we present an improved construction of the PIR protocol using homomorphic encryption. In contrast to [40], where only one data value is retrieved at a time, our work allows multiple values to be retrieved from the database at once. In the following, we first provide a two-value protocol and then extend it to a multi-value one. We only describe the one-dimensional setting here; as shown in Section 4.3, our protocols can be extended to the two-dimensional setting by executing the response generation $n$ times on the database side. We also give an example in Section 5.3.

5.1 Two-value Group Homomorphic Encryption based PIR Protocol

In this protocol, S holds $n$ data elements $X = \{x_i\}_{i=1}^{n}$, and U wants to retrieve two values $x'_1, x'_2 \in X$ from S, where $x'_1, x'_2 \in G_1$ and $G_1, G_2$ are the additive group $\langle \mathbb{Z}_m, + \rangle$. First, U selects two secret elements $k_1, k_2 \in G_1$ that satisfy the following three conditions:
\[
\begin{cases}
\operatorname{ord}(k_1) > N, \ \operatorname{ord}(k_2) > N \\
k_2 > k_1 * N \\
m > (k_2 + k_1) * N
\end{cases}
\]
Because $k_1$ and $k_2$ are sent only after being encrypted by the homomorphic encryption into random-looking ciphertexts, their concrete values do not affect the security as long as they are chosen to satisfy these conditions. If $k_1$ and $k_2$ cannot satisfy the above conditions, U reselects a larger $m$ and the groups $G_1, G_2$. Since U knows the positions of $x'_1$ and $x'_2$, U then performs the following steps to obtain $x'_1$ and $x'_2$.

1. Query Generation (QG):
U generates a query array $Q = \{q_i\}_{i=1}^{n}$, where
\[
q_i =
\begin{cases}
\mathrm{Enc}(k_1), & \text{if } x_i = x'_1; \\
\mathrm{Enc}(k_2), & \text{if } x_i = x'_2; \\
\mathrm{Enc}(0), & \text{otherwise,}
\end{cases}
\qquad q_i \in G_2 .
\]
Finally, U sends $Q$ to S.

2. Response Generation (RG):
After receiving $Q$, S computes
\[
R = \sum_{i=1}^{n} x_i \odot q_i
\]
and sends it back.

3. Response Retrieval (RR):
U first computes
\[
\mathrm{Dec}(R) = x'_1 * k_1 + x'_2 * k_2 .
\]
U can then retrieve $x'_1$ and $x'_2$ as follows.
(a) Because $k_2 > k_1 * N$ and $x'_1 \le N$, we have $x'_1 * k_1 \le N * k_1 < k_2$, and therefore $(x'_1 * k_1 + x'_2 * k_2) \bmod k_2 = x'_1 * k_1$.
(b) $(x'_1 * k_1) * k_1^{-1} = x'_1$.
(c) $(x'_1 * k_1 + x'_2 * k_2) - x'_1 * k_1 = x'_2 * k_2$.
(d) $(x'_2 * k_2) * k_2^{-1} = x'_2$.

5.2 Multi-value Group Homomorphic Encryption based PIR Protocol

Based on the two-value setting, we obtain a multi-value setting as follows. In this protocol, S holds $n$ data elements $X = \{x_i\}_{i=1}^{n}$, and U wants to retrieve $r$ values $x'_1, \cdots, x'_r \in X$ from S, where $x'_1, \cdots, x'_r \in G_1$ and $G_1, G_2$ are the additive group $\langle \mathbb{Z}_m, + \rangle$. First, U selects $r$ secret elements $k_1, \cdots, k_r \in G_1$ that satisfy the following conditions:
\[
\begin{cases}
\operatorname{ord}(k_i) > N, \text{ where } i = 1, \cdots, r \\
k_2 > k_1 * N \\
k_3 > (k_2 + k_1) * N \\
\quad \vdots \\
k_r > \sum_{i=1}^{r-1} k_i * N \\
m > \sum_{i=1}^{r} k_i * N
\end{cases}
\]
Because $k_1, \cdots, k_r$ are sent only after being encrypted by the homomorphic encryption into random-looking ciphertexts, their concrete values do not affect the security as long as they are chosen to satisfy these conditions. If $k_1, k_2, \cdots, k_r$ cannot satisfy the above conditions, U reselects a larger $m$ and the groups $G_1, G_2$. Since U knows the positions of $x'_1, \cdots, x'_r$, U then performs the following steps to obtain $x'_1, \cdots, x'_r$.

1. Query Generation (QG):
U generates a query array $Q = \{q_i\}_{i=1}^{n}$, where
\[
q_i =
\begin{cases}
\mathrm{Enc}(k_1), & \text{if } x_i = x'_1; \\
\mathrm{Enc}(k_2), & \text{if } x_i = x'_2; \\
\quad \vdots \\
\mathrm{Enc}(k_r), & \text{if } x_i = x'_r; \\
\mathrm{Enc}(0), & \text{otherwise,}
\end{cases}
\qquad q_i \in G_2 .
\]
Finally, U sends $Q$ to S.

2. Response Generation (RG):
After receiving $Q$, S computes
\[
R = \sum_{i=1}^{n} x_i \odot q_i
\]
and sends it back.

3. Response Retrieval (RR):
U first computes
\[
\mathrm{Dec}(R) = x'_1 * k_1 + x'_2 * k_2 + \cdots + x'_r * k_r .
\]
U can then retrieve $x'_1$:
(a) Because $k_2 > N * k_1$,
\[
x'_1 * k_1 \le N * k_1 < k_2 .
\]
Because $k_3 > (k_2 + k_1) * N$,
\[
x'_1 * k_1 + x'_2 * k_2 \le N * k_1 + N * k_2 = N * (k_1 + k_2) < k_3 .
\]
Continuing in the same way, because $k_r > (k_{r-1} + \cdots + k_2 + k_1) * N$,
\[
\begin{aligned}
x'_1 * k_1 + x'_2 * k_2 + \cdots + x'_{r-1} * k_{r-1}
&\le N * k_1 + N * k_2 + \cdots + N * k_{r-1} \\
&= N * (k_1 + k_2 + \cdots + k_{r-1}) < k_r .
\end{aligned}
\]
(b) Reducing modulo $k_r, k_{r-1}, \cdots, k_2$ in turn gives
\[
\begin{aligned}
(x'_1 * k_1 + x'_2 * k_2 + \cdots + x'_r * k_r) \bmod k_r &= x'_1 * k_1 + x'_2 * k_2 + \cdots + x'_{r-1} * k_{r-1}, \\
(x'_1 * k_1 + x'_2 * k_2 + \cdots + x'_{r-1} * k_{r-1}) \bmod k_{r-1} &= x'_1 * k_1 + x'_2 * k_2 + \cdots + x'_{r-2} * k_{r-2}, \\
&\ \ \vdots \\
(x'_1 * k_1 + x'_2 * k_2) \bmod k_2 &= x'_1 * k_1 .
\end{aligned}
\]
(c) $(x'_1 * k_1) * k_1^{-1} = x'_1$.

After obtaining $x'_1$, U retrieves $x'_i$ one by one, where $i = 2, \cdots, r$:
(a) Reducing $\mathrm{Dec}(R) - \sum_{j=1}^{i-1} x'_j * k_j$ modulo $k_r, k_{r-1}, \cdots, k_{i+1}$ in turn gives $x'_i * k_i$.
(b) $(x'_i * k_i) * k_i^{-1} = x'_i$.

5.3 Example

In this section, we give a toy example to show how our proposed protocol works. We assume S has a two-dimensional database that stores the following data:
\[
DB = \begin{pmatrix}
1 & 2 & 3 & 4 \\
4 & 2 & 1 & 3 \\
3 & 1 & 2 & 1 \\
1 & 4 & 4 & 3
\end{pmatrix}.
\]
We take the maximum value $N$ to be 5. We then choose $m = 200$ and set $G_1, G_2 = \langle \mathbb{Z}_{200}, + \rangle$. In particular, the identity of $G_1$ is 0 (i.e., $\mathrm{ID}_{G_1} = 0$). Now, suppose U wants to retrieve the data of the second and third rows. U interacts with S as follows.

1. Query Generation (QG):
U first picks $k_1 = 2, k_2 = 11 \in G_1$, and then generates the query array $Q = [\mathrm{Enc}(0), \mathrm{Enc}(2), \mathrm{Enc}(11), \mathrm{Enc}(0)]$. U sends $Q$ to S.

2. Response Generation (RG):
S computes $R$ as below.
\[
\begin{aligned}
R &= \begin{pmatrix}
\mathrm{Enc}(0) \cdot 1 + \mathrm{Enc}(2) \cdot 4 + \mathrm{Enc}(11) \cdot 3 + \mathrm{Enc}(0) \cdot 1 \\
\mathrm{Enc}(0) \cdot 2 + \mathrm{Enc}(2) \cdot 2 + \mathrm{Enc}(11) \cdot 1 + \mathrm{Enc}(0) \cdot 4 \\
\mathrm{Enc}(0) \cdot 3 + \mathrm{Enc}(2) \cdot 1 + \mathrm{Enc}(11) \cdot 2 + \mathrm{Enc}(0) \cdot 4 \\
\mathrm{Enc}(0) \cdot 4 + \mathrm{Enc}(2) \cdot 3 + \mathrm{Enc}(11) \cdot 1 + \mathrm{Enc}(0) \cdot 3
\end{pmatrix}^{T} \bmod m \\
&= (\mathrm{Enc}(41), \mathrm{Enc}(15), \mathrm{Enc}(24), \mathrm{Enc}(17)) \bmod m .
\end{aligned}
\]
S then sends $R$ back to U.

3. Response Retrieval (RR):
U first computes $\mathrm{Dec}(R) = (41, 15, 24, 17)$ and then obtains the data of the second and third rows by the following steps.

• Second row:
(a) $(41, 15, 24, 17) \bmod 11 = (8, 4, 2, 6)$.
(b) $(8, 4, 2, 6) \times 2^{-1} = (4, 2, 1, 3)$.

• Third row:
(a) $(41, 15, 24, 17) - (4 \cdot 2, 2 \cdot 2, 1 \cdot 2, 3 \cdot 2) = (33, 11, 22, 11)$.
(b) $(33, 11, 22, 11) \times 11^{-1} = (3, 1, 2, 1)$.

6. Security Proof

We now provide a rigorous proof that our proposed protocol is secure; that is, even if an adversary is able to access the transmitted data, he/she cannot obtain any information about what U wants to retrieve. More precisely, following the idea of [3], we show that if there is a PPT algorithm A that can distinguish whether some $q \in G_2$ is an encryption of $\mathrm{ID}_{G_1}$ or an encryption of $p \in G_1$ with non-negligible advantage, then there is another algorithm C that can win the IND-CPA security game of the group homomorphic encryption by using A.

Here, we first introduce some useful notation. Let $(\mathrm{KeyGen}, \mathrm{Enc}, \mathrm{Dec})$ be an IND-CPA secure group homomorphic encryption scheme. Then, $Q_0$ denotes $\{q_k = \mathrm{Enc}(\mathrm{ID}_{G_1})\}_{k=1}^{n}$, and $Q_i$ for $i \in [1, n]$ denotes $\{q_k = \mathrm{Enc}(\mathrm{ID}_{G_1})\}_{k=1}^{n}$ except that $q_i = \mathrm{Enc}(p)$, where $p \in G_1$. Finally, $\Pr[A(Q_i) = 1]$ denotes the probability that the PPT algorithm A outputs 1 on input $Q_i$, that is, that A decides $q_i$ is an encryption of $p$ rather than an encryption of $\mathrm{ID}_{G_1}$.

Theorem 1. If the underlying group homomorphic encryption is IND-CPA secure, then there is no PPT algorithm A that can obtain the information the user retrieved in our proposed protocol with non-negligible probability.

Proof. In order to prove this theorem, we first need to show that Lemmas 1 and 2 hold. Here, $\lambda$ is the security parameter of the underlying group homomorphic encryption.

Lemma 1. If there is a PPT algorithm A such that $\Pr[A(Q_i) = 1] - \Pr[A(Q_0) = 1] = \epsilon > \mathrm{negl}(\lambda)$, then there is another PPT algorithm C that can win the IND-CPA game of the underlying group homomorphic encryption with non-negligible probability $1/2 + \epsilon/2$.

Proof. In the following, we show that C uses A as a black-box algorithm to win the IND-CPA game of the underlying group homomorphic encryption with non-negligible probability.

C first computes the $n - 1$ values $\{q_k = \mathrm{Enc}(\mathrm{ID}_{G_1})\}_{k=1, k \ne i}^{n}$ using the public key $pk$ generated by $\mathrm{KeyGen}(1^{\lambda})$. Then, he/she sends $m_0 = \mathrm{ID}_{G_1}$ and $m_1 = p$ as the challenge messages to the IND-CPA game, and receives $c \leftarrow \mathrm{Enc}(m_b)$, where $b \in \{0, 1\}$ is randomly chosen. Finally, he/she sets $q_i = c$ and outputs $A(Q = \{q_k\}_{k=1}^{n})$ as the guess. Here, if $b = 0$, the distribution of $Q$ is the same as the distribution of $Q_0$; if $b = 1$, the distribution of $Q$ is the same as the distribution of $Q_i$. Therefore, the probability of C winning the IND-CPA game can be expressed as:
\[
\begin{aligned}
&\Pr[A(Q) = 0 \mid b = 0] \cdot \Pr[b = 0] + \Pr[A(Q) = 1 \mid b = 1] \cdot \Pr[b = 1] \\
&\quad = (1/2)\,(1 - \Pr[A(Q_0) = 1]) + (1/2)\,\Pr[A(Q_i) = 1] \\
&\quad = 1/2 + (1/2)\,(\Pr[A(Q_i) = 1] - \Pr[A(Q_0) = 1]) \\
&\quad = 1/2 + \epsilon/2 .
\end{aligned}
\]

Lemma 2. If there is a PPT algorithm A such that $\Pr[A(Q_j) = 1] - \Pr[A(Q_0) = 1] = \epsilon > \mathrm{negl}(\lambda)$, then there is another PPT algorithm C that can win the IND-CPA game of the underlying group homomorphic encryption with non-negligible probability $1/2 + \epsilon/2$.

Proof. The proof of this lemma is the same as that of Lemma 1, with $j$ in place of $i$. Therefore, we omit it here.

Since the underlying group homomorphic encryption is IND-CPA secure, $|\Pr[A(Q_i) = 1] - \Pr[A(Q_0) = 1]| \le \mathrm{negl}(\lambda)$ and $|\Pr[A(Q_j) = 1] - \Pr[A(Q_0) = 1]| \le \mathrm{negl}(\lambda)$. Furthermore, by the triangle inequality we get:
\[
\begin{aligned}
|\Pr[A(Q_i) = 1] - \Pr[A(Q_j) = 1]|
&= |(\Pr[A(Q_i) = 1] - \Pr[A(Q_0) = 1]) - (\Pr[A(Q_j) = 1] - \Pr[A(Q_0) = 1])| \\
&\le \mathrm{negl}(\lambda) + \mathrm{negl}(\lambda) \\
&= \mathrm{negl}(\lambda) .
\end{aligned}
\]
Thus, if the underlying group homomorphic encryption is IND-CPA secure, then there is no PPT algorithm A that can obtain the information the user retrieved with non-negligible probability.

7. Discussion and Analysis

In this chapter, we first discuss the maximum number of data values that can be retrieved at a time (i.e., the maximum value of $r$) and how to increase it. Then, we analyze the communication cost of our proposed protocols.

For a more precise analysis, we let $G_1, G_2$ be the same additive group, i.e., $\langle \mathbb{Z}_m, + \rangle$, where $m$ is a large number.

Table 7.1: The communication cost of retrieving two and multiple values compared to [40] when the database is one- and two-dimensional. Here, we assume that the operations of the two protocols are based on the additive group, i.e., $G_1, G_2 = \langle \mathbb{Z}_m, + \rangle$, where $m$ is a large number. In addition, we let $n$ represent the number of data in an array, and $r$ represent the number of values retrieved.

              One-dimensional                    Two-dimensional
Protocol      Two-value        Multi-value       Two-value        Multi-value
[40]          2(n+1) log m     r(n+1) log m      4√n log m        2r√n log m
Ours          (n+1) log m      (n+1) log m       2√n log m        2√n log m

7.1 The Maximum Number of Retrieved Data

In Chapter 5, we proposed two-value and multi-value PIR protocols, which allow users to retrieve two and multiple values at a time, respectively, thus greatly improving efficiency compared to [40]. In an implementation, however, the maximum number of values is limited by the encryption scheme adopted. Here, we illustrate the limitation on $r$ through our multi-value PIR protocol. As in Section 5.2, U first selects $r$ values $k_1, k_2, \cdots, k_r \in G_1$ that satisfy the following conditions:
\[
\begin{cases}
\operatorname{ord}(k_i) > N, \text{ where } i = 1, \cdots, r \\
k_2 > k_1 * N \\
k_3 > (k_2 + k_1) * N \\
\quad \vdots \\
k_r > \sum_{i=1}^{r-1} k_i * N \\
m > \sum_{i=1}^{r} k_i * N
\end{cases}
\]
Because $m$ has to be greater than $\sum_{i=1}^{r} k_i * N$, in order to maximize $r$ we need to set the values of $k_1, \cdots, k_r$ as small as possible:
\[
\begin{cases}
k_2 \cong k_1 * N \\
k_3 > (k_2 + k_1) * N \Rightarrow k_3 \cong k_1 * N^2 \\
\quad \vdots \\
k_r > \sum_{i=1}^{r-1} k_i * N \Rightarrow k_r \cong k_1 * N^{r-1}
\end{cases}
\]
Therefore, the condition $m > \sum_{i=1}^{r} k_i * N$ approximates to:
\[
\begin{aligned}
m &> \sum_{i=1}^{r} k_i * N \\
\Rightarrow m &> k_1 * \sum_{i=1}^{r} N^i \\
\Rightarrow m &> \frac{k_1 * N (N^r - 1)}{N - 1}
\end{aligned}
\]
Assume that the group homomorphic encryption we adopt is 1024-bit, so that the maximum value of $m$ is $2^{1024}$ (if $m$ is larger than $2^{1024}$, the group homomorphic encryption will not work properly). In order to retrieve the maximum number of values (i.e., to maximize $r$), we have to minimize the parameters (e.g., $k_1$, $N$). For instance, let $k_1$ be 2 and let the values stored in the database be binary, which means $N$ is 2. Under this approximation, the maximum number of values that can be retrieved at a time is $r = 1022$. Here, we note that although increasing $m$ directly increases the maximum $r$ (for the same $N$), we then need to set up the group homomorphic encryption with more bits (e.g., 2048-bit), which increases the execution time of encryption and decryption.
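The bound above can be evaluated numerically. The following Python sketch is offered as an illustration only; the function names `max_r_approx` and `max_r_exact` are hypothetical. The first function applies the approximate bound $m > k_1 N (N^r - 1)/(N - 1)$ with $m = 2^{\text{bits}}$. The second instead takes every $k_i$ to be the smallest integer allowed by the strict inequalities of Section 5.2 (and ignores the order condition); since those keys grow slightly faster than $k_1 N^{i-1}$, it can return a smaller $r$ than the approximation.

```python
def max_r_approx(bits, N, k1=2):
    """Largest r with k1 * N * (N**r - 1) / (N - 1) < 2**bits."""
    m = 1 << bits
    r = 0
    # (N**(r+1) - 1) is divisible by (N - 1), so the division is exact.
    while k1 * N * (N ** (r + 1) - 1) // (N - 1) < m:
        r += 1
    return r


def max_r_exact(bits, N, k1=2):
    """Largest r when each k_i is the smallest integer exceeding N * (k_1 + ... + k_{i-1})."""
    m = 1 << bits
    total = k1  # k_1 + ... + k_(r+1) for the current r
    r = 0
    while N * total < m:      # the first r + 1 keys still fit below m
        r += 1
        total += N * total + 1  # append the next smallest admissible key
    return r


print(max_r_approx(1024, 2))  # 1022
print(max_r_exact(8, 5))      # 2
```

For the toy parameters of Section 5.3 ($N = 5$, $k_1 = 2$) and an 8-bit modulus, the smallest admissible keys are 2 and 11, and a third key would already violate the bound on $m$.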

7.2 Communication Cost

In this section, we analyze the communication cost of our proposed protocol and of [40] when U wants to retrieve $r$ values. Here, we consider two scenarios, namely, whether the database is in a one-dimensional setting or a two-dimensional setting.

In the one-dimensional setting of our proposed protocol, U has to transfer a query array $Q = \{q_i\}_{i=1}^{n}$ to S, where $q_i \in \mathbb{Z}_m$. Thus, the communication cost of U is $n \log m$. As for S, who transfers the single element $R = \sum_{i=1}^{n} x_i \cdot q_i \bmod m$, the communication cost is $\log m$. Therefore, the total cost is $(n + 1) \log m$. On the other hand, in the work of [40], since only one value can be retrieved at a time, U and S must execute the protocol $r$ times in order to retrieve $r$ values. Therefore, the communication cost is $r(n + 1) \log m$.

In the two-dimensional setting, U works the same as in the one-dimensional setting, in both our proposed protocol and [40]. However, S must transfer $R_1, \cdots, R_n \in \mathbb{Z}_m$ in this setting. Therefore, the communication cost of our proposed protocol and of [40] is $2n \log m$ and $2rn \log m$, respectively. As shown in Table 7.1, our proposed protocol reduces the communication cost by a factor of $r$ when U retrieves $r$ values from either a one- or a two-dimensional database.

As mentioned in Section 7.1, if the settings of the group homomorphic encryption are fixed, the number of values that can be retrieved at a time (i.e., $r$) depends on the maximum value stored in the database (i.e., $N$) in our protocol. The relationship between $N$ and $r$ is shown in Table 7.2.

Table 7.2: The relation between the number of values that can be retrieved at a time (i.e., $r$) and the maximum value stored in the database (i.e., $N$).

N     2       2^5      2^10     2^50     2^100    2^500
r     1022    204.6    92.9     20.5     10.2     2.0

Although the communication cost in [40] is fixed regardless of the maximum value stored in the database, in a real scenario the maximum value stored in the database would not be as large as $2^{50}$. That is, even though our protocol is limited by the maximum value stored, we can significantly reduce communication costs compared to [40]. Moreover, if the user wants to retrieve more than $r$ values, he simply has to run the protocol more times. Assume that the user wants to retrieve $t$ values with $t > r$; then the user only has to execute the protocol $\lceil t/r \rceil$ times. This means that our protocol can greatly reduce the communication cost.
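As a small numerical illustration of the one-dimensional cost formulas and of the $\lceil t/r \rceil$ rule, consider the following Python sketch. The function names and the parameter values ($n = 1000$, $\log m = 1024$) are hypothetical and chosen only for the example; costs are counted in bits, assuming each element of $\mathbb{Z}_m$ takes $\log m$ bits.

```python
def one_dim_cost_bits(n, log_m, r, batched):
    """Total bits exchanged to fetch r values from an n-element array.

    One run costs (n + 1) * log_m bits: n query elements plus one response.
    batched=True models the multi-value protocol (one run for all r values);
    batched=False models one run per value, as in the single-value protocol.
    """
    per_run = (n + 1) * log_m
    return per_run if batched else r * per_run


def runs_needed(t, r):
    """Number of protocol runs, ceil(t / r), to fetch t values r at a time."""
    return -(-t // r)


n, log_m = 1000, 1024  # hypothetical database size and element width in bits
single = one_dim_cost_bits(n, log_m, r=10, batched=False)
multi = one_dim_cost_bits(n, log_m, r=10, batched=True)

print(single // multi)      # 10
print(runs_needed(25, 10))  # 3
```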

8. Experiment and Result

In this chapter, we present an experiment with our protocol. The experiments were conducted on an Intel Core(TM) i5-8400 CPU @ 2.80GHz with 16GB of DDR4 RAM and an NVIDIA GeForce GTX 1060M 6GB DDR5 GPU. We used Python 2.7 to implement our protocol and the protocol of [40]. To make a fair comparison, both our architecture and that of [40] use the Paillier group homomorphic encryption mechanism as a component.

8.1 Paillier Cryptosystem

The Paillier cryptosystem [37], proposed in 1999, is a public-key cryptosystem with a homomorphic property. The cryptosystem consists of three algorithms, $\mathrm{KeyGen}(1^{\lambda})$, $\mathrm{Enc}(m, pk)$, and $\mathrm{Dec}(c, sk)$, described as follows.

• $\mathrm{KeyGen}(1^{\lambda}) \rightarrow (pk, sk)$:
1. Randomly and independently choose two large prime numbers $p$ and $q$ such that $\gcd(pq, (p - 1)(q - 1)) = 1$, where gcd denotes the greatest common divisor.
2. Compute $n = pq$ and $\lambda = \operatorname{lcm}(p - 1, q - 1)$, where lcm denotes the least common multiple.
3. Select a random integer $g \in \mathbb{Z}^*_{n^2}$.
4. Ensure that $n$ divides the order of $g$ by checking the existence of the following modular multiplicative inverse: $\mu = (L(g^{\lambda} \bmod n^2))^{-1} \bmod n$, where $L(x) = (x - 1)/n$.
5. The public key is $pk = (n, g)$. The private key is $sk = (\lambda, \mu)$.

• $\mathrm{Enc}(m, pk)$:
1. Let $m$ be a message to be encrypted, where $0 \le m < n$.
2. Select a random $r$ where $0 < r < n$ and $r \in \mathbb{Z}^*_n$ (i.e., ensure $\gcd(r, n) = 1$).
3. Compute the ciphertext as $c = g^m \cdot r^n \bmod n^2$.

• $\mathrm{Dec}(c, sk)$:
1. Let $c$ be the ciphertext to decrypt, where $c \in \mathbb{Z}^*_{n^2}$.
2. Compute the plaintext message as $m = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$.

The Paillier cryptosystem is an additively homomorphic encryption scheme, which means that given only the public key and the ciphertexts $c_1$ and $c_2$ of $m_1$ and $m_2$, we can still compute an encryption of $m_1 + m_2$:
\[
D(E(m_1, r_1) \cdot E(m_2, r_2) \bmod n^2) = m_1 + m_2 \bmod n .
\]

8.2 Result

We used four different key sizes of the Paillier cryptosystem to construct our protocol and the protocol of [40]: 32-bit, 128-bit, 512-bit, and 1024-bit. In the following, we compare the execution time of our protocol with that of [40]; the results are shown in Fig. 8.1 to Fig. 8.4. The blue line "our" represents our protocol, and the red line "OS07" represents the protocol of [40].

Figure 8.1: Execution time comparison between ours and [40] under 32-bit setting.

Figure 8.2: Execution time comparison between ours and [40] under 128-bit setting.

Figure 8.3: Execution time comparison between ours and [40] under 512-bit setting.

Figure 8.4: Execution time comparison between ours and [40] under 1024-bit setting.

When more data is retrieved at one time, the execution time of the protocol of [40] is much longer than that of our protocol. As the number of retrieved values increases, the execution time of the protocol of [40] also increases. On the other hand, there is no obvious increase in the execution time of our protocol. We significantly reduce the communication cost and the execution time when multiple values are retrieved at one time.

The experiments show that the execution time of our protocol is less than that of [40] regardless of whether the setting is 32 bits, 128 bits, 512 bits, or 1024 bits. The main reason is that [40] cannot retrieve multiple values at the same time, whereas our proposed protocol can. When the amount of data retrieved at one time increases, the time taken by the protocol of [40] increases linearly with the amount of data, while the time taken by our protocol stays close to constant. Therefore, as the amount of data required increases, our protocol significantly reduces the time spent.
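For readers who want to reproduce the construction at toy scale, the following Python 3 sketch combines the Paillier algorithms of Section 8.1 with the two-value retrieval of Section 5.1. It is not the code used in the experiments above (which was written in Python 2.7) and is insecure by design: the primes 1009 and 1013 are tiny and chosen only for illustration, $g = n + 1$ is a common simplifying choice rather than a random $g$, and the function names are hypothetical. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.

```python
import math
import random


def L(x, n):
    # L(x) = (x - 1) / n, applied to values congruent to 1 modulo n
    return (x - 1) // n


def keygen(p, q):
    n = p * q
    if math.gcd(n, (p - 1) * (q - 1)) != 1:
        raise ValueError("gcd(pq, (p-1)(q-1)) must be 1")
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                          # assumed simple choice of g
    mu = pow(L(pow(g, lam, n * n), n), -1, n)          # modular inverse mod n
    return (n, g), (lam, mu)


def enc(m, pk):
    n, g = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def dec(c, pk, sk):
    n, _ = pk
    lam, mu = sk
    return L(pow(c, lam, n * n), n) * mu % n


pk, sk = keygen(1009, 1013)  # toy primes, far too small for real use
n2 = pk[0] ** 2

# Two-value query for positions 2 and 3 of the column (1, 4, 3, 1),
# with k1 = 2 and k2 = 11 as in Section 5.3.
column = [1, 4, 3, 1]
query = [enc(k, pk) for k in (0, 2, 11, 0)]

# Server: with Paillier, x (.) q is q**x and the sum is a product mod n^2.
response = 1
for x, q in zip(column, query):
    response = response * pow(q, x, n2) % n2

total = dec(response, pk, sk)  # x'_1 * 2 + x'_2 * 11
x1 = (total % 11) // 2
x2 = (total - 2 * x1) // 11
print(total, x1, x2)  # 41 4 3
```

A single ciphertext carries both values, which is the reason the response size does not grow with the number of retrieved values.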

9. Conclusion

The demand for cloud services is growing rapidly, making it increasingly important for users to protect their privacy when collecting data. PIR plays an important role here because it prevents cloud services from learning sensitive information. In this thesis, a multi-value PIR protocol using group homomorphic encryption is proposed. In particular, compared to previous work, our protocol allows users to retrieve multiple values at once, thus significantly reducing communication costs. In addition, the security of our protocol is guaranteed by the IND-CPA security of the underlying group homomorphic encryption. In future work, we will further explore how to relax the limit on the amount of retrieved data and how to further reduce the communication cost, so as to obtain more valuable and more efficient solutions.

References

[1] C. Aguilar-Melchor, J. Barrier, L. Fousse, and M.-O. Killijian. XPIR: Private Information Retrieval for Everyone. Proceedings on Privacy Enhancing Technologies, (2):155–174, 2016.
[2] C. Aguilar-Melchor, P. Gaborit, and J. Herranz. Additively Homomorphic Encryption with D-operand Multiplications. In Annual Cryptology Conference, pages 138–154, 2008.
[3] Y. Arkady. A General Framework for One Database Private Information Retrieval. Online at http://www.cs.umd.edu/Grad/scholarlypapers/papers/Arkady-pircomp.pdf, 2015.
[4] A. Beimel and Y. Ishai. Information-theoretic Private Information Retrieval: A Unified Construction. In International Colloquium on Automata, Languages, and Programming, pages 912–926, 2001.
[5] G. Brassard, C. Crepeau, and J. Robert. All-or-nothing Disclosure of Secrets. In Conference on the Theory and Application of Cryptographic Techniques, 1986.
[6] J. Bringer, H. Chabanne, D. Pointcheval, and Q. Tang. Extended Private Information Retrieval and Its Application in Biometrics Authentications. In International Conference on Cryptology and Network Security, pages 175–193, 2007.
[7] C. Cachin, S. Micali, and M. Stadler. Computationally Private Information Retrieval with Polylogarithmic Communication. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 402–414, 1999.
[8] J. Camenisch, M. Dubovitskaya, and G. Neven. Unlinkable Priced Oblivious Transfer with Rechargeable Wallets. Proceedings of FC 2010, January 2010.

[9] A.-M. Carlos and G. Philippe. A Lattice-based Computationally-efficient Private Information Retrieval Protocol. Cryptol. ePrint Arch., Report, page 446, 2007.
[10] Y. Chang. Single Database Private Information Retrieval with Logarithmic Communication. In Australasian Conference on Information Security and Privacy, pages 50–61, 2004.
[11] D. Changyu and L. Chen. A Fast Single Server Private Information Retrieval Protocol with Low Communication Cost. In M. Kutyłowski and J. Vaidya, editors, Computer Security - ESORICS 2014, pages 380–399, Cham, 2014. Springer International Publishing.
[12] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private Information Retrieval. In Proceedings of IEEE 36th Annual Symposium on Foundations of Computer Science, pages 41–50, 1995.
[13] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan. Private Information Retrieval. J. ACM, vol. 45, no. 6, pages 965–981, 1998.
[14] C.-K. Chu and W.-G. Tzeng. Efficient K-out-of-n Oblivious Transfer Schemes. Journal of Universal Computer Science, 2008.
[15] G. Craig. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, pages 169–178, New York, NY, USA, 2009. Association for Computing Machinery.
[16] G. D. Crescenzo, T. Malkin, and R. Ostrovsky. Single Database Private Information Retrieval Implies Oblivious Transfer. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 122–138, 2000.
[17] C. Devet, I. Goldberg, and N. Heninger. Optimally Robust Private Information Retrieval. In Presented as part of the 21st USENIX Security Symposium (USENIX Security 12), pages 269–283, 2012.
[18] S. Even, O. Goldreich, and A. Lempel. A Randomized Protocol for Signing Contracts. Communications of the ACM, 1985.

[19] K. Eyal and O. Rafail. Replication is Not Needed: Single Database, Computationally-private Information Retrieval. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 364–373. IEEE, 1997.
[20] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin. Protecting Data Privacy in Private Information Retrieval Schemes. Journal of Computer and System Sciences, 60(3), pages 592–629, 2000.
[21] C. Gentry. Fully Homomorphic Encryption Scheme. PhD Thesis, Stanford University, 2009.
[22] C. Gentry. Fully Homomorphic Encryption using Ideal Lattices. Proc. STOC '09, pages 169–178, 2009.
[23] C. Gentry. Computing Arbitrary Functions of Encrypted Data. Communications of the ACM, pages 97–105, 2010.
[24] C. Gentry. Toward Basing Fully Homomorphic Encryption on Worst-case Hardness. Proc. CRYPTO '10, pages 116–137, 2010.
[25] C. Gentry and Z. R. S. Single Database Private Information Retrieval with Constant Communication Rate. In International Colloquium on Automata, Languages, and Programming, pages 803–815, 2005.
[26] O. Goldreich. Foundations of Cryptography: volume 1, Basic Tools. Cambridge University Press, 2007.
[27] A. Heidarzadeh, S. Kadhe, S. El Rouayheb, and A. Sprintson. Single-server Multi-Message Individually-Private Information Retrieval with Side Information. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 1042–1046, 2019.
[28] E. Kushilevitz and R. Ostrovsky. One-way Trapdoor Permutations are Sufficient for Non-trivial Single-server Private Information Retrieval. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 104–121, 2000.

[29] L. Wei, H. Zhu, Z. Cao, X. Dong, W. Jia, Y. Chen, and A. V. Vasilakos. Security and Privacy for Storage and Computation in Cloud Computing. Information Sciences, pages 371–386, 2014.
[30] H. Lipmaa. An Oblivious Transfer Protocol with Log-squared Communication. In International Conference on Information Security, pages 314–328, 2005.
[31] H. Lipmaa. First CPIR Protocol with Data-dependent Computation. In International Conference on Information Security and Cryptology, pages 193–210, 2009.
[32] A. Iliev and S. W. Smith. Protecting Client Privacy with Trusted Computing at the Server. In IEEE Security & Privacy, pages 20–28, 2005.
[33] M. Naor and B. Pinkas. Oblivious Transfer and Polynomial Evaluation. In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, pages 245–254, 1999.
[34] M. Naor and B. Pinkas. Oblivious Transfer with Adaptive Queries. In Annual International Cryptology Conference, pages 573–590, 1999.
[35] N. Kaaniche and M. Laurent. Data Security and Privacy Preservation in Cloud Storage Environments based on Cryptographic Mechanisms. Computer Communications, pages 120–141, 2017.
[36] F. G. Olumofin. Practical Private Information Retrieval. University of Waterloo, 2011.
[37] P. Paillier. Public-key Cryptosystems based on Composite Degree Residuosity Classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 223–238. Springer, 1999.
[38] P. Paillier and D. Pointcheval. Efficient Public-Key Cryptosystems Provably Secure against Active Adversaries. In K.-Y. Lam, E. Okamoto, and C. Xing, editors, Advances in Cryptology - ASIACRYPT’99, pages 165–179. Springer Berlin Heidelberg, 1999.
[39] M. O. Rabin. How to Exchange Secrets by Oblivious Transfer. Technical Report TR-81, 1981.

[40] R. Ostrovsky and W. E. Skeith. A Survey of Single-database Private Information Retrieval: Techniques and Applications. In T. Okamoto and X. Wang, editors, Public Key Cryptography – PKC 2007, pages 393–411. Springer Berlin Heidelberg, 2007.
[41] R. L. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Signatures and Public-key Cryptosystems. Communications of the ACM, pages 120–126, 1978.
[42] F. Saint-Jean. Java Implementation of a Single-database Computationally Symmetric Private Information Retrieval (cSPIR) Protocol. 2005.
[43] R. Sion and B. Carbunar. On the Computational Practicality of Private Information Retrieval. Proceedings of the Network and Distributed Systems Security Symposium, 2007.
[44] J. Stern. A New and Efficient All-or-nothing Disclosure of Secrets Protocol. In International Conference on the Theory and Application of Cryptology and Information Security, pages 357–371, 1998.
[45] Z. Sun, J. Yu, P. Wang, and L. Xu. Symmetrically Private Information Retrieval based on Blind Quantum Computing. Physical Review A, 91(5):052303, 2015.
[46] C. Wang, Q. Wang, K. Ren, and W. Lou. Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing. In 2010 Proceedings IEEE INFOCOM, pages 1–9, 2010.
[47] S. Wang and X. Ding. Private Information Retrieval using Trusted Hardware. In D. Gollmann, J. Meier, and A. Sabelfeld, editors, Computer Security – ESORICS 2006, pages 49–64. Springer, Berlin, Heidelberg, 2006.
[48] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin. Protecting Data Privacy in Private Information Retrieval Schemes. Journal of Computer and System Sciences, 60(3):592–629, 2000.
[49] X. Yi, M. Kaosar, R. Paulet, and E. Bertino. Single-database Private Information Retrieval from Fully Homomorphic Encryption. IEEE Trans. on Knowledge and Data Eng., pages 1125–1134, 2013.
[50] X. Yi, R. Paulet, and E. Bertino. Private Information Retrieval. Morgan & Claypool, 2013.
