
Privacy-Preserving Collaborative Recommendation Systems Based on the Scalar Product


Academic year: 2022


Justin Zhan I-Cheng Wang

Abstract

In the e-commerce era, recommendation systems were introduced to share customer experience and comments. At the same time, there is a need for e-commerce entities to join their recommender system databases to enhance reliability toward prospective customers and to maximize the precision of target marketing. However, joining recommender system databases creates a privacy disclosure hazard.

In order to preserve privacy when merging recommender system databases, we design a novel algorithm based on the ElGamal scheme of homomorphic encryption.

Key Words: Privacy, Security, Social Networks.

1 Introduction

A recommender system is a web-based application that aims at helping customers in the decision-making and product selection process. It was introduced to share customer experience and comments in electronic commerce. The most prominent example in e-commerce is the online bookstore amazon.com, where collaborative filtering techniques are used to exploit similarities in user profiles, which are based on navigation and buying history. The main idea is to identify users who presumably have similar preferences and to recommend those items that were bought by other users with a similar interest profile. Another technical approach is content-based filtering, which builds on the hypothesis that the preferred items of a single user can be extrapolated from her preferences in the past. The third principal approach is to use domain knowledge and to base the recommendations on a thorough understanding of the user's current needs, comparable to a real-life sales situation. The recommendations are the result of a reasoning process on the domain knowledge, which also forms the basis for explaining to the user why an item is proposed. Knowledge-based recommender systems elicit user preferences explicitly, i.e., they provide dynamic, personalized, and potentially persuasive sales dialogues. Recommender systems are changing from novelties used by a few e-commerce sites to serious business tools that are reshaping the world of e-commerce. Many of the largest commerce web sites are already using recommender systems to help their customers find products to purchase.

A recommender system learns from a customer and recommends the products that she will find most valuable among all the available products. With many e-commerce platforms in operation, joining recommendation databases can enhance reliability toward prospective customers. It can also allow corporations to maximize the precision of target marketing. Thus, merging the customer-product information of two or more e-commerce entities is desirable. However, due to privacy concerns, different entities may not want to disclose the raw preferences of customers to each other.

Furthermore, it may lead to distrust among customers, and the e-commerce entities might lose their markets.

As for recommender system techniques, a detailed taxonomy with examples is presented by J. B. Schafer et al. [1]. According to that taxonomy, recommender systems can be categorized into Non-Personalized Recommendations, Attribute-Based Recommendations, Item-to-Item Correlation, and People-to-People Correlation. Each of these categories has a different degree of automation and persistence. In this paper, we focus on People-to-People Correlation with cosine similarity.

From the business point of view, with the advent of e-commerce, the need for personalized services has been emphasized. D. Peppers et al. [9] have advocated the need for one-to-one marketing. As for the privacy issue, J. Canny [10, 11] proposed schemes for privacy-preserving collaborative filtering. In his schemes, a community of users can compute an aggregate of their private data without disclosing them. Another approach is the randomized perturbation proposed by H. Polat et al. [12, 13]. They deployed a centralized server to store perturbed numeric ratings and then used these disguised ratings to provide predictions to users. However, their approach cannot achieve 100% accuracy unless all the private data are disclosed. In this paper, we intend to introduce two approaches that can achieve 100% accuracy while preserving privacy when joining recommendation databases between two entities. We also compare the two proposed approaches and discuss them.

The organization of this paper is as follows. In Section 2, we introduce a recommender system algorithm. In Section 3, we define our problem. In Sections 4 and 5, we propose our solutions to the given problem. In Section 6, we present the experimental results. We conclude in Section 7.

2 Introduction to Recommender Systems

Recommender systems are a kind of collaborative filtering system that produces automatic predictions about the interests of a user by collecting preference information from many users. There are some concerns about how far collaborative filtering can be trusted, because opinions may be biased from person to person. In addition, providing feedback requires action by the user, less data may be available than with a passive approach, and user expectations may not be met. Even so, the advantages of adopting recommender systems in e-commerce remain attractive. One advantage is that an actual rating is given to an item of interest by a person who has viewed that topic or product, which produces a reasonable explanation and ranking from a reliable source. Another advantage of active filtering is that people want to, and ultimately do, provide information regarding the matter at hand [8].

Recommender systems form a specific type of information filtering (IF) technique that attempts to present information items (movies, music, books, news, web pages) that are likely to be of interest to the user [7]. Recommender systems use product knowledge, either hand-coded knowledge provided by experts or "mined" knowledge learned from the behavior of consumers, to guide consumers through the often-overwhelming task of locating products they will like [2]. Recommender systems are popular on many e-commerce websites, such as Amazon.com, CDNOW, eBay, Levis, and Moviefinder.com. These recommender systems are intelligent techniques for dealing with the problem of information and product overload. They can be used to provide personalized services efficiently in most e-commerce domains, benefiting both the customer and the merchant. Recommender systems benefit the customer by suggesting items that he is presumed to like. Meanwhile, the business benefits from the increase in sales that normally occurs when the customer is presented with more appealing items [3].

Two basic entities are involved in a recommender system. A "user" (also referred to as a customer) is a person who uses the recommender system to provide his opinion and to receive recommendations about items. An "item" (also referred to as a product) is rated by users, and these ratings are collected by the system.

The inputs of a recommender system are usually arithmetic rating values, which express the opinions of users on items. Ratings are normally provided by the user and follow a specified numerical scale (example: 1-bad to 5-excellent). Ratings can also be gathered implicitly from the user's purchase history, web logs, hyperlink visits, browsing habits, or other types of information access patterns.

The outputs of a recommender system can be either predictions or recommendations. A prediction is expressed as a numerical value that represents the anticipated opinion of the active user on a certain item. This predicted value must lie within the same numerical scale (example: 1-bad to 5-excellent) as the input, that is, the opinions provided initially by the active user. This form of recommender system output is also known as Individual Scoring. A recommendation, on the other hand, is expressed as a list of N items, where N ≤ n, which the active user is expected to like most. The usual approach in that case requires this list to include only items that the active user has not already purchased, viewed, or rated. This form of recommender system output is also known as top-N recommendation or ranked scoring.

We now introduce the algorithm of recommender systems. It has three main processes: representation, neighborhood formation, and recommendation generation.

2.1 Representation

In the original representation, the input data is defined as a collection of numerical ratings of m users on n items, expressed by the m × n user-item matrix R. We refer to this user-item matrix of the input data set as the original representation. As mentioned earlier, users are not required to provide their opinion on all items. As a result, the user-item matrix is usually sparse, including numerous "no rating" values, which makes it harder for filtering algorithms to generate satisfactory results. Thus, some techniques whose purpose is to reduce the sparsity of the initial user-item matrix have been proposed in order to improve the results of the recommendation process.
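The original representation can be illustrated with a minimal sketch. The triple format, the user and item names, and the helper functions below are assumptions made for this illustration only; `None` stands for a "no rating" value.

```python
def build_user_item_matrix(ratings, users, items):
    """Return the m x n matrix R with R[i][j] = rating, or None for no rating."""
    user_index = {u: i for i, u in enumerate(users)}
    item_index = {it: j for j, it in enumerate(items)}
    R = [[None] * len(items) for _ in users]
    for user, item, value in ratings:
        R[user_index[user]][item_index[item]] = value
    return R


def sparsity(R):
    """Fraction of cells in R that carry no rating."""
    cells = sum(len(row) for row in R)
    missing = sum(1 for row in R for value in row if value is None)
    return missing / cells


# Hypothetical input: (user, item, rating) triples on a 1-5 scale.
ratings = [("alice", "book_a", 5), ("alice", "book_c", 2), ("bob", "book_b", 4)]
R = build_user_item_matrix(ratings, ["alice", "bob"], ["book_a", "book_b", "book_c"])
print(R)            # one row per user, one column per item
print(sparsity(R))  # share of "no rating" cells
```

Keeping missing ratings as `None` rather than 0 leaves the choice of a default value to the later steps.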


2.2 Neighborhood Formation

The core step of the recommendation process is to calculate the similarity between users in the user-item matrix R. Users similar to the active user $u_a$ will form a proximity-based neighborhood with him. The active user's neighborhood is then used in the following step of the recommendation process in order to estimate his possible preferences. Neighborhood formation is implemented in two steps. First, the similarity between all the users in the user-item matrix R is calculated with the help of some proximity metric. The second step is the actual neighborhood generation for the active user, where the similarities of users are processed in order to select the users who will make up the neighborhood of the active user.

1. Proximity Metrics: The proximity between two users is usually measured using either correlation or cosine measure.

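• Pearson Correlation Similarity

To find the proximity between users $u_i$ and $u_k$, we can utilize the Pearson correlation metric:

$$\mathrm{sim}_{ik} = \mathrm{corr}_{ik} = \frac{\sum_{j=1}^{l} (r_{ij} - \bar{r}_i)(r_{kj} - \bar{r}_k)}{\sqrt{\sum_{j=1}^{l} (r_{ij} - \bar{r}_i)^2 \sum_{j=1}^{l} (r_{kj} - \bar{r}_k)^2}}$$

It is important to note that the summations over $j$ are calculated over the $l$ items for which both users $u_i$ and $u_k$ have expressed their opinions, and that $\bar{r}_i$ and $\bar{r}_k$ denote the average ratings of the two users. Obviously, $l \le n$, where $n$ represents the total number of items in the user-item matrix R.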

• Cosine Similarity

In the n-dimensional item space, we can view different users as feature vectors. A user vector consists of n feature slots, one for each available item. The value used to fill each slot is either the rating $r_{ij}$ that user $u_i$ provided for the corresponding item $i_j$, or 0 if no such rating exists. We can then compute the proximity between two users $u_i$ and $u_k$ by calculating the similarity between their vectors as the cosine of the angle formed between them.

2. Neighborhood Type

At this point of the recommendation process it is mandatory to distinguish the active user for whom we would like to make predictions, and to proceed with generating his neighborhood of users. Neighborhood generation is based on the similarity matrix S. There are two neighborhood types: one is center-based, and the other is aggregate.

3. Center-based scheme

The neighborhood is created by selecting, from the row of the similarity matrix S that corresponds to the active user, those users who have the L highest similarity values with the active user. In other words, it is a static scheme.

4. Aggregate neighborhood scheme

The neighborhood is created by collecting the users who are closest to the centroid of the current neighborhood. The aggregate neighborhood scheme forms a neighborhood of size L by first picking the user who is closest to the active user $u_a$. Those two users then form the current neighborhood, and the selection of the next neighbor is based on them. In other words, it is a dynamic scheme.
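The two proximity metrics and the two neighborhood schemes above can be sketched as follows. This is a minimal illustration under stated assumptions: missing ratings are `None`, the Pearson means are taken over the co-rated items only, the centroid is the plain average of the 0-filled member vectors, and the function names and the sample matrix are hypothetical.

```python
import math


def _filled(row):
    """Replace missing ratings (None) with 0, as in the cosine description."""
    return [0 if x is None else x for x in row]


def pearson(u, v):
    """Pearson correlation over the items both users rated.
    Assumption: each mean is taken over the co-rated items only."""
    common = [j for j in range(len(u)) if u[j] is not None and v[j] is not None]
    if not common:
        return 0.0
    mean_u = sum(u[j] for j in common) / len(common)
    mean_v = sum(v[j] for j in common) / len(common)
    num = sum((u[j] - mean_u) * (v[j] - mean_v) for j in common)
    den = math.sqrt(sum((u[j] - mean_u) ** 2 for j in common)
                    * sum((v[j] - mean_v) ** 2 for j in common))
    return num / den if den else 0.0


def cosine(u, v):
    """Cosine of the angle between two 0-filled rating vectors."""
    a, b = _filled(u), _filled(v)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def center_based(R, active, size, sim=cosine):
    """Static scheme: the `size` users most similar to the active user."""
    ranked = sorted((k for k in range(len(R)) if k != active),
                    key=lambda k: (-sim(R[active], R[k]), k))
    return ranked[:size]


def aggregate(R, active, size, sim=cosine):
    """Dynamic scheme: repeatedly add the user closest to the current centroid."""
    members = [active]
    candidates = [k for k in range(len(R)) if k != active]
    while candidates and len(members) <= size:
        rows = [_filled(R[m]) for m in members]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        best = max(candidates, key=lambda k: (sim(centroid, R[k]), -k))
        members.append(best)
        candidates.remove(best)
    return members[1:]


# Hypothetical 4-user, 4-item matrix; user 0 is the active user.
R = [
    [5, 3, None, 1],
    [4, None, 2, 1],
    [1, 1, 5, 5],
    [5, 3, 1, 1],
]
print(center_based(R, active=0, size=2))
print(aggregate(R, active=0, size=2))
```

Both schemes accept either metric through the `sim` argument, so the choice between correlation and cosine stays independent of the choice of neighborhood type.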

2.3 Recommendation Generation

The final step in the recommendation process is to produce either a prediction, which is a numerical value representing the predicted opinion of the active user, or a recommendation, which is expressed as a list of the top-N items that the active user will appreciate most. In both cases, the result should be based on the neighborhood of users.
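The text does not fix a formula for this step, so the sketch below uses a similarity-weighted average of the neighbors' ratings, a common choice that is assumed here rather than taken from the paper. The function names, the similarity values, and the sample matrix are hypothetical.

```python
def predict(R, sims, neighbors, item):
    """Similarity-weighted average of the neighbors' ratings for one item.
    Returns None when no neighbor has rated the item."""
    num = den = 0.0
    for k in neighbors:
        if R[k][item] is not None:
            num += sims[k] * R[k][item]
            den += abs(sims[k])
    return num / den if den else None


def top_n(R, sims, neighbors, active, n):
    """Rank the items the active user has not rated by predicted rating."""
    scored = []
    for item, rating in enumerate(R[active]):
        if rating is None:
            p = predict(R, sims, neighbors, item)
            if p is not None:
                scored.append((p, item))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [item for _, item in scored[:n]]


# Hypothetical data: user 0 is active; users 1 and 2 are the neighborhood.
R = [
    [5, None, None, 1],
    [4, 2, 5, 1],
    [5, 4, 3, None],
]
sims = {1: 0.5, 2: 1.0}   # assumed similarities to the active user
print(predict(R, sims, [1, 2], item=1))
print(top_n(R, sims, [1, 2], active=0, n=2))
```

Restricting `top_n` to unrated items follows the requirement that the list contain only items the active user has not already purchased, viewed, or rated.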

3 Problems

By joining recommender system databases, e-commerce entities can expand their datasets to provide more precise predictions and recommendations. We assume there are two e-commerce entities, for example online bookstores, which have similar product sets (e.g., books) but customer sets that are not identical. Both of them already have their own recommender system with some data records. These two entities want to cooperate with each other to strengthen their recommender system databases and improve the precision of their recommendations to their own customers. In other words, these entities are going to share their recommendation databases. Sharing the databases, however, carries the risk of revealing the raw customer-product information. This is not a desired result, because disclosing raw customer purchase records and preferences would infringe personal privacy. If customers learn that their private data have been revealed to entities other than the shop they purchased from, they will distrust that company, which may lead to its collapse.

Thus, in order to merge the recommender databases without disclosing the actual commercial data, we have to examine the recommender system algorithm for points where privacy could be disclosed. Recall the neighborhood formation step of the algorithm, in which we measure the proximity between two customers by calculating the Pearson correlation similarity. Assume that, for item (product) $j$, the ratings $r_{ij}$ and $r_{kj}$ are made by users $u_i$ and $u_k$ from different e-commerce entities. While joining these two recommender system databases, the entities have to share the values $(r_{ij} - \bar{r}_i)$ and $(r_{kj} - \bar{r}_k)$ with each other. But from the value $(r_{ij} - \bar{r}_i)$, the other party can learn how much user $i$ prefers item $j$ compared to his average rating $\bar{r}_i$. This is the potential infringement of privacy that arises when the Pearson correlation similarity for neighborhood formation is calculated collaboratively.
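One way to read this observation, offered as a sketch rather than as the authors' construction, is to collect the mean-centered ratings of each user into a vector. The symbols $\mathbf{a}$ and $\mathbf{b}$ are introduced here for illustration, and both users are assumed to have ratings on the same $l$ items.

```latex
\[
\mathbf{a} = (r_{i1} - \bar{r}_i, \dots, r_{il} - \bar{r}_i), \qquad
\mathbf{b} = (r_{k1} - \bar{r}_k, \dots, r_{kl} - \bar{r}_k)
\]
\[
\mathrm{corr}_{ik}
  = \frac{\sum_{j=1}^{l} (r_{ij} - \bar{r}_i)(r_{kj} - \bar{r}_k)}
         {\sqrt{\sum_{j=1}^{l} (r_{ij} - \bar{r}_i)^2 \, \sum_{j=1}^{l} (r_{kj} - \bar{r}_k)^2}}
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
\]
```

Each norm depends on the data of one entity only, so the numerator $\mathbf{a} \cdot \mathbf{b}$ is the only term that needs both parties' private values. Whether the norms themselves may be exchanged in the clear is a separate design decision that this sketch does not settle.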

4 Privacy-Preserving Scalar Product Protocol

The problem of Secure Multiparty Computation (SMC) was first addressed by Yao in his seminal paper [?]. Loosely speaking, SMC deals with the computation of any function with inputs by two or more parties in a distributed network, while ensuring that no more information is revealed than can be inferred from each participant's input and output. Solutions to the SMC problem have been proposed by Yao [?] and Goldreich et al. [?]. The security of these solutions is based on some cryptographic assumptions, such as the existence of trapdoor permutations. The solutions are generic and elegant, but their prohibitive cost in protecting privacy makes them totally unsuitable for large-scale applications. Therefore, practical solutions need to be developed.

It has been shown that the two-party scalar-product protocol is one of the most important building blocks in Secure Two-party Computation [?]. On the one hand, the building block of generic circuit evaluation, the oblivious transfer, can be implemented by the scalar-product protocol. On the other hand, it would be much more efficient if we were to construct functions from scalar products instead of binary gates.

We now introduce the secure two-party product protocol proposed in [?], and prove that it is indeed information-theoretically secure.

Protocol $\pi_2$.

$f_2(x_1, x_2) \mapsto (y_1, y_2)$, where $x_1 \cdot x_2 = y_1 + y_2$.

1. The commodity server generates random numbers $s_1$, $s_2$, and $r_1$, and lets $r_2 = s_1 \cdot s_2 - r_1$. It then sends $(s_1, r_1)$ to $P_1$ and $(s_2, r_2)$ to $P_2$.

2. $P_1$ sends $\hat{x}_1 = x_1 + s_1$ to $P_2$.

3. $P_2$ sends $\hat{x}_2 = x_2 + s_2$ to $P_1$.

4. $P_2$ computes $t = \hat{x}_1 \cdot x_2 + r_2 - y_2$ and sends it to $P_1$, where $y_2$ is a randomly generated number.

5. $P_1$ computes $y_1 = t - \hat{x}_2 \cdot s_1 + r_1$.
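The five steps can be simulated in one process to check that the two output shares add up to the product. This is a sketch under assumptions: the prime modulus `P`, the function names, and the single-process structure (no real network or commodity server) are choices made for illustration. The last function, which runs the protocol once per coordinate and sums the shares to obtain a scalar product of vectors, is a hypothetical extension that the text above does not spell out.

```python
import secrets

P = 2_147_483_647  # prime modulus 2**31 - 1, chosen only for illustration


def commodity_server():
    """Step 1: correlated randomness with r1 + r2 = s1 * s2 (mod P)."""
    s1, s2, r1 = (secrets.randbelow(P) for _ in range(3))
    r2 = (s1 * s2 - r1) % P
    return (s1, r1), (s2, r2)


def product_shares(x1, x2):
    """Run steps 1-5 and return (y1, y2) with y1 + y2 = x1 * x2 (mod P)."""
    (s1, r1), (s2, r2) = commodity_server()
    x1_hat = (x1 + s1) % P              # step 2: P1 -> P2
    x2_hat = (x2 + s2) % P              # step 3: P2 -> P1
    y2 = secrets.randbelow(P)           # P2's randomly generated output share
    t = (x1_hat * x2 + r2 - y2) % P     # step 4: P2 -> P1
    y1 = (t - x2_hat * s1 + r1) % P     # step 5: P1's output share
    return y1, y2


def scalar_product_shares(a, b):
    """Hypothetical extension: one protocol run per coordinate, shares summed."""
    y1 = y2 = 0
    for x1, x2 in zip(a, b):
        share1, share2 = product_shares(x1 % P, x2 % P)
        y1 = (y1 + share1) % P
        y2 = (y2 + share2) % P
    return y1, y2


y1, y2 = product_shares(6, 7)
print((y1 + y2) % P)   # the two shares recombine to 6 * 7 mod P

v1, v2 = scalar_product_shares([1, 2, 3], [4, 5, 6])
print((v1 + v2) % P)   # shares of the scalar product 1*4 + 2*5 + 3*6
```

Neither share alone says anything about the product; only their sum modulo `P` does, which is what lets each party keep its share private until the result is needed.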

Theorem 1 (Secure Two-party Product Protocol) $\pi_2$ is an information-theoretically secure protocol that realizes function $f_2$ in the semi-honest adversary model with private channels.

Proof 1 Recall that, according to the problem assumptions, $x_1, x_2, s_1, s_2, r_1$, and $y_2$ are independent random variables defined on $GF(p)$; moreover, $s_1, s_2, r_1$, and $y_2$ are uniformly distributed. From $P_1$'s perspective, in the protocol we have

$$H(x_2 \mid \mathrm{view}_1^{\pi_2}) = H(x_2 \mid x_1, s_1, r_1, \hat{x}_2, t, y_1). \tag{1}$$

In the protocol, $y_1$ is computed from $s_1, r_1, \hat{x}_2$, and $t$; therefore, according to Lemma ??, we know that

$$H(x_2 \mid x_1, s_1, r_1, \hat{x}_2, t, y_1) = H(x_2 \mid x_1, s_1, r_1, \hat{x}_2, t). \tag{2}$$

Because $t = \hat{x}_1 \cdot x_2 + r_2 - y_2$, and $y_2$ is independent of $(x_1, x_2, s_1, s_2, r_1)$, Theorem ?? states that

$$H(x_2 \mid x_1, s_1, r_1, \hat{x}_2, t) = H(x_2 \mid x_1, s_1, r_1, \hat{x}_2). \tag{3}$$

Similarly, because $\hat{x}_2 = x_2 + s_2$, and $s_2$ is uniformly distributed and independent of $(x_1, x_2, s_1, r_1)$, Theorem ?? states that

$$H(x_2 \mid x_1, s_1, r_1, \hat{x}_2) = H(x_2 \mid x_1, s_1, r_1). \tag{4}$$

Then, according to Eqn. (1)∼(4) and the independence between $(x_1, s_1, r_1)$ and $x_2$, we have $H(x_2 \mid \mathrm{view}_1^{\pi_2}) = H(x_2)$.

Next, let us consider $P_2$'s point of view in the protocol:

$$H(x_1 \mid \mathrm{view}_2^{\pi_2}) = H(x_1 \mid x_2, s_2, \hat{x}_1, r_2, y_2), \tag{5}$$

where $\hat{x}_1 = x_1 + s_1$ and $r_2 = s_1 \cdot s_2 - r_1$. Because $y_2$ is independent of $(x_1, x_2, s_1, s_2, r_1)$, we know that

$$H(x_1 \mid x_2, s_2, \hat{x}_1, r_2, y_2) = H(x_1 \mid x_2, s_2, \hat{x}_1, r_2). \tag{6}$$

Since $r_2$ is independent of $(x_1, x_2, s_1, s_2)$ and uniformly distributed, Theorem ?? states that

$$H(x_1 \mid x_2, s_2, \hat{x}_1, r_2) = H(x_1 \mid x_2, s_2, \hat{x}_1). \tag{7}$$

Again, because of the independence between $s_1$ and $(x_1, x_2, s_2)$, and by Theorem ??, we have

$$H(x_1 \mid x_2, s_2, \hat{x}_1) = H(x_1 \mid x_2, s_2). \tag{8}$$

From Eqn. (5)∼(8) and the independence between $x_1$ and $(x_2, s_2)$ we prove that $H(x_1 \mid \mathrm{view}_2^{\pi_2}) = H(x_1)$, which implies that $\pi_2$ is information-theoretically secure.

Finally, by $y_1 = t - \hat{x}_2 \cdot s_1 + r_1 = x_1 \cdot x_2 - y_2$, we prove that $\pi_2$ indeed realizes $f_2$, which completes the proof.
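The entropy argument can be checked by brute force on a tiny field. The sketch below is an illustration, not part of the proof: it fixes the assumed prime `P = 5`, enumerates every choice of the random values $s_1, s_2, r_1, y_2$, and compares how often each view occurs. If the frequency table of $P_1$'s view is the same for every $x_2$ (and likewise $P_2$'s for every $x_1$), the view carries no information about the other party's input.

```python
from collections import Counter
from itertools import product

P = 5  # tiny prime so that every random choice can be enumerated


def view_distributions(x1, x2):
    """Count how often each view of P1 and of P2 occurs over all randomness."""
    view1, view2 = Counter(), Counter()
    for s1, s2, r1, y2 in product(range(P), repeat=4):
        r2 = (s1 * s2 - r1) % P
        x1_hat = (x1 + s1) % P
        x2_hat = (x2 + s2) % P
        t = (x1_hat * x2 + r2 - y2) % P
        y1 = (t - x2_hat * s1 + r1) % P
        view1[(x1, s1, r1, x2_hat, t, y1)] += 1   # what P1 sees, as in Eqn. (1)
        view2[(x2, s2, x1_hat, r2, y2)] += 1      # what P2 sees, as in Eqn. (5)
    return view1, view2


def views_hide_inputs():
    """True if P1's view never depends on x2 and P2's view never depends on x1."""
    for x1 in range(P):
        reference = view_distributions(x1, 0)[0]
        if any(view_distributions(x1, x2)[0] != reference for x2 in range(P)):
            return False
    for x2 in range(P):
        reference = view_distributions(0, x2)[1]
        if any(view_distributions(x1, x2)[1] != reference for x1 in range(P)):
            return False
    return True


print(views_hide_inputs())
```

The check covers only this one small field; the general statement is what the proof above establishes.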

5 Privacy-Preserving Recommender System


6 Performance Evaluation

In this section, we describe the implementation of our proposed solution. It runs on an Intel Pentium M 2.26 GHz processor with 2 GB RAM. We use Microsoft Access 2007 for data storage, and all the programs are written in Java using the default APIs. We use a well-known real dataset in this experiment: the Jester Joke Recommender System dataset released by Ken Goldberg from UC Berkeley. Jester (http://shadow.ieor.berkeley.edu/humor/) has collaborative filtering data of 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003. We take the densest sub-dataset, with ratings from 23,500 users who have rated 36 or more jokes, which forms a matrix with dimensions 23,500 × 101. For the representation process of recommendation generation, we add the default value 0 for the items not rated. In this early implementation we chose a single active user and then used our algorithm to calculate the similarities between the active user and the other 23,500 users within a different recommendation database.

First, we use the whole 23,500 records to test different encryption key lengths, so that we can compare the performance of encryption with key lengths of 128, 256, 512, 1024, and 2048 bits (Figure 2). With 128 bits, the time elapsed for encrypting 23,500 records is about 1,000 ms; it is about 1,100 ms for 256 bits, 1,400 ms for 512 bits, 2,000 ms for 1024 bits, and 3,000 ms for 2048 bits. As we can see, the time cost rises by a growing increment each time the key length is doubled, which is consistent with the mathematical computation of ElGamal encryption.

Second, we run the algorithm with different numbers of records. We calculated the similarities for 5,000, 10,000, 15,000, 20,000, and 23,500 records. As shown in Figure 3, both encryption time and transmission time grow linearly as we enlarge the data size.
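To make the link between key length and cost concrete, here is a toy version of textbook ElGamal encryption. It is a sketch under assumptions, not the implementation measured above: the modulus `P = 467` and base `G = 2` are tiny illustrative values, the function names are hypothetical, and only the standard multiplicative homomorphism is shown. Every encryption performs modular exponentiations whose operands have the size of the key, which is where longer keys cost more time.

```python
import secrets

P = 467   # small prime modulus, far too small for real use
G = 2     # base used for this toy group


def keygen():
    """Return (secret key x, public key h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def encrypt(h, m):
    """Encrypt m (1 <= m < P) under public key h with fresh randomness k."""
    k = secrets.randbelow(P - 2) + 1
    return pow(G, k, P), (m * pow(h, k, P)) % P


def decrypt(x, c):
    """Recover m = c2 * c1^(-x) mod P; the inverse uses Fermat's little theorem."""
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - x, P)) % P


def multiply(c, d):
    """Component-wise product of two ciphertexts encrypts the product of the plaintexts."""
    return (c[0] * d[0]) % P, (c[1] * d[1]) % P


x, h = keygen()
print(decrypt(x, encrypt(h, 42)))
print(decrypt(x, multiply(encrypt(h, 7), encrypt(h, 11))))   # product taken mod P
```

With real key sizes the structure stays the same and only the size of `P` changes, so the running time is dominated by the `pow` calls.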

7 Conclusion and Future Works

Because of the development of e-commerce, recommender systems are more and more popular nowadays. However, customers will not trust the recommendations if the recommender system database is too small. Thus, there is a need for e-commerce entities to join their recommender system databases to enhance reliability toward prospective customers and to maximize the precision of target marketing. At the same time, we have to preserve the privacy of customers' preferences. With the introduced algorithm, e-commerce entities can merge their recommender system databases without disclosing customers' private data. In future work, we can apply our homomorphic encryption approach to other categories of recommender systems. We may thus discover better combinations of techniques for privacy-preserving recommendation systems.

