2.4 Secure SVM Outsourcing with Random Linear Transformation

2.4.2 Building the Secure Kernel Matrix from Data Perturbed by Random Linear Transformation

Having shown that the service provider may be given the secure kernel matrix for solving the RSVM, we now show how the service provider can build that matrix from data perturbed by a random linear transformation. A random linear transformation does not preserve the dot products or distances between training instances and hence is stronger in privacy preservation. The data owner can outsource the SVM by sending the randomly linearly transformed training instances to the service provider, which then builds a secure kernel matrix without knowing the actual content of the training data. The secure kernel matrix built by the service provider is exactly the same as the one built from the original training data.

The data owner sends the perturbed training instances, together with the perturbed random vectors of the reduced set, to the service provider for computing secure kernel matrices. A secure kernel matrix $K$ is computed from the training instances and the random vectors of the reduced set by $K_{i,j} = k(x_i, r_j)$, $i = 1, \ldots, m$, $j = 1, \ldots, \bar{m}$. Our objective is to let the service provider compute the same $K$ from the perturbed training instances and random vectors, while the perturbation scheme must not share the security weakness of the geometric perturbation schemes; that is, the dot products and Euclidean distances among training instances should not be preserved. The perturbation scheme only needs to preserve the kernel evaluations between a training instance and a random vector. We use the random linear transformation perturbation for computing the dot product of two differently transformed instances [57]. Note that the attribute vectors $x_i$ of the training instances are usually considered sensitive, but the class labels $y_i$ are usually not.

Let $M$ be a nonsingular $n \times n$ matrix composed of random values. We perturb the instances of the training dataset by a random linear transformation $L : \mathbb{R}^n \to \mathbb{R}^n$, where the matrix $M$ works as the random linear operator. All training instances are perturbed by the random linear transformation (see footnote 1)

$$c_i = L(x_i) = M x_i \quad \text{for } i = 1, \ldots, m \tag{2.6}$$

Unlike the geometric perturbation, the random linear transformation does not preserve the Euclidean distances and dot products between training instances, since the vector space is randomly transformed. Hence the security weakness of the rotational or translational transformation does not exist in data perturbed by a random linear transformation. The random vectors $r_j$, $j = 1, \ldots, \bar{m}$, of the reduced set are also perturbed, by another random linear transformation $L : \mathbb{R}^n \to \mathbb{R}^n$ with $(M^T)^{-1}$ as the random linear operator:

$$s_j = L(r_j) = (M^T)^{-1} r_j \quad \text{for } j = 1, \ldots, \bar{m} \tag{2.7}$$

The perturbed training instances $c_i$, $i = 1, \ldots, m$, and the perturbed random vectors of the reduced set $s_j$, $j = 1, \ldots, \bar{m}$, are then sent to the service provider for building secure kernel matrices.
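The data owner's side of (2.6) and (2.7) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the toy sizes, the Gaussian entries of `M` (the text only requires random values and nonsingularity), and the helper name `perturb_columnwise` are hypothetical and not part of the original scheme. The helper follows the column-wise decomposition mentioned in footnote 1.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for reproducibility only

n, m, m_bar = 4, 6, 3            # toy sizes (assumed)
X = rng.normal(size=(m, n))      # rows are training instances x_i
R = rng.normal(size=(m_bar, n))  # rows are reduced-set random vectors r_j

# Random n x n operator; Gaussian entries are an assumption of this sketch.
M = rng.normal(size=(n, n))
assert np.linalg.matrix_rank(M) == n   # M must be nonsingular


def perturb_columnwise(M, x):
    """Compute M x as x_1 M[:, 0] + ... + x_n M[:, n-1], one column at a time."""
    c = np.zeros(M.shape[0])
    for k in range(M.shape[1]):
        c += x[k] * M[:, k]
    return c


# (2.6): c_i = M x_i
C = np.stack([perturb_columnwise(M, x) for x in X])

# (2.7): s_j = (M^T)^{-1} r_j, obtained by solving M^T s_j = r_j
# instead of forming the inverse explicitly.
S = np.linalg.solve(M.T, R.T).T

# Distances between training instances are in general not preserved.
d_orig = np.linalg.norm(X[0] - X[1])
d_pert = np.linalg.norm(C[0] - C[1])
print(d_orig, d_pert)
```

Solving the linear system rather than inverting `M.T` is a numerical convenience of this sketch; the source does not prescribe how $(M^T)^{-1} r_j$ is computed.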

The dot product between an instance $x_i$ and a random vector $r_j$ can be equivalently computed from the dot product of $c_i$ and $s_j$:

$$c_i^T s_j = (M x_i)^T (M^T)^{-1} r_j = x_i^T M^T (M^T)^{-1} r_j = x_i^T I r_j = x_i^T r_j.$$

Therefore, for the dot product-based kernel functions, including the linear kernel $k(x_i, r_j) = x_i \cdot r_j$, the polynomial kernel $k(x_i, r_j) = (g\, x_i \cdot r_j + r)^d$, and the neural network kernel $k(x_i, r_j) = \tanh(g\, x_i \cdot r_j + r)$, the kernel evaluations between an instance and a random vector can be equivalently derived from the perturbed training instances and random vectors.
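The identity $c_i^T s_j = x_i^T r_j$ can be checked numerically. The sketch below is illustrative only: the data, the Gaussian choice of `M`, the kernel parameter values, and the function names `poly_kernel` and `tanh_kernel` are assumptions made for the example. It compares kernel matrices computed from the original data with those computed from the perturbed data alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, m_bar = 5, 8, 3            # toy sizes (assumed)
X = rng.normal(size=(m, n))      # training instances x_i as rows
R = rng.normal(size=(m_bar, n))  # reduced-set random vectors r_j as rows
M = rng.normal(size=(n, n))      # random operator (Gaussian entries assumed)

C = X @ M.T                      # c_i = M x_i
S = np.linalg.solve(M.T, R.T).T  # s_j = (M^T)^{-1} r_j


def poly_kernel(A, B, g=0.5, r=1.0, d=3):
    """Polynomial kernel (g a.b + r)^d for all row pairs of A and B."""
    return (g * (A @ B.T) + r) ** d


def tanh_kernel(A, B, g=0.5, r=1.0):
    """Neural network kernel tanh(g a.b + r) for all row pairs of A and B."""
    return np.tanh(g * (A @ B.T) + r)


# What the service provider computes (C, S) versus the original data (X, R).
same_linear = np.allclose(C @ S.T, X @ R.T)
same_poly = np.allclose(poly_kernel(C, S), poly_kernel(X, R))
same_tanh = np.allclose(tanh_kernel(C, S), tanh_kernel(X, R))

# Dot products among the training instances themselves are not preserved.
gram_preserved = np.allclose(C @ C.T, X @ X.T)

print(same_linear, same_poly, same_tanh, gram_preserved)
```

Only the cross products between a perturbed instance and a perturbed random vector match the originals; the Gram matrix of the perturbed training instances differs from that of the original instances, which is the property the scheme relies on.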

For the Gaussian kernel $k(x_i, r_j) = \exp(-g \|x_i - r_j\|^2)$, which is based on the Euclidean distance, a slight modification is needed: before applying the transformation, two dimensions are added to the original instances $x_i \in \mathbb{R}^n$, giving $x'_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,n}, 1, -\frac{1}{2}\|x_i\|^2)^T$. The random vectors $r_j$ of the reduced set are likewise extended by two dimensions as $r'_j = (r_{j,1}, r_{j,2}, \ldots, r_{j,n}, -\frac{1}{2}\|r_j\|^2, 1)^T$. The corresponding random matrix for the random linear transformation is then a nonsingular $(n + 2) \times (n + 2)$ matrix $M$.

Footnote 1: It is not necessary to hold the whole matrix $M$ in main memory. The computation can be decomposed as $M x = x_1 M_{:,1} + \cdots + x_n M_{:,n}$, where $M_{:,i}$ is the $i$-th column of $M$.

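Similarly, the data are perturbed by $c_i = M x'_i$ and the random vectors by $s_j = (M^T)^{-1} r'_j$, where $x'_i$ and $r'_j$ denote the instances and random vectors extended by the two additional dimensions. The squared Euclidean distance between $x_i$ and $r_j$ in the Gaussian kernel can be equivalently computed from $c_i$ and $s_j$ by

$$-2 c_i^T s_j = -2 x'^T_i M^T (M^T)^{-1} r'_j = -2 x'^T_i I r'_j = -2 x'^T_i r'_j = \|x_i\|^2 - 2 x_i^T r_j + \|r_j\|^2 = \|x_i - r_j\|^2.$$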
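The two-dimension augmentation for the Gaussian kernel can be illustrated with a short NumPy sketch. As before, the toy data, the Gaussian entries of `M`, the value of `g`, and all variable names are assumptions made for the example rather than details from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, m_bar, g = 3, 5, 4, 0.7    # toy sizes and kernel parameter (assumed)
X = rng.normal(size=(m, n))      # training instances x_i as rows
R = rng.normal(size=(m_bar, n))  # reduced-set random vectors r_j as rows

# Augmented instances: (x_i, 1, -1/2 ||x_i||^2)
X_aug = np.hstack([X,
                   np.ones((m, 1)),
                   -0.5 * np.sum(X**2, axis=1, keepdims=True)])
# Augmented random vectors: (r_j, -1/2 ||r_j||^2, 1)
R_aug = np.hstack([R,
                   -0.5 * np.sum(R**2, axis=1, keepdims=True),
                   np.ones((m_bar, 1))])

# Random (n+2) x (n+2) operator (Gaussian entries assumed).
M = rng.normal(size=(n + 2, n + 2))
C = X_aug @ M.T                      # c_i = M x'_i
S = np.linalg.solve(M.T, R_aug.T).T  # s_j = (M^T)^{-1} r'_j

# Service provider: squared distances from perturbed data only.
sq_dist_provider = -2.0 * (C @ S.T)
K_provider = np.exp(-g * sq_dist_provider)

# Reference computed directly from the original data.
sq_dist_owner = np.sum((X[:, None, :] - R[None, :, :])**2, axis=2)
K_owner = np.exp(-g * sq_dist_owner)

print(np.allclose(K_provider, K_owner))
```

The ordering of the two extra coordinates differs between the instances and the random vectors so that the constant 1 in one vector multiplies the $-\frac{1}{2}\|\cdot\|^2$ term of the other.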

Therefore, for common kernel functions based on dot product or Euclidean distance, the kernel evaluations between an instance in the training dataset and an instance in the reduced set can be equivalently computed from their perturbed versions.

Lemma 4 The RSVM problem (2.4) with dot product-based and Euclidean distance-based kernel functions, constructed from the training instances $(x_i, y_i)$, $i = 1, \ldots, m$, and the random vectors $r_j$, $j = 1, \ldots, \bar{m}$, can be equivalently obtained from the random linear transformation-perturbed training instances and random vectors $c_i$, $i = 1, \ldots, m$, and $s_j$, $j = 1, \ldots, \bar{m}$, along with the labels $y_i$, $i = 1, \ldots, m$.

Proof 5 The dot product of any $(c_i, s_j)$ pair equals the dot product of the corresponding $(x_i, r_j)$ pair or, when the instances and random vectors are extended by two dimensions, $-\frac{1}{2}$ times their squared Euclidean distance. Hence, for dot product-based or Euclidean distance-based kernel functions, the value of $k(x_i, r_j)$ can be equivalently computed from the $c_i$ and $s_j$.

Then the secure kernel matrix $K$ composed of $K_{i,j} = k(x_i, r_j)$, $i = 1, \ldots, m$, $j = 1, \ldots, \bar{m}$, can be equivalently obtained from $c_i$, $i = 1, \ldots, m$, and $s_j$, $j = 1, \ldots, \bar{m}$.

Together with the labels $y_i$, $i = 1, \ldots, m$ (and the cost parameter $C$), exactly the same RSVM problem (2.4) can be built.

By Lemma 4, since the same RSVM problem can be built from the perturbed data, the same solutions $v$ and $b$ for the decision function (2.5) can be obtained. Therefore, the data owner can perform privacy-preserving outsourcing of training by perturbing the data and the reduced set with (2.6) and (2.7) and then sending the perturbed data with labels, together with the perturbed reduced set, to the service provider. Because the service provider can derive the same secure kernel matrix $K$ of (2.4) from the perturbed data $c_i$ and $s_j$, the RSVM optimization problem derived from the perturbed data is the same as the one derived from the original data. The service provider can therefore obtain the same solutions $v_j$, $j = 1, \ldots, \bar{m}$, and $b$ of the decision function (2.5) and send them back to the data owner. Then the