
In this section, we first briefly review the SVM. Then we discuss the related work that perturbs the data by geometric transformations to outsource the SVM, and show the security weakness of this scheme.

2.3.1 Review of the SVM

The SVM [55] is a statistically robust learning method with state-of-the-art performance on classification. The SVM trains a classifier by finding an optimal separating hyperplane which maximizes the margin between two classes of data. Without loss of generality, suppose there are $m$ instances of training data. Each instance consists of a pair $(x_i, y_i)$, where $x_i \in \mathbb{R}^n$ denotes the $n$ features of the $i$-th instance and $y_i \in \{+1, -1\}$ is its class label. The SVM finds the optimal separating hyperplane $w \cdot x + b = 0$ by solving the quadratic programming optimization problem:

$$
\arg\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{subject to} \quad y_i(w \cdot x_i + b) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1, \ldots, m.
$$

Minimizing $\frac{1}{2}\|w\|^2$ in the objective function means maximizing the margin between the two classes of data. Each slack variable $\xi_i$ denotes the extent to which $x_i$ falls into the erroneous region, and $C > 0$ is the cost parameter which controls the trade-off between maximizing the margin and minimizing the slacks. The decision function is $f(x) = w \cdot x + b$, and a testing instance $x$ is classified by $\mathrm{sign}(f(x))$ to determine on which side of the optimal separating hyperplane it falls.
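As a concrete illustration of the primal soft-margin formulation and the sign-based classification rule, the following is a minimal sketch using scikit-learn's SVC with a linear kernel; the toy data and the cost value C = 1.0 are assumptions for illustration only.

```python
# Minimal sketch: train a soft-margin linear SVM and classify by sign(f(x)).
# The toy data and C value below are illustrative assumptions, not from the text.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 0.2], [2.0, 2.1], [2.2, 1.8]])  # x_i in R^2
y = np.array([-1, -1, +1, +1])                                   # y_i in {+1, -1}

clf = SVC(kernel="linear", C=1.0)        # C is the cost parameter of the primal objective
clf.fit(X, y)

x_test = np.array([[1.9, 2.0]])
f_x = clf.decision_function(x_test)      # f(x) = w . x + b
print(np.sign(f_x))                      # predicted class label
print(clf.coef_, clf.intercept_)         # w and b of the separating hyperplane
```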

The SVM's optimization problem is usually solved in dual form to apply the kernel trick:

$$
\arg\max_{\alpha} \;\; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j k(x_i, x_j)
\quad \text{subject to} \quad \sum_{i=1}^{m} \alpha_i y_i = 0,\; 0 \le \alpha_i \le C,\; i = 1, \ldots, m. \qquad (2.1)
$$

The function $k(x_i, x_j)$ is called the kernel function, which implicitly maps $x_i$ and $x_j$ into a high-dimensional feature space and computes their dot product there. By applying the kernel trick, the SVM implicitly maps data into the kernel-induced high-dimensional space to find an optimal separating hyperplane. Commonly used kernel functions include the Gaussian kernel $k(x, y) = \exp(-g\|x - y\|^2)$ with $g > 0$, the polynomial kernel $k(x, y) = (g\, x \cdot y + r)^d$ with $g > 0$, and the neural network kernel $k(x, y) = \tanh(g\, x \cdot y + r)$, where $g$, $r$, and $d$ are kernel parameters. The original dot product is called the linear kernel $k(x, y) = x \cdot y$. The corresponding decision function of the dual form SVM is
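For reference, the kernel functions listed above can be written directly in NumPy as follows; the parameter values of g, r, and d in this example are arbitrary placeholders.

```python
# Illustrative NumPy implementations of the common kernel functions;
# parameter names follow the text, parameter values are placeholders.
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def gaussian_kernel(x, y, g=0.5):
    return np.exp(-g * np.linalg.norm(x - y) ** 2)

def polynomial_kernel(x, y, g=1.0, r=1.0, d=3):
    return (g * np.dot(x, y) + r) ** d

def neural_network_kernel(x, y, g=1.0, r=0.0):
    return np.tanh(g * np.dot(x, y) + r)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(gaussian_kernel(x, y), polynomial_kernel(x, y), neural_network_kernel(x, y))
```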

$$
f(x) = \sum_{i=1}^{m} \alpha_i y_i k(x_i, x) + b, \qquad (2.2)
$$

where $\alpha_i$, $i = 1, \ldots, m$, are called the supports, which denote the weight of each instance in composing the optimal separating hyperplane in the feature space.

Without appropriate choices of the cost and kernel parameters, the SVM will not achieve good classification performance. A process of parameter search (also called model selection) is required to determine a suitable combination of the cost and kernel parameters for training the SVM. Practitioners usually evaluate the performance of a parameter combination by measuring its average cross-validation accuracy [22, 46], and the search over parameter combinations is often done in a brute-force way. For example, with the Gaussian kernel, the guide of LIBSVM [7, 22] suggests performing a grid search with exponential growth on the combinations of $(C, g)$. The parameter combination which results in the highest average cross-validation accuracy is selected to train an SVM classifier on the full training data.

The parameter search process can be very time-consuming since there are usually hundreds of parameter combinations to try, and for each parameter combination, with $c$-fold cross-validation, there are $c$ SVMs to be trained, each on $\frac{c-1}{c}$ of the training data. For example, the default search range of LIBSVM's parameter search tool for the Gaussian kernel is $C = 2^{-5}$ to $2^{15}$ and $g = 2^{-15}$ to $2^{3}$, both stepped by a factor of $2^{2}$, with 5-fold cross-validation. There are $11 \times 10 = 110$ parameter combinations to be tested in its default setting, and hence a total of $5 \times 110 = 550$ SVMs to be trained in the parameter search process.
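A minimal sketch of this grid search is given below, using scikit-learn's GridSearchCV instead of LIBSVM's own parameter search tool and a synthetic dataset in place of the data owner's training data.

```python
# Sketch of the brute-force (C, g) grid search with 5-fold cross-validation,
# mirroring LIBSVM's default grid; dataset and library choice are illustrative.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

param_grid = {
    "C":     [2.0 ** e for e in range(-5, 16, 2)],   # 2^-5, 2^-3, ..., 2^15 -> 11 values
    "gamma": [2.0 ** e for e in range(-15, 4, 2)],   # 2^-15, 2^-13, ..., 2^3 -> 10 values
}
# 110 (C, g) combinations x 5 folds = 550 SVM trainings in total.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
# The best (C, g) pair is then used to train the final SVM on the full training data.
```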

If the training dataset is large, training a single SVM is already costly, and the parameter search process further involves training hundreds of SVMs along with hundreds of validation runs.

Due to the heavy computational load, for a data owner who has only limited computing resources, it is reasonable to outsource the SVM training to an external service provider.

It is noted that the SVM problems in the parameter search process are independent, and hence the parameter search process can be easily parallelized in a cloud computing environment. Since the service provider may be untrusted or malicious, the actual content of the data needs appropriate protection to preserve data privacy.

2.3.2 Privacy-Preserving Outsourcing of SVM with Geometric Perturbations and Its Security Concerns

Perturbing the data by geometric transformations like rotating or translating the attribute vectors of instances can be utilized for privacy-preserving outsourcing of the SVM [9,10].

Since SVMs with common kernel functions rely only on the dot product or Euclidean distance relationships among instances, and such relationships are completely preserved in the geometrically perturbed forms of the data, the SVM problems constructed from the original instances can be equivalently derived from the geometrically perturbed ones.

The geometric perturbation works as follows. Consider the dataset $\{(x_i, y_i) \mid i = 1, \ldots, m\}$ with $m$ instances. The content of each attribute vector $x_i \in \mathbb{R}^n$ is deemed to be sensitive, while the class labels $y_i$ are usually not. Let $M$ denote an $n \times n$ rotation matrix, where the content of $M$, i.e., the angle of rotation, is kept secret. The data are perturbed by multiplying all instances by the matrix $M$. Although the content of the instances is modified in their perturbed versions $Mx_i$, $i = 1, \ldots, m$, the dot products of all pairs of original instances are retained: $Mx_i \cdot Mx_j = (Mx_i)^T Mx_j = x_i^T M^T M x_j = x_i^T I x_j = x_i^T x_j$, $1 \le i, j \le m$, where $M^T = M^{-1}$ since the rotation matrix is an orthogonal matrix, whose transpose is equal to its inverse. The Euclidean distances are also preserved since the square of the Euclidean distance can be computed from the dot products by $\|x_i - x_j\|^2 = x_i \cdot x_i - 2\, x_i \cdot x_j + x_j \cdot x_j$. In addition to rotation, the translation perturbation also preserves the Euclidean distance relationship of the data. The translation perturbation is done by adding a random vector to all instances, which moves all instances by the same distance and in the same direction, so the relative distances among instances remain the same in the translated data. The translation perturbation can be applied together with the rotation perturbation.
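The following is a minimal NumPy sketch of these perturbations, assuming a random orthogonal matrix obtained from a QR decomposition stands in for the secret rotation; the data and dimensions are made up.

```python
# Sketch: rotate all instances with a secret orthogonal matrix M and optionally
# translate them with a secret vector t; verify which relationships are preserved.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 8
X = rng.normal(size=(m, n))                    # m original instances x_i in R^n

M, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal matrix, M^T M = I
t = rng.normal(size=n)                         # secret translation vector

X_rot = X @ M.T                                # rotation only: rows are M x_i
X_rot_trans = X_rot + t                        # rotation followed by translation

# Rotation preserves all pairwise dot products ...
print(np.allclose(X @ X.T, X_rot @ X_rot.T))       # True
# ... and both perturbations preserve all pairwise Euclidean distances.
pdist = lambda A: np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
print(np.allclose(pdist(X), pdist(X_rot_trans)))   # True
```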

Since the dot product or Euclidean distance relationships are preserved in the geometrically perturbed data, the SVM problem (2.1) derived from the geometrically perturbed dataset is equivalent to that derived from the original dataset. Therefore, the data owner can outsource the computations of solving the SVM without revealing the actual content of the data by sending the geometrically perturbed dataset to the service provider, and the service provider can construct an SVM problem which is equivalent to the one constructed from the original dataset. The service provider sends the solution of the SVM back to the data owner after solving the SVM problem, and the data owner obtains the classifier by pairing the returned solution, which is composed of the supports of the instances and the bias term, with the original data to make up the decision function (2.2).

Although perturbing with geometric transformations preserves the required utilities of the data, namely the dot product or Euclidean distance relationships needed to form the SVM problems with common kernel functions, a perturbation that preserves such utilities is vulnerable to dot product or distance inference attacks.

Suppose the attacker, i.e., the service provider, obtains some of the original instances from external information sources. Since the Euclidean distance or dot product relations are preserved in the perturbed dataset, the mappings between the leaked instances and their perturbed versions can be inferred by comparing the Euclidean distances or dot products of all pairs of leaked instances to those of all pairs of perturbed instances. In general, the probability that different pairs of instances share the same dot product or Euclidean distance is low. Hence the pairs in the perturbed dataset that correspond to pairs of leaked instances can usually be identified by comparing Euclidean distances or dot products. Once all such pairs in the perturbed dataset are recognized, the exact mapping between each leaked instance and its corresponding perturbed instance can be deduced by tracking the Euclidean distances or dot products between pairs.
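A brute-force sketch of this matching step is given below, assuming the attacker holds a few leaked originals and the whole perturbed dataset; in practice the matching can be organized far more efficiently, but distinct pairwise distances already make the correspondence unambiguous.

```python
# Sketch of the inference attack: match leaked original instances to their
# perturbed counterparts by comparing pairwise Euclidean distances.
import numpy as np
from itertools import permutations

def pairwise_dist(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

def match_leaked(leaked, perturbed):
    """Return, for each leaked instance, the index of its perturbed counterpart."""
    d_leaked = pairwise_dist(leaked)
    best, best_err = None, np.inf
    # Brute force for clarity: try every assignment of leaked instances to
    # perturbed instances and keep the one whose distance pattern matches best.
    for idx in permutations(range(len(perturbed)), len(leaked)):
        d_cand = pairwise_dist(perturbed[list(idx)])
        err = np.abs(d_cand - d_leaked).max()
        if err < best_err:
            best, best_err = idx, err
    return best
```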

In the perturbation preserving dot products, for $n$-dimensional data, if $n$ or more linearly independent instances are known and the mappings to their perturbed ones are successfully determined, all other perturbed instances, whose original content is unknown to the attacker, can be recovered by setting up $n$ equations, as the following lemma shows:

Lemma 1 For $n$-dimensional data, the perturbation preserving the dot product can be broken if the content of $n$ linearly independent instances and the mappings to their perturbed ones are known.

Proof 1 Let $x_1, \ldots, x_m \in \mathbb{R}^n$ denote a set of instances, and let $c_1, \ldots, c_m$ be their corresponding perturbed instances, where $x_i \cdot x_j = c_i \cdot c_j$ for any $1 \le i, j \le m$. Without loss of generality, assume that the attacker has obtained $n$ linearly independent instances $x_1, \ldots, x_n$ and has identified $c_1, \ldots, c_n$ as the corresponding perturbed instances by inferring the dot product relations. Then any other perturbed instance $c_u$, $n < u \le m$, can be recovered to its original $x_u$ by solving the $n$ simultaneous linear equations: $x_i \cdot x_u = c_i \cdot c_u$ for $i = 1, \ldots, n$.
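A small NumPy sketch of this recovery, using a random orthogonal matrix as the (unknown to the attacker) dot-product-preserving perturbation, is shown below; all data here are made up.

```python
# Sketch of the Lemma 1 recovery: with n leaked, linearly independent originals
# x_1..x_n matched to c_1..c_n, any other perturbed instance c_u is recovered
# by solving the n linear equations x_i . x_u = c_i . c_u.
import numpy as np

rng = np.random.default_rng(1)
n = 4
M, _ = np.linalg.qr(rng.normal(size=(n, n)))   # secret dot-product-preserving perturbation

X_known = rng.normal(size=(n, n))              # rows: the n leaked originals x_1..x_n
x_u = rng.normal(size=n)                       # an original unknown to the attacker

C_known = X_known @ M.T                        # perturbed versions of the leaked instances
c_u = M @ x_u                                  # perturbed version of the unknown instance

rhs = C_known @ c_u                            # right-hand sides c_i . c_u (known to attacker)
x_u_recovered = np.linalg.solve(X_known, rhs)  # solve x_i . x_u = c_i . c_u for x_u
print(np.allclose(x_u_recovered, x_u))         # True: the unknown instance is recovered
```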

The perturbation which preserves the Euclidean distance can be broken in a similar way, given that $n + 1$ affinely independent instances are leaked and the mappings to their perturbed ones are identified.

In addition to perturbations preserving the Euclidean distance or dot product relationships, allowing the service provider to construct the kernel matrix $Q$ of the SVM problem (2.1) carries the same risk as preserving the Euclidean distance or dot product in the perturbed instances, since such utilities of the data can be derived from the elements of the kernel matrix.

Lemma 2 For SVMs with common kernel functions, revealing the kernel matrix is as insecure as preserving the Euclidean distance or dot product relationships of the data.

Proof 2 The elements of the kernel matrix are the kernel values between all pairs of instances. With the linear kernel, the kernel matrix contains the dot products between all pairs of instances, including the dot product of each instance with itself. With the Gaussian kernel, the squared Euclidean distance between two instances can be derived by taking the negative natural logarithm of their kernel value and dividing it by the kernel parameter $g$. The polynomial kernel and the neural network kernel are based on dot products: the dot product between two instances can be derived by taking the $d$-th root of the polynomial kernel value or by applying $\tanh^{-1}$ to the neural network kernel value, and then removing the kernel parameters $g$ and $r$.
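The inversions used in this proof are simple enough to show directly; the sketch below assumes the attacker knows the kernel parameters g, r, and d, whose values here are placeholders.

```python
# Sketch of Proof 2: invert common kernel values to recover the squared distance
# or the dot product; parameter values are illustrative placeholders.
import numpy as np

g, r, d = 0.5, 1.0, 3
x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])

# Gaussian kernel -> squared Euclidean distance
k_gauss = np.exp(-g * np.linalg.norm(x - y) ** 2)
sq_dist = -np.log(k_gauss) / g                      # equals ||x - y||^2

# Polynomial kernel -> dot product (d odd here, so the real d-th root is well defined)
k_poly = (g * np.dot(x, y) + r) ** d
dot_poly = (np.sign(k_poly) * np.abs(k_poly) ** (1.0 / d) - r) / g

# Neural network kernel -> dot product
k_tanh = np.tanh(g * np.dot(x, y) + r)
dot_tanh = (np.arctanh(k_tanh) - r) / g

print(np.isclose(sq_dist, np.linalg.norm(x - y) ** 2),
      np.isclose(dot_poly, np.dot(x, y)),
      np.isclose(dot_tanh, np.dot(x, y)))
```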

2.4 Secure SVM Outsourcing with Random Linear