3.4 Privacy-Preserving SVM Classifier

3.4.1 Construction of the Privacy-Preserving Decision Function

The decision function of the Gaussian kernel SVM classifier is

$$f(x) = \sum_{i=1}^{m'} \alpha_i y_i \exp(-g\|SV_i - x\|^2) + b \tag{3.8}$$

The components of the classifier are the kernel parameter $g$, the bias term $b$, the support vectors $\{(SV_1, y_1), \ldots, (SV_{m'}, y_{m'})\}$, and their corresponding supports $\{\alpha_1, \ldots, \alpha_{m'}\}$.

The content of each attribute vector $SV_i$ is considered to be sensitive, but the class labels $y_i$ usually are not. We intend to destroy the content of all support vectors' attribute vectors in the decision function in an irreversible way, similar to the effect of the linear combination in the linear kernel SVM classifier, as mentioned in Section 3.3.2.

The value of the Gaussian kernel function $K(x, y) = \exp(-g\|x - y\|^2)$ depends on the distance $\|x - y\|$ between two instances. In the decision function (3.8), the term $\|SV_i - x\|^2$, the squared distance between the testing instance $x$ and a support vector $SV_i$, can be expanded as $\|SV_i - x\|^2 = \|SV_i\|^2 - 2(SV_i \cdot x) + \|x\|^2$. So the decision function (3.8) can be equivalently formulated as

$$f(x) = b + \exp(-g\|x\|^2) \sum_{i=1}^{m'} \alpha_i y_i \exp(-g\|SV_i\|^2) \exp(2g\, SV_i \cdot x) \tag{3.9}$$

In (3.9), the expanded form of the decision function, there are two terms containing support vectors in the summation operator: $\exp(-g\|SV_i\|^2)$ and $\exp(2g\, SV_i \cdot x)$.

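The former term depends only on the magnitude of $SV_i$ and hence can be computed a priori. Each $\exp(-g\|SV_i\|^2)$, $i = 1, \ldots, m'$, can be combined with $\alpha_i y_i$ into a constant $c_i = \alpha_i y_i \exp(-g\|SV_i\|^2)$, so that the decision function becomes

$$f(x) = b + \exp(-g\|x\|^2) \sum_{i=1}^{m'} c_i \exp(2g\, SV_i \cdot x) \tag{3.10}$$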
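The precomputation of the constants can be checked numerically. The following is a minimal sketch, assuming toy support vectors, labels, and supports that are made up for illustration; the function names are hypothetical and not part of the original work. It compares the original Gaussian kernel form with the expanded form that uses the constants $c_i = \alpha_i y_i \exp(-g\|SV_i\|^2)$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gaussian_decision(svs, ys, alphas, b, g, x):
    """Original form: sum_i alpha_i y_i exp(-g ||SV_i - x||^2) + b."""
    total = 0.0
    for sv, y, a in zip(svs, ys, alphas):
        diff = [s - t for s, t in zip(sv, x)]
        total += a * y * math.exp(-g * dot(diff, diff))
    return total + b

def precompute_c(svs, ys, alphas, g):
    """c_i = alpha_i y_i exp(-g ||SV_i||^2); needs no testing instance."""
    return [a * y * math.exp(-g * dot(sv, sv))
            for sv, y, a in zip(svs, ys, alphas)]

def expanded_decision(svs, c, b, g, x):
    """Expanded form: b + exp(-g ||x||^2) sum_i c_i exp(2 g SV_i . x)."""
    s = sum(ci * math.exp(2.0 * g * dot(sv, x)) for sv, ci in zip(svs, c))
    return b + math.exp(-g * dot(x, x)) * s

# Toy values, made up for illustration only (assumption).
svs = [[0.2, -0.5], [1.0, 0.3], [-0.7, 0.8]]
ys = [1, -1, 1]
alphas = [0.6, 0.9, 0.3]
b, g = 0.1, 0.5
x = [0.4, 0.1]

c = precompute_c(svs, ys, alphas, g)
f_original = gaussian_decision(svs, ys, alphas, b, g, x)
f_expanded = expanded_decision(svs, c, b, g, x)
```

The two values agree up to floating-point rounding, since the expanded form is only an algebraic rearrangement of the original one.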

The term $\exp(-g\|x\|^2)$ extracted from the summation operator in (3.9) and (3.10) is a scalar related only to the testing instance $x$ and has no connection with the privacy of the training data. The support vectors now exist only in the term $\exp(2g\, SV_i \cdot x)$ in the summation operator. We tackle it by replacing the exponential function with its infinite series representation:

$$\exp(t) = \sum_{d=0}^{\infty} \frac{t^d}{d!}$$

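By replacing $\exp(2g\, SV_i \cdot x)$ with its infinite series representation $\exp(2g\, SV_i \cdot x) = \sum_{d=0}^{\infty} \frac{(2g\, SV_i \cdot x)^d}{d!}$, the $\sum_{i=1}^{m'} c_i \exp(2g\, SV_i \cdot x)$ of the decision function (3.10) becomes

$$\sum_{i=1}^{m'} c_i \sum_{d=0}^{\infty} \frac{(2g)^d}{d!} (SV_i \cdot x)^d \tag{3.12}$$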

It follows that the support vectors exist only in the term $(SV_i \cdot x)^d$ of the inner summation operator. We next take a key step by applying the monomial feature mapping.

The form $(x \cdot y)^d$ corresponds to the monomial feature kernel [47], which can be defined as the dot product of the monomial feature mapped $x$ and $y$: $(x \cdot y)^d = \Phi_d(x) \cdot \Phi_d(y)$, where $\Phi_d(\cdot)$ is the order-$d$ monomial feature mapping (the rationale of the monomial feature mapping is given later in Section 3.4.1).

Thus the $(SV_i \cdot x)^d$ in (3.12), which has the monomial feature kernel form, can be equivalently computed by the dot product of the order-$d$ monomial feature mapped support vector $SV_i$ and testing instance $x$:

$$(SV_i \cdot x)^d = \Phi_d(SV_i) \cdot \Phi_d(x)$$

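A key step arises from writing the monomial feature kernel as the dot product of monomial feature mapped instances. Replacing the $(SV_i \cdot x)^d$ in (3.12) with $\Phi_d(SV_i) \cdot \Phi_d(x)$ and exchanging the order of summation gives

$$\sum_{i=1}^{m'} c_i \sum_{d=0}^{\infty} \frac{(2g)^d}{d!}\, \Phi_d(SV_i) \cdot \Phi_d(x) = \sum_{d=0}^{\infty} \frac{(2g)^d}{d!} \left( \sum_{i=1}^{m'} c_i\, \Phi_d(SV_i) \right) \cdot \Phi_d(x) \tag{3.13}$$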

Note that in each order-$d$ monomial feature mapped space, all the order-$d$ monomial feature mapped support vectors $\{\Phi_d(SV_1), \ldots, \Phi_d(SV_{m'})\}$ can be linearly combined into a single vector

$$w_d = \sum_{i=1}^{m'} c_i\, \Phi_d(SV_i)$$

In each $w_d$, all support vectors are mapped into the order-$d$ monomial feature space and linearly combined, and hence the content of each support vector $SV_i$ has been destroyed in the linear combination, similar to the $w = \sum_{i=1}^{m'} \alpha_i y_i SV_i$ in the linear kernel SVM classifier.

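By substituting both (3.14) and (3.15) into (3.13), which is equivalent to the $\sum_{i=1}^{m'} c_i \exp(2g\, SV_i \cdot x)$ of the decision function (3.10), (3.13) can be represented as

$$\sum_{i=1}^{m'} c_i \exp(2g\, SV_i \cdot x) = \sum_{i=1}^{m'} c_i + \sum_{d=1}^{\infty} \frac{(2g)^d}{d!}\, w_d \cdot \Phi_d(x) \tag{3.16}$$

Here the $d = 0$ term of the series equals $\sum_{i=1}^{m'} c_i$ because $(SV_i \cdot x)^0 = 1$.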

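By feeding (3.16) into the decision function (3.10), the decision function becomes:

$$f(x) = \exp(-g\|x\|^2) \left( \sum_{i=1}^{m'} c_i + \sum_{d=1}^{\infty} \frac{(2g)^d}{d!}\, w_d \cdot \Phi_d(x) \right) + b \tag{3.17}$$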

This is the privacy-preserving form of the decision function of the Gaussian kernel SVM classifier. In this new form, the data that need to be kept in the classifier are the $w_d$'s of each order $d$ instead of the support vectors of the original decision function. The private content of the support vectors has been destroyed by the linear combinations, and the information needed for classification, originally provided by the support vectors, is now given by the $w_d$'s, which are linear combinations of monomial feature mapped support vectors.
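The construction can be sketched end to end in code. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy data, and the truncation order `max_order` are hypothetical, the series is cut off at `max_order` because an infinite sum cannot be evaluated, and the $d = 0$ term is stored as a one-element vector $w_0 = \sum_i c_i$ purely for convenience. Index multisets are enumerated with `itertools.combinations_with_replacement` from the Python standard library.

```python
import math
from itertools import combinations_with_replacement

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def monomial_map(x, d):
    """Order-d monomial features of x, one per multiset of d coordinate indices."""
    n = len(x)
    feats = []
    for combo in combinations_with_replacement(range(n), d):
        m = [combo.count(i) for i in range(n)]  # exponents m_1..m_n, summing to d
        coef = math.sqrt(math.factorial(d)
                         / math.prod(math.factorial(mi) for mi in m))
        feats.append(coef * math.prod(xi ** mi for xi, mi in zip(x, m)))
    return feats

def build_ws(svs, ys, alphas, g, max_order):
    """Return [w_0, ..., w_max_order] with w_d = sum_i c_i * Phi_d(SV_i)."""
    c = [a * y * math.exp(-g * dot(sv, sv)) for sv, y, a in zip(svs, ys, alphas)]
    ws = []
    for d in range(max_order + 1):
        mapped = [monomial_map(sv, d) for sv in svs]
        ws.append([sum(ci * phi[k] for ci, phi in zip(c, mapped))
                   for k in range(len(mapped[0]))])
    return ws

def pp_decision(ws, b, g, x):
    """Truncated privacy-preserving form; uses only the w_d's, never the SVs."""
    series = sum((2.0 * g) ** d / math.factorial(d) * dot(w, monomial_map(x, d))
                 for d, w in enumerate(ws))
    return math.exp(-g * dot(x, x)) * series + b

def original_decision(svs, ys, alphas, b, g, x):
    """Reference value from the original Gaussian kernel form."""
    total = 0.0
    for sv, y, a in zip(svs, ys, alphas):
        diff = [s - t for s, t in zip(sv, x)]
        total += a * y * math.exp(-g * dot(diff, diff))
    return total + b

# Toy values, made up for illustration only (assumption).
svs = [[0.2, -0.5], [1.0, 0.3], [-0.7, 0.8]]
ys = [1, -1, 1]
alphas = [0.6, 0.9, 0.3]
b, g = 0.1, 0.5
x = [0.4, 0.1]

ws = build_ws(svs, ys, alphas, g, max_order=10)
f_pp = pp_decision(ws, b, g, x)
f_ref = original_decision(svs, ys, alphas, b, g, x)
```

Only `ws`, `b`, and `g` are passed to `pp_decision`, which mirrors the point of the construction: the classifier that is released does not need the support vectors themselves.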

The privacy-preserving decision function (3.17) contains an infinite series, which involves $w_d$, $d = 1, \ldots, \infty$, the linear combinations of monomial feature mapped support vectors from order 1 to order $\infty$, and the monomial feature mapped testing instance $\Phi_d(x)$ from order 1 to order $\infty$. Evaluating infinitely many terms is clearly impractical. However, since the infinite series in the privacy-preserving decision function is a Taylor series, it can be accurately approximated near the point of expansion by only a small number of low-order terms, which makes practical use possible.
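As a rough numerical illustration of why a few low-order terms can suffice, the sketch below truncates the series of $\exp(t)$ at increasing orders for $t = 2g\,(SV \cdot x)$. The vectors and the value of $g$ are made-up toy values (an assumption), and the helper name `exp_series` is hypothetical.

```python
import math

def exp_series(t, order):
    """Partial sum of exp(t) = sum_{d>=0} t^d / d!, truncated at the given order."""
    term, total = 1.0, 1.0
    for d in range(1, order + 1):
        term *= t / d          # term now holds t^d / d!
        total += term
    return total

# Toy values, made up for illustration only (assumption).
g = 0.5
sv = [1.0, 0.3]
x = [0.4, 0.1]
t = 2.0 * g * sum(a * b for a, b in zip(sv, x))  # 2 g (SV . x)

# Absolute truncation error for orders 0 through 7.
errors = [abs(math.exp(t) - exp_series(t, k)) for k in range(8)]
```

For this small positive `t`, each additional order shrinks the error, and the factorial in the denominator makes the decrease rapid. Larger values of $|2g\,(SV \cdot x)|$ would need more terms for the same accuracy.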

Later we will study the precision of the Taylor polynomial approximation, in both theoretical analyses and empirical experiments, to show that the privacy-preserving decision function can be accurately approximated by using only a few low-order terms of the infinite series. Before turning to the approximation of the privacy-preserving decision function, we first present the monomial feature mapping.

Monomial Feature Mapping

Lemma 6 below states how the monomial feature mapping replaces $(x \cdot y)^d$ by the dot product of $\Phi_d(x)$ and $\Phi_d(y)$, the order-$d$ monomial feature mapped $x$ and $y$.

Lemma 6 For $x, y \in \mathbb{R}^n$ and $d \in \mathbb{N}$, the monomial feature kernel $K(x, y) = (x \cdot y)^d$ generates order-$d$ monomial features of $x$ and $y$. The feature map of this kernel can be defined coordinate-wise as

$$\Phi_m(x) = \sqrt{\frac{d!}{\prod_{i=1}^{n} m_i!}}\; \prod_{i=1}^{n} x_i^{m_i} \tag{3.18}$$

for every $m \in \mathbb{N}^n$ with $\sum_{i=1}^{n} m_i = d$. Every such $m$ corresponds to one dimension of the monomial features [46, 47].

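Proof 10 All terms in the expansion of $(x \cdot y)^d = (x_1 y_1 + \cdots + x_n y_n)^d$ have the form $(x_1 y_1)^{m_1} (x_2 y_2)^{m_2} \cdots (x_n y_n)^{m_n}$, where each $m_i$ is an integer with $0 \le m_i \le d$ and $\sum_{i=1}^{n} m_i = d$. By the multinomial theorem [18], the coefficient of each $(x_1 y_1)^{m_1} (x_2 y_2)^{m_2} \cdots (x_n y_n)^{m_n}$ term is $\frac{d!}{\prod_{i=1}^{n} m_i!}$. Writing this coefficient as the product of two equal square roots, one attached to the factors of $x$ and the other to the factors of $y$, gives the coordinate-wise map (3.18).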

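A simple example illustrates the monomial feature mapping. The order-2 monomial feature kernel of $x, y \in \mathbb{R}^2$ is [47, 55]:

$$
\begin{aligned}
(x \cdot y)^2 &= ((x_1, x_2) \cdot (y_1, y_2))^2 = (x_1 y_1 + x_2 y_2)^2 \\
&= x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2 \\
&= (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \cdot (y_1^2, \sqrt{2}\, y_1 y_2, y_2^2)
\end{aligned}
$$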

From Lemma 6, the $m$'s satisfying $\sum_{i=1}^{2} m_i = 2$ with $0 \le m_i \le 2$ are $(2, 0)$, $(1, 1)$, and $(0, 2)$. So the order-2 monomial features of $x \in \mathbb{R}^2$ are $x_1^2$, $x_1 x_2$, and $x_2^2$. The corresponding coefficients are $1$, $\sqrt{2}$, and $1$ from (3.18) of Lemma 6. Hence the order-2 monomial feature mapping of $x = (x_1, x_2)$ (and likewise of $y$) is $(x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$.
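The order-2 example can be verified numerically. The sketch below is only an illustration with made-up input vectors; `phi2` is a hypothetical helper name for the mapping $(v_1^2, \sqrt{2}\, v_1 v_2, v_2^2)$.

```python
import math

def phi2(v):
    """Order-2 monomial feature mapping of a 2-dimensional vector."""
    v1, v2 = v
    return (v1 * v1, math.sqrt(2.0) * v1 * v2, v2 * v2)

# Toy inputs, made up for illustration only (assumption).
x = (3.0, -1.0)
y = (0.5, 2.0)

# Kernel evaluated directly: (x . y)^2
kernel_value = (x[0] * y[0] + x[1] * y[1]) ** 2

# Kernel evaluated as a dot product in the mapped space: Phi_2(x) . Phi_2(y)
mapped_value = sum(a * b for a, b in zip(phi2(x), phi2(y)))
```

Here $x \cdot y = -0.5$, so both `kernel_value` and `mapped_value` come out to 0.25 up to floating-point rounding.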

The dimensionality of the order-$d$ monomial feature mapping for $n$-dimensional vectors is stated in Lemma 7 below.

Lemma 7 For $x \in \mathbb{R}^n$, the dimensionality of $x$'s order-$d$ monomial feature mapping is $\binom{d+n-1}{d}$.

Proof 11 From Lemma 6, every $m \in \mathbb{N}^n$ with $\sum_{i=1}^{n} m_i = d$, where each $m_i$ is an integer with $0 \le m_i \le d$, corresponds to one dimension of the monomial features. Enumerating all such $m$'s is equivalent to finding all integer solutions of the equation $m_1 + m_2 + \cdots + m_n = d$ where $m_i \ge 0$ for $i = 1$ to $n$. Enumerating all integer solutions of this equation is equivalent to enumerating all size-$d$ combinations with repetitions from $n$ kinds of objects [18], and the number of such combinations is $\binom{d+n-1}{d}$.

To generate the monomial feature mapping, an algorithm that enumerates all size-$d$ combinations with repetitions from $n$ kinds of objects is required. Due to the space limit, its detail is omitted. Notice that the monomial feature mappings do not need to be generated at testing time. To make the classifier more efficient, the mappings can be generated off-line, and the classifier simply looks up the corresponding mapping.
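One possible way to perform this enumeration is sketched below. It is a simple recursive enumerator written for illustration, not the omitted algorithm of the original work; the function name `exponent_vectors` is hypothetical. It yields every exponent vector $(m_1, \ldots, m_n)$ with $\sum_i m_i = d$, and the count is checked against the formula of Lemma 7 using `math.comb`.

```python
import math

def exponent_vectors(n, d):
    """Yield every (m_1, ..., m_n) of non-negative integers with sum equal to d."""
    if n == 1:
        yield (d,)
        return
    # Fix the first exponent, then distribute the remainder over n - 1 slots.
    for first in range(d, -1, -1):
        for rest in exponent_vectors(n - 1, d - first):
            yield (first,) + rest

# The order-2 case for 2-dimensional vectors from the example above.
vectors_2_2 = list(exponent_vectors(2, 2))

# Number of order-4 monomial features for 3-dimensional vectors.
count_3_4 = sum(1 for _ in exponent_vectors(3, 4))
expected_3_4 = math.comb(4 + 3 - 1, 4)
```

For $n = 2$ and $d = 2$ the enumerator produces $(2, 0)$, $(1, 1)$, $(0, 2)$, matching the example, and for $n = 3$, $d = 4$ it produces $\binom{6}{4} = 15$ vectors. Because the enumeration depends only on $n$ and $d$, it can be run once off-line and reused.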

3.4.2 PPSVC: Approximation of the Privacy-Preserving Decision

In document: Efficient Data Classification Methods with Privacy Preservation (pages 68-74)