4.3 Analysis
4.3.2 Successful Retrieval Probability
When n and k are fixed, u and v affect the probability of a successful retrieval.
We investigate how these parameters relate to the success probability.
The results are given in Theorem 1 and Theorem 2.
To retrieve all $k$ messages, the key servers have to obtain $k$ stored data $\sigma_{j_1}, \sigma_{j_2}, \ldots, \sigma_{j_k}$ from $k$ different storage servers $SS_{j_1}, SS_{j_2}, \ldots, SS_{j_k}$ and apply ShareDec to acquire $\tilde{\zeta}_{i_1,j_1}, \tilde{\zeta}_{i_2,j_2}, \ldots, \tilde{\zeta}_{i_k,j_k}$. Furthermore, the $k \times k$ matrix $K$ formed by the coefficient vectors in $\tilde{\zeta}_{i_1,j_1}, \tilde{\zeta}_{i_2,j_2}, \ldots, \tilde{\zeta}_{i_k,j_k}$ needs to be invertible in order to solve for the $k$ messages. The randomness lies in the selection of the distinct servers $SS_{j_1}, SS_{j_2}, \ldots, SS_{j_k}$ by the key servers and in the coefficient vectors in $\sigma_{j_1}, \sigma_{j_2}, \ldots, \sigma_{j_k}$. Let $E_1$ be the event that fewer than $k$ distinct storage servers are queried by the key servers. For the generator matrix $G$ implicitly generated by the owner and the storage servers, let $E_2$ be the event that the submatrix $K$ formed by the $k$ columns $j_1, j_2, \ldots, j_k$ of $G$ is non-invertible. Figure 4.5 shows the probability space of the successful retrieval event. The outer circle represents the sample space. The solid circle represents the event $E_1$ and the inner circle represents the event $E_2$. The event of a successful retrieval is shown as the shaded area. Thus, the probability of a successful retrieval by the owner is
$$1 - \Pr[E_1] - \Pr[E_2 \mid \overline{E_1}]\,\Pr[\overline{E_1}] \tag{4.2}$$
where $\overline{E_1}$ denotes the complement of $E_1$.
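For reference, the expression in (4.2) follows from a short decomposition. The derivation below is a sketch that writes $\overline{E}$ for the complement of an event $E$ and assumes a retrieval succeeds exactly when neither $E_1$ nor $E_2$ occurs:

```latex
\begin{align*}
\Pr[\text{success}]
  &= \Pr[\overline{E_1} \wedge \overline{E_2}] \\
  &= \Pr[\overline{E_1}] - \Pr[E_2 \wedge \overline{E_1}] \\
  &= 1 - \Pr[E_1] - \Pr[E_2 \mid \overline{E_1}]\,\Pr[\overline{E_1}].
\end{align*}
```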
We analyze suitable settings of $m$, $v$, and $u$ for the two cases $n = ak^{3/2}$ and $n = ak$; the results are as follows:
1. $n = ak^{3/2}$, $a > \sqrt{2}$, $m \ge t \ge k > 1$, $v = bk^{1/2}\ln k$, $u = 2$ with $b > 5a$
2. $n = ak$, $a > 1$, $m = t = k > 1$, $v = b_1 \ln k$, $u = b_2 \ln k$ with $b_1 > 5a$ and $b_2 > 4 + 3/\ln a$
Figure 4.5: The event of a successful retrieval is shown as the shaded area.
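To get a feel for the sizes involved, the two parameter settings listed above can be evaluated numerically. The following is a minimal sketch; the function names `setting_one` and `setting_two` are hypothetical, and rounding real-valued quantities up with `ceil` is an assumption made here, not something the text specifies.

```python
import math


def setting_one(k, a, b):
    """Setting 1: n = a*k^(3/2), v = b*sqrt(k)*ln k, u = 2."""
    if not (k > 1 and a > math.sqrt(2) and b > 5 * a):
        raise ValueError("need k > 1, a > sqrt(2), b > 5a")
    n = math.ceil(a * k * math.sqrt(k))            # k^(3/2) = k * sqrt(k)
    v = math.ceil(b * math.sqrt(k) * math.log(k))  # rounding up is an assumption
    return {"n": n, "v": v, "u": 2}


def setting_two(k, a, b1, b2):
    """Setting 2: n = a*k, v = b1*ln k, u = b2*ln k."""
    if not (k > 1 and a > 1 and b1 > 5 * a and b2 > 4 + 3 / math.log(a)):
        raise ValueError("need k > 1, a > 1, b1 > 5a, b2 > 4 + 3/ln a")
    n = math.ceil(a * k)
    v = math.ceil(b1 * math.log(k))                # rounding up is an assumption
    u = math.ceil(b2 * math.log(k))
    return {"n": n, "v": v, "u": u}
```

For example, with k = 100, a = 2, and b = 11, `setting_one` gives n = 2000, v = 507, and u = 2, whereas `setting_two(100, 2, 11, 9)` gives n = 200, v = 51, and u = 42.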
We imagine a networked storage system that consists of a large number of storage servers, where the number $k$ of messages stored at a time is much smaller than $n$. Thus, the first setting, $n = ak^{3/2}$, is a better fit than the second, $n = ak$. Although the constant information rate of the second setting may be preferred in conventional coding theory, the first setting is more suitable for practical networked storage systems.
Theorem 1. Assume that there are $k$ messages, $n$ storage servers, and $m$ key servers, where $n = ak^{3/2}$, $m \ge t \ge k > 1$, and $a$ is a constant with $a > \sqrt{2}$. For $v = bk^{1/2}\ln k$ and $u = 2$ with $b > 5a$, the probability of a successful retrieval is at least $1 - k/p - o(1)$.
Figure 4.6: The random bipartite graph $H$.
The random bipartite graph $H$ has two sets of vertices and random edges. The random subgraph $H'$ is defined by a random set of $k$ vertices in $V_2$ and the set of $k$ vertices in $V_1$.
Proof. To analyze $\Pr[E_1]$, we consider each storage server as a bin and each key server as holding $u$ balls, where $u = 2$. When a key server queries a storage server, we consider that the key server throws a ball into the bin. Because the key servers make queries randomly, those balls are randomly thrown into $n$ bins. The probability that fewer than $k$ bins contain balls is:
$$\Pr[E_1] \le \binom{n}{k-1}\left(\frac{k-1}{n}\right)^{mu}$$
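The balls-into-bins model can be checked numerically. The sketch below assumes that the $mu$ queries are independent and uniform over the $n$ servers (with replacement), and it uses the standard union bound $\binom{n}{k-1}((k-1)/n)^{mu}$ for the event that all balls land in some set of $k - 1$ bins. The function names are hypothetical.

```python
import math
import random


def log_union_bound(n, k, m, u):
    """Natural log of C(n, k-1) * ((k-1)/n)^(m*u); requires k > 1."""
    # ln C(n, k-1) = ln n! - ln (k-1)! - ln (n-k+1)!
    log_binom = math.lgamma(n + 1) - math.lgamma(k) - math.lgamma(n - k + 2)
    return log_binom + m * u * math.log((k - 1) / n)


def simulate_e1(n, k, m, u, trials=20000, seed=0):
    """Estimate Pr[fewer than k distinct bins are hit] by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each of the m key servers throws u balls uniformly at random.
        bins = {rng.randrange(n) for _ in range(m * u)}
        if len(bins) < k:
            hits += 1
    return hits / trials
```

Working in the log domain avoids overflow in the binomial coefficient and underflow in the power when $n$ and $mu$ are large.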
The event $E_2$ under the condition $\overline{E_1}$ can be modeled by the existence of a perfect matching in the random bipartite graph $H$ associated with $G$. The random bipartite graph $H$ is illustrated in Figure 4.6 and constructed as follows. Let each ciphertext $C_i$ be a vertex $v_{1,i}$ and $V_1$ be the set of all vertices for the $C_i$'s. Let each storage server $SS_j$ be a vertex $v_{2,j}$ and $V_2$ be the set of all vertices for the $SS_j$'s. When a ciphertext $C_i$ is distributed to the storage server $SS_j$, there is an edge $(v_{1,i}, v_{2,j})$. The matrix $K$ induces a subgraph $H'$ of the bipartite graph $H$. The subgraph $H'$ consists of all vertices in $V_1$, a subset $V_2' \subset V_2$ of queried storage servers with $|V_2'| = k$, and the edges $(v_{1,i}, v_{2,j})$ of $H$ with $v_{1,i} \in V_1$ and $v_{2,j} \in V_2'$. If $H'$ has no perfect matching, $K$ is not invertible. If $H'$ has a perfect matching, $K$ is non-invertible if and only if $\det(K) = 0$. The value of $\det(K)$ depends on the random coefficients chosen by the storage servers. Let $E_3$ be the event that $H'$ has no perfect matching, and $E_4$ be the event that $\det(K) = 0$. We have
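$$\begin{aligned}
\Pr[E_2 \mid \overline{E_1}] &= \Pr[E_3 \mid \overline{E_1}] + \Pr[E_4 \mid \overline{E_3} \wedge \overline{E_1}]\,\Pr[\overline{E_3} \mid \overline{E_1}] \\
&\le \Pr[E_3 \mid \overline{E_1}] + \Pr[E_4 \mid \overline{E_3} \wedge \overline{E_1}]
\end{aligned} \tag{4.4}$$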
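The perfect-matching condition on $H'$ can be tested mechanically. The sketch below uses the standard augmenting-path method for bipartite matching (often called Kuhn's algorithm), which is not taken from the source. The helper `random_subgraph` is hypothetical and assumes that each ciphertext picks $v$ servers uniformly at random and that $V_2'$ is a uniformly random set of $k$ servers.

```python
import random


def has_perfect_matching(adj, k):
    """adj[i] lists the right vertices (0..k-1) adjacent to left vertex i."""
    match_right = [-1] * k  # match_right[j] = left vertex matched to j, or -1

    def try_augment(i, seen):
        # Look for an augmenting path starting at left vertex i.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    # A perfect matching exists iff every left vertex can be matched.
    return all(try_augment(i, set()) for i in range(k))


def random_subgraph(k, n, v, rng):
    """Adjacency lists of a random H' (assumed model, see lead-in)."""
    edges = [{rng.randrange(n) for _ in range(v)} for _ in range(k)]
    chosen = rng.sample(range(n), k)  # stands in for V2'
    index = {server: j for j, server in enumerate(chosen)}
    return [[index[s] for s in nbrs if s in index] for nbrs in edges]
```

Calling `has_perfect_matching(random_subgraph(k, n, v, rng), k)` repeatedly with a seeded `random.Random` gives an empirical estimate of the probability that $H'$ has a perfect matching.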
We analyze the probability of $E_3$ conditioned on $\overline{E_1}$ by using Hall's Lemma in the following form [34].
Lemma 1. (Hall’s Lemma.)
Let $H'$ be a bipartite graph with vertex sets $V_1$ and $V_2'$, where $|V_1| = |V_2'| = k$. If $H'$ has no isolated vertex and no perfect matching, then there exists a set $A \subset V_1$ or $A \subset V_2'$ such that:
• $2 \le |A| \le (k+1)/2$
• The number of neighbors of A is |A| − 1.
• The subgraph induced by A and its neighbors is connected.
Hence, there are two cases in which $H'$ has no perfect matching. First, $H'$ has at least one isolated vertex. Second, $H'$ has no isolated vertex and some set $A$ satisfies the above conditions. Let $E_I$ be the event that $H'$ has at least one isolated vertex and $E_A$ be the event that some set $A$ satisfies the conditions.
We obtain
$$\Pr[E_3 \mid \overline{E_1}] \le \Pr[E_I \mid \overline{E_1}] + \Pr[E_A \mid \overline{E_1}] \tag{4.5}$$
Starting from $E_I$, we consider each vertex in $V_2$ as a bin and each edge from $V_1$ to $V_2$ as a ball. When an edge connects to a vertex in $V_2$, a ball is thrown into the bin. Consider the subset $B$ of the bins corresponding to the subset $V_2'$ of $V_2$; thus, $B$ contains $k$ bins. $E_I$ means that there are one or more empty bins in $B$. For a fixed bin in $B$, the probability of the bin being empty is $(1 - 1/n)^{bk^{3/2}\ln k}$, since there are $kv = bk^{3/2}\ln k$ balls. By using the union bound over the $k$ bins, we bound the probability of $E_I$ conditioned on $\overline{E_1}$ as:
$$\Pr[E_I \mid \overline{E_1}] \le k\left(1 - \frac{1}{n}\right)^{bk^{3/2}\ln k}$$
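To see why this bound vanishes, one can apply the standard inequality $1 - x \le e^{-x}$ together with $n = ak^{3/2}$. The following is a sketch of that step under the parameters of Theorem 1, not a reproduction of the original derivation:

```latex
\begin{align*}
k\left(1 - \frac{1}{n}\right)^{bk^{3/2}\ln k}
  \le k \exp\!\left(-\frac{bk^{3/2}\ln k}{n}\right)
  = k \exp\!\left(-\frac{b}{a}\ln k\right)
  = k^{\,1 - b/a}.
\end{align*}
```

Since $b > 5a$, the exponent satisfies $1 - b/a < -4$, so the bound is at most $k^{-4}$, which is $o(1)$ as $k \to \infty$.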
For $E_A$, the set $A$ satisfies either $A \subset V_1$ or $A \subset V_2'$. Thus,
Pr[EA|E1] = Pr[EA and A ⊂ V1|E1] Pr[A ⊂ V1] + Pr[EA and A ⊂ V2′|E1] Pr[A ⊂ V2′]
≤ Pr[EA and A ⊂ V1|E1] + Pr[EA and A ⊂ V2′|E1]
For $\Pr[E_A \text{ and } A \subset V_2' \mid \overline{E_1}]$, we further divide the event into sub-events according to the size of $A$ and use the union bound again. Consider a set $A \subset V_2'$ with $|A| = i$. The event $E_A$ conditioned on $\overline{E_1}$ can be overestimated by the event that the neighbor set $\Gamma(A) \subset V_1$ satisfies $|\Gamma(A)| = i - 1$. In other words, there is a set $A' \subset V_1$ with $|A'| = i - 1$ such that all vertices in $V_1 \setminus A'$ connect only to vertices in $V_2 \setminus A$. Thus, we have
$$\Pr[E_A \text{ and } A \subset V_2' \mid \overline{E_1}]$$
Since b > 5a, for 2 ≤ i ≤ (k + 1)/2, Equation (4.10) holds. It implies that
$$\Pr[E_A \text{ and } A \subset V_2' \mid \overline{E_1}] = o(1)$$
as k → ∞. Similarly, we can get a lower bound for b from the case of A ⊂ V1 and the bound is satisfied by b > 5a.
For $\Pr[E_4 \mid \overline{E_3} \wedge \overline{E_1}]$, that is, the probability that $\det(K) = 0$, we treat each coefficient in the matrix $K$, randomly chosen from $\mathbb{Z}_p$, as a variable. Thus, $\det(K)$ is a multivariate polynomial. Since there is a perfect matching in the induced graph $H'$, $\det(K)$ is not identically zero (i.e., $\det(K) \not\equiv 0$) and the degree of $\det(K)$ is $k$. We use the Schwartz-Zippel Theorem:
Lemma 2. (Schwartz-Zippel Theorem [50])
Let $Q(x_1, x_2, \ldots, x_n) \in F[x_1, x_2, \ldots, x_n]$ be a multivariate polynomial of total degree $d$. Fix any finite set $S \subseteq F$ and let $r_1, r_2, \ldots, r_n$ be chosen independently and uniformly at random from $S$. Then
$$\Pr[Q(r_1, r_2, \ldots, r_n) = 0 \mid Q(x_1, x_2, \ldots, x_n) \not\equiv 0] \le \frac{d}{|S|}$$
From the Schwartz-Zippel Theorem, the probability that the randomly chosen coefficients make $\det(K) = 0$ is no more than $k/p$, i.e., $\Pr[E_4 \mid \overline{E_3} \wedge \overline{E_1}] \le k/p$. Thus, we have
$$\Pr[E_2 \mid \overline{E_1}] \le k/p + o(1)$$
and conclude the proof of Theorem 1.
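The $k/p$ bound from the Schwartz-Zippel step can be compared against a small experiment. The sketch below is an illustration only: it assumes a fully dense $k \times k$ matrix with every entry drawn uniformly from $\mathbb{Z}_p$ for a prime $p$ (the matrix $K$ in the text may contain zeros where $H'$ has no edge), and the function names are hypothetical.

```python
import random


def det_mod_p(matrix, p):
    """Determinant of a square matrix over Z_p (p prime), by elimination."""
    a = [[x % p for x in row] for row in matrix]
    k = len(a)
    det = 1
    for col in range(k):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, k) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)  # modular inverse (Python 3.8+)
        for r in range(col + 1, k):
            factor = a[r][col] * inv % p
            for c in range(col, k):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def singular_fraction(k, p, trials=5000, seed=0):
    """Fraction of random k x k matrices over Z_p with determinant 0."""
    rng = random.Random(seed)
    zero = 0
    for _ in range(trials):
        m = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
        if det_mod_p(m, p) == 0:
            zero += 1
    return zero / trials
```

In this dense model the observed fraction should stay below $k/p$; for a large group order $p$ the bound itself becomes negligible, which is why the $k/p$ term in the theorem is small in practice.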
For the other setting of $v$ and $u$, where $n = ak$ and $m = k$, we have the following theorem.
Theorem 2. Assume that there are $k$ messages, $n$ storage servers, and $m$ key servers, where $n = ak$ for a fixed constant $a > 1$ and $m = t = k > 1$. For $v = b_1 \ln k$ and $u = b_2 \ln k$ with $b_1 > 5a$ and $b_2 > 4 + 3/\ln a$, the probability of a successful retrieval is at least $1 - k/p - o(1)$, where $p$ is the order of the group used.
Proof. As in the proof of Theorem 1, we analyze the two events $E_1$ and $E_2$.
We have $\Pr[E_1] = o(1)$ and $\Pr[E_2 \mid \overline{E_1}] \le k/p + o(1)$ as $k \to \infty$. Therefore, the probability of a successful retrieval is
$$1 - \Pr[E_1] - \Pr[E_2 \mid \overline{E_1}]\,\Pr[\overline{E_1}] \ge 1 - k/p - o(1).$$
In our settings, if we increase the value of $u$, the value of $v$ can be decreased while keeping the same probability of a successful data retrieval. A smaller $v$ means that each storage server holds less information on average, but a larger $u$ means that more storage servers are queried by the key servers. As a result, the probability of a successful data retrieval can be maintained.
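The trade-off between $u$ and $v$ can be explored with a rough simulation. The sketch below is a simplified model, not the analysis above: it assumes uniform random placement and queries (with replacement), assumes the key servers may use any $k$ of the servers they queried, and ignores the event $\det(K) = 0$, so it only tests whether the queried servers admit a matching that covers all $k$ ciphertexts. All function names are hypothetical.

```python
import random


def saturating_matching_exists(adj, num_right):
    """True if every left vertex can be matched to a distinct right vertex."""
    match_right = [-1] * num_right

    def try_augment(i, seen):
        # Augmenting-path search from left vertex i.
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if match_right[j] == -1 or try_augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    return all(try_augment(i, set()) for i in range(len(adj)))


def trial(n, k, m, u, v, rng):
    """One random placement and one random round of queries."""
    # Ciphertext i is stored on (up to) v randomly chosen servers.
    stored = [{rng.randrange(n) for _ in range(v)} for _ in range(k)]
    # Each of the m key servers queries u random servers.
    queried = sorted({rng.randrange(n) for _ in range(m * u)})
    if len(queried) < k:
        return False  # corresponds to event E1
    index = {s: j for j, s in enumerate(queried)}
    adj = [[index[s] for s in nbrs if s in index] for nbrs in stored]
    return saturating_matching_exists(adj, len(queried))


def estimate_success(n, k, m, u, v, trials=2000, seed=0):
    """Empirical success frequency under the simplified model."""
    rng = random.Random(seed)
    return sum(trial(n, k, m, u, v, rng) for _ in range(trials)) / trials
```

Calling `estimate_success` with a fixed $n$, $k$, and $m$ while raising $u$ and lowering $v$ gives an empirical view of how the two parameters compensate for each other under these assumptions.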