B-DSTs - Probabilistic Analysis of Algorithms on Digital Search Trees

$$\cdots + \gamma h + \frac{h_2 - h^2}{2} - \alpha h - h\,\delta_1(\log z) - h\,\delta_1'(\log z) + O(z). \qquad (2.11)$$

Bernoulli model. The two asymptotic expansions (2.9) and (2.11) satisfy condition (I) of Theorem 7. To verify condition (O), we consider $Y(z) = \tilde X(z)e^{z}$ and obtain

$$Y'(z) = Y(pz)e^{qz} + Y(qz)e^{pz} + ze^{z}, \qquad Y(0) = 0.$$

This equation can be written as
$$Y(z) = \int_0^z \bigl(Y(pw)e^{qw} + Y(qw)e^{pw} + we^{w}\bigr)\,dw.$$

Mathematical induction over increasing domains gives a bound for $Y(z) = \tilde X(z)e^{z}$ (see [11] for details), which is what condition (O) of Theorem 7 requires. The quantity $\tilde V(z) + \tilde X(z)^2$ is handled the same way. This yields the following theorem on the mean and variance of the internal path length (see [10]):

Theorem 12 (Jacquet and Szpankowski). Consider a digital search tree built from $n$ records under the asymmetric DST Bernoulli model. Then the mean $E[L_n]$ and the variance $V[L_n]$ of the internal path length satisfy, asymptotically,

$$E[L_n] = \frac{n}{h}\left(\log n + \frac{h_2}{2h} + \gamma - 1 - \alpha + \delta_0(\log n)\right) + o(n),$$
$$V[L_n] \sim c_2\, n\log n, \qquad (2.12)$$

where $h = -p\log p - q\log q$ is the entropy of the alphabet, $\gamma = 0.577\ldots$ is the Euler constant, $h_2 = p\log^2 p + q\log^2 q$, and $c_2 = (h_2 - h^2)/h^3$. The constant $\alpha$ is defined in (2.10). $\delta_0(\log n)$ is a fluctuating function of small amplitude when $\log p/\log q$ is rational, and zero otherwise.
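Theorem 12 can be sanity-checked by computing exact means from the usual DST recurrence $x_{n+1} = n + \sum_k \binom nk p^k q^{n-k}(x_k + x_{n-k})$, $x_0 = 0$. This recurrence is assumed here, since it is not restated in this section. The sketch compares the exact means with the leading terms in the symmetric case $p = q = 1/2$. It also assumes that (2.10) specializes to $\alpha = \log 2\sum_{m\ge1}1/(2^m-1)$ for $p = q$. The function names are illustrative.

```python
import math


def dst_mean_path_length(n_max: int, p: float) -> list[float]:
    """Exact E[L_n], n = 0..n_max, for a DST whose root has depth 0.

    Assumed recurrence: x_{n+1} = n + sum_k P[Bin(n, p) = k] * (x_k + x_{n-k}), x_0 = 0.
    """
    q = 1.0 - p
    x = [0.0] * (n_max + 1)
    pmf = [1.0]  # distribution of Bin(n, p) for the current n, starting at n = 0
    for n in range(n_max):
        x[n + 1] = n + sum(w * (x[k] + x[n - k]) for k, w in enumerate(pmf))
        # Pascal-style step Bin(n, p) -> Bin(n + 1, p); avoids huge binomial coefficients
        pmf = [q * a + p * c for a, c in zip(pmf + [0.0], [0.0] + pmf)]
    return x


def symmetric_constant() -> float:
    """Limit of E[L_n]/n - log2(n) from Theorem 12 with p = q = 1/2."""
    euler_gamma = 0.5772156649015329
    # assumption: for p = q = 1/2, alpha / log 2 = sum_{m>=1} 1/(2^m - 1)
    alpha_over_log2 = sum(1.0 / (2.0**m - 1.0) for m in range(1, 64))
    return 0.5 + (euler_gamma - 1.0) / math.log(2.0) - alpha_over_log2


if __name__ == "__main__":
    n = 1000
    x = dst_mean_path_length(n, 0.5)
    print(x[n] / n - math.log2(n), symmetric_constant())
```

The theorem controls the error only as $o(n)$, and $\delta_0$ adds a tiny oscillation. The two printed numbers should therefore agree only approximately.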

2.4 B-DSTs

We now consider a b-DST. It is like a DST, except that each node stores up to $b$ records (the bucket capacity is $b$). The random model is as before. Flajolet and Richmond [4] devised a method for obtaining the average size of a digital search tree under the symmetric model. Hubalek [8] developed their approach further to obtain the mean and variance of the internal path length of a symmetric b-DST.

From now on we fix the capacity $b$ as a positive integer and consider a b-DST built from $n$ records ($n \ge 0$). Let $L_n$ be the internal path length of a symmetric b-DST built from $n$ records. Since the first $b$ records are stored in the root, the probability generating functions $F_n(z) = E[z^{L_n}]$ satisfy, for $n \ge 0$,

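$$F_{n+b}(z) = z^{n}\,2^{-n}\sum_{k=0}^{n}\binom{n}{k}F_k(z)F_{n-k}(z), \qquad (2.13)$$
with $F_n(z) = 1$ for $0 \le n < b$. As before, we first investigate the general recurrence
$$x_{n+b} = a_{n+b} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}x_k.$$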

One of the innovations in [4] is to use the ordinary generating function. Setting $X(z) = \sum_{n\ge0}x_nz^n$ and $A(z) = \sum_{n\ge0}a_nz^n$ for the sequences $(x_n)$ and $(a_n)$, we derive the following lemma.

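Lemma 8. The generating function $X(z)$ is given by $X(z) = \frac{1}{1-z}\tilde X\!\left(\frac{z}{1-z}\right)$, where $\tilde X(z)$ satisfies
$$(1+z)^b\tilde X(z) = (1+z)^b\tilde A(z) + 2z^b\tilde X\!\left(\frac{z}{2}\right) \qquad (2.14)$$
and $\tilde A(z) = \frac{1}{1+z}A\!\left(\frac{z}{1+z}\right)$.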

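Proof. Consider the Poisson transforms $\tilde x(z)$ and $\tilde a(z)$ of the sequences $(x_n)$ and $(a_n)$, respectively, and define the coefficients $\tilde x_n = n![z^n]\tilde x(z)$ and $\tilde a_n = n![z^n]\tilde a(z)$. We have the equivalent relations
$$x_n = \sum_{k=0}^{n}\binom{n}{k}\tilde x_k \iff \tilde x_n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}x_k,$$
and similarly for $(a_n)$ and $(\tilde a_n)$. Substituting these into the recurrence and summing over $n$, we obtain the relation
$$(1+z)^b\tilde X(z) = (1+z)^b\tilde A(z) + 2z^b\tilde X(z/2).$$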
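The substitution in Lemma 8 relies on the binomial pair above. For a polynomial sequence the Poisson coefficients $\tilde x_n$ have finite support, so $X(z) = \frac{1}{1-z}\tilde X\!\left(\frac{z}{1-z}\right)$ can be checked in exact arithmetic. The sketch uses the illustrative choice $x_n = n^2$, whose ordinary generating function is $z(1+z)/(1-z)^3$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def poisson_coefficients(x: list) -> list:
    """x~_n = n! [z^n] e^{-z} sum_k x_k z^k / k!  =  sum_k C(n,k) (-1)^(n-k) x_k."""
    return [sum(comb(n, k) * (-1) ** (n - k) * x[k] for k in range(n + 1))
            for n in range(len(x))]


def ogf(coeffs: list, z: Fraction) -> Fraction:
    """Evaluate the finite ordinary generating function sum_n c_n z^n exactly."""
    return sum((c * z**n for n, c in enumerate(coeffs)), Fraction(0))


if __name__ == "__main__":
    x = [n * n for n in range(12)]          # x_n = n^2, a polynomial sequence
    xt = poisson_coefficients(x)            # nonzero only for n = 1, 2
    z = Fraction(1, 3)
    lhs = z * (1 + z) / (1 - z) ** 3        # closed form of sum_n n^2 z^n
    rhs = ogf(xt, z / (1 - z)) / (1 - z)    # X(z) = X~(z/(1-z)) / (1-z)
    print(xt[:4], lhs == rhs)
```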

This gives the harmonic sum $\Phi(t) = \sum_{j\ge0}2^j\hat P(2^jt)/\varphi(2^jt)^b$, where $\hat P(t) = (1+t)^bA(t)$. Since $\varphi(t/2)^b = 1 + bt + O(t^2)$ (the Taylor expansion at 0), it suffices to know the asymptotic behavior of $\Phi(t)$, whose Mellin transform is

$$\Phi^*(s) = \frac{1}{1-2^{1-s}}\left(\frac{\hat P(t)}{\varphi(t)^b}\right)^{*}(s). \qquad (2.19)$$
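(2.19) is an instance of the harmonic-sum rule. If $g^*(s)$ is the Mellin transform of $g$, then $\sum_{j\ge0}2^jg(2^jt)$ has Mellin transform $g^*(s)/(1-2^{1-s})$ wherever both converge. The numerical sketch uses the stand-in $g(t) = e^{-t}$, chosen only because $g^*(s) = \Gamma(s)$ is known. It is not the $\hat P(t)/\varphi(t)^b$ of the text.

```python
import math


def harmonic_sum_exp(t: float) -> float:
    """sum_{j>=0} 2^j g(2^j t) for the stand-in base function g(t) = exp(-t)."""
    total, j = 0.0, 0
    while 2.0**j * t < 745.0:          # exp(-x) underflows to 0.0 beyond x ~ 745
        total += 2.0**j * math.exp(-(2.0**j) * t)
        j += 1
    return total


def mellin_numeric(f, s: float, u_min: float, u_max: float, h: float = 0.02) -> float:
    """Trapezoidal rule for int_0^inf f(t) t^(s-1) dt after substituting t = e^u."""
    n = int(round((u_max - u_min) / h))
    acc = 0.0
    for i in range(n + 1):
        u = u_min + i * h
        w = 0.5 if i in (0, n) else 1.0
        acc += w * f(math.exp(u)) * math.exp(s * u)
    return acc * h


if __name__ == "__main__":
    s = 2.5  # the harmonic sum needs Re(s) > 1
    numeric = mellin_numeric(harmonic_sum_exp, s, -40.0, 7.0)
    predicted = math.gamma(s) / (1.0 - 2.0 ** (1.0 - s))  # g*(s) / (1 - 2^(1-s))
    print(numeric, predicted)
```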

We now turn to the mean. From (2.13) and Lemma 8,
$$(1+z)^b\tilde F(z) = z^{b+1} + 2z^b\tilde F(z/2).$$
Using Remark 1, one has
$$\hat F(t) = \varphi\ldots$$
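The functional equation $(1+z)^b\tilde F(z) = z^{b+1} + 2z^b\tilde F(z/2)$ can be checked coefficient by coefficient in exact arithmetic. The sketch makes two assumptions. First, it uses the mean recurrence obtained by differentiating (2.13) at $z = 1$: $f_{n+b} = n + 2^{1-n}\sum_k\binom nk f_k$, with $f_n = 0$ for $n < b$ (so $a_{n+b} = n$). Second, it takes $\tilde F(z) = \sum_n\tilde f_nz^n$ with the Poisson coefficients of Lemma 8. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def bdst_mean(n_max: int, b: int) -> list:
    """Exact f_n = E[L_n], n = 0..n_max, for a symmetric b-DST (assumed mean recurrence)."""
    f = [Fraction(0)] * (n_max + 1)
    for n in range(n_max - b + 1):
        f[n + b] = n + Fraction(2, 2**n) * sum(comb(n, k) * f[k] for k in range(n + 1))
    return f


def poisson_coefficients(x: list) -> list:
    """f~_n = sum_k C(n,k) (-1)^(n-k) f_k, as in Lemma 8."""
    return [sum(comb(n, k) * (-1) ** (n - k) * x[k] for k in range(n + 1))
            for n in range(len(x))]


def check_functional_equation(n_max: int, b: int) -> bool:
    """Compare z^m coefficients of (1+z)^b F~(z) - 2 z^b F~(z/2) with those of z^(b+1)."""
    ft = poisson_coefficients(bdst_mean(n_max, b))
    for m in range(n_max + 1):
        lhs = sum(comb(b, j) * ft[m - j] for j in range(b + 1) if m - j >= 0)
        if m >= b:
            lhs -= Fraction(2, 2 ** (m - b)) * ft[m - b]
        if lhs != (1 if m == b + 1 else 0):
            return False
    return True


if __name__ == "__main__":
    print([check_functional_equation(16, b) for b in (1, 2, 3, 4)])
```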

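From the integral relation $\int_0^\infty \log(1+z)\,z^{s-1}\,dz = \frac{\pi}{s\sin\pi s}$ for $\Re(s) \in \langle -1, 0\rangle$, we have
$$\log\varphi(t) = \sum_{j\ge0}\log(1 + 2^{-j}t) = \frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}\frac{\pi\,t^{-s}}{(1-2^{s})\,s\sin\pi s}\,ds \sim \frac{\log^2 t}{2\log 2} + \frac{\log t}{2}, \qquad (2.21)$$
uniformly as $|t| \to \infty$ in the linear cone $\mathcal L_\theta$ for any fixed $\theta \in (0, \pi)$. Thus
$$\varphi(t)^{-b} = \begin{cases} 1 - 2bt + O(t^2), & t \to 0,\\ O\!\left(\exp\!\left(-\frac{b}{2\log 2}\log^2 t\right)\right), & t \to \infty,\end{cases} \qquad (2.22)$$
in the cone. This guarantees the existence of the Mellin transform of $H(t)$, which is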

$$H^*(s) = \frac{I^*(s-1)}{1-2^{1-s}} \qquad (\Re(s) > 1), \qquad (2.23)$$
where
$$I^*(s) = \int_0^\infty \varphi(t)^{-b}\,t^{s-1}\,dt \qquad (2.24)$$
converges in the strip $\langle 0, \infty\rangle$.
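(2.21) and (2.22) can be probed numerically for real $t$ by summing $\log\varphi(t) = \sum_{j\ge0}\log(1+2^{-j}t)$ directly. In the sketch, two quantities are checked. The gap between $\log\varphi(t)$ and the two leading terms of (2.21) should level off as $t$ grows. The ratio $(\varphi(t)^{-b}-1)/t$ should approach $-2b$ as $t \to 0$, as in the first line of (2.22). The function names are hypothetical.

```python
import math


def log_phi(t: float) -> float:
    """log phi(t) = sum_{j>=0} log(1 + 2^{-j} t) for real t > 0."""
    total, j = 0.0, 0
    while True:
        x = t / 2.0**j
        if x < 1e-18:                    # the remaining terms add up to less than ~2e-18
            return total
        total += math.log1p(x)
        j += 1


def gap_221(t: float) -> float:
    """log phi(t) minus the two leading terms of (2.21)."""
    lt = math.log(t)
    return log_phi(t) - (lt * lt / (2.0 * math.log(2.0)) + lt / 2.0)


def phi_pow(t: float, b: int) -> float:
    """phi(t)^(-b), the integrand factor of I*(s) in (2.24)."""
    return math.exp(-b * log_phi(t))


if __name__ == "__main__":
    print([gap_221(10.0**k) for k in (4, 6, 8, 10)])   # should settle to a constant
    print((phi_pow(1e-6, 3) - 1.0) / 1e-6)             # should be close to -2b = -6
```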

Remark. $I^*(s)$ is exponentially small as $\Im(s) \to \pm\infty$ for $\Re(s) > 0$ [4]. Moreover, one can prove
$$I^*(s) = \frac{\pi}{\sin\pi s}\,J(s), \qquad\text{with}\qquad J(s) = \frac{1}{2\pi i}\int_{\mathcal H}\frac{(-t)^{s-1}}{\varphi(t)^b}\,dt, \qquad (2.25)$$
where $\mathcal H$ is a Hankel-type contour that starts at $+\infty - 0\cdot i$, turns around 0 clockwise, and returns to $+\infty + 0\cdot i$. Flajolet and Richmond [4] also give the representation
$$J(s) = A_0(2^s) + (s-1)A_1(2^s) + \cdots + (s-1)(s-2)\cdots(s-b+1)A_{b-1}(2^s), \qquad (2.26)$$
where the $A_k(x)$ are entire functions. Since $I^*(s)$ is analytic for $\Re(s) > 0$ while $\sin\pi s$ vanishes at the positive integers, $J(k) = 0$ for all $k \ge 1$. Furthermore, (2.22) implies that $I^*(s) \sim s^{-1}$ as $s \to 0$ and $I^*(s) \sim -2b(s+1)^{-1}$ as $s \to -1$. These facts give the singular expansion of $I^*(s)$.

By the above remark and (2.23), $H^*(s)$ has a double pole at $s = 1$ and simple poles at $s = 1 + \chi_k$, where $\chi_k = 2k\pi i/L$ ($k \in \mathbb Z\setminus\{0\}$) and $L = \log 2$. Applying the inversion formula
$$H(t) = \frac{1}{2\pi i}\int_{3/2-i\infty}^{3/2+i\infty}H^*(s)\,t^{-s}\,ds,$$
we obtain the asymptotic expansion of $H(t)$ as $t \to 0$ (the remainder term is due to a simple

Remark. First we rewrite $J'(0) =$

Equations (2.20) and (2.27) give $\hat F(t) = -\ldots$

and the elementary substitution (2.16) then gives the asymptotics of $F(z)$. Finally, Theorem 9 yields the following theorem for the mean of symmetric b-DSTs.

Theorem 13 (Hubalek). The expected generalized internal path length of a b-digital search tree built from $n$ records satisfies, as $n \to \infty$,
$$E[L_n] = n\log_2 n + \cdots,$$
where the fluctuating terms are analytic, periodic functions with mean 0 and period 1.
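Theorem 13 can be probed with exact means from the mean version of (2.13). That version is assumed here to be $f_{n+b} = n + 2^{1-n}\sum_k\binom nk f_k$, with $f_n = 0$ for $n < b$. The fluctuating terms have period 1 in $\log_2 n$, so comparing $f_n/n - \log_2 n$ at $n$ and $2n$ removes them. The remaining difference should then shrink as $n$ grows. The function name is illustrative.

```python
import math


def bdst_mean_float(n_max: int, b: int) -> list[float]:
    """E[L_n], n = 0..n_max, for a symmetric b-DST (assumed mean recurrence).

    f_{n+b} = n + sum_k P[Bin(n, 1/2) = k] * (f_k + f_{n-k}), f_n = 0 for n < b.
    """
    f = [0.0] * (n_max + 1)
    pmf = [1.0]  # Bin(n, 1/2) for the current n, starting at n = 0
    for n in range(n_max - b + 1):
        f[n + b] = n + sum(w * (f[k] + f[n - k]) for k, w in enumerate(pmf))
        # Pascal-style step Bin(n, 1/2) -> Bin(n + 1, 1/2)
        pmf = [0.5 * (a + c) for a, c in zip(pmf + [0.0], [0.0] + pmf)]
    return f


if __name__ == "__main__":
    f = bdst_mean_float(1024, 2)
    for n in (256, 512, 1024):
        print(n, f[n] / n - math.log2(n))
```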

Variance. To compute the variance, we use the formula $V[L_n] = s_n - f_n^2 + f_n$, where $f_n = F_n'(1) = E[L_n]$ and $s_n = F_n''(1) = E[L_n(L_n-1)]$.

Applying Lemma 8 to (2.30a)–(2.30c) yields $(1+z)^b\tilde U(z) = 4z^{b+1}\tilde F\cdots$. We again apply Remark 1, this time to (2.31a), to obtain the expression for $\hat U(t)$ with
$$\hat P(t) = 4t^{-1}\hat F(2t) - 8\hat F'(2t) - 8t\hat F'(2t);$$

then $T(x)$ is a harmonic sum with Mellin transform
$$T^*(s) = \frac{1}{1-2^{s-1}}\cdot\frac{\pi}{\sin\pi s} \qquad (s \in \langle 0, 1\rangle),$$

and $\mathcal M[T(x) - 2; s] = T^*(s)$ for $s \in \langle -1, 0\rangle$. The Mellin transform of $\Upsilon(t)$ is
$$\Upsilon^*(s) = s2^{3-s}H^*(s-1) - 4b\Upsilon_0^*(s) - b2^{3-s}H^*(s) - 4b\Upsilon_1^*(s) + s2^{2-s}H^*(s)$$
for $s \in \langle 2, \infty\rangle$, where $\Upsilon_0^*(s) = \mathcal M[(T(t)-2)H(2t); s]$ and $\Upsilon_1^*(s) = \mathcal M[T(t)H(2t); s]$ exist for $s \in \langle 0, \infty\rangle$. The asymptotic analysis of $\Phi^*(s)$ requires $\Upsilon_0^*(1)$ and $\Upsilon_1^*(1)$. One of the innovations in [8] is the use of the Mellin convolution formula.
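The definition of $T(x)$ is lost in this section. A natural candidate, assumed here and not taken from [8], is $T(x) = \sum_{j\ge0}1/(2^j + x)$. It has $T(0) = 2$. Its Mellin transform on $\langle 0,1\rangle$ is exactly the stated $T^*(s)$, and $T(x) - 2$ gives the same formula on $\langle -1,0\rangle$. The sketch checks both claims numerically with the trapezoidal rule after the substitution $x = e^u$.

```python
import math


def T(x: float) -> float:
    """Assumed T(x) = sum_{j>=0} 1/(2^j + x); hypothetical reconstruction, T(0) = 2."""
    return sum(1.0 / (2.0**j + x) for j in range(200))


def T_minus_2(x: float) -> float:
    """T(x) - 2 = -sum_j x / (2^j (2^j + x)), rearranged to avoid cancellation near 0."""
    return -sum(x / (2.0**j * (2.0**j + x)) for j in range(200))


def T_star(s: float) -> float:
    """The stated Mellin transform pi / (sin(pi s) (1 - 2^(s-1)))."""
    return math.pi / (math.sin(math.pi * s) * (1.0 - 2.0 ** (s - 1.0)))


def mellin_numeric(f, s: float, u_min: float, u_max: float, h: float = 0.05) -> float:
    """Trapezoidal rule for int_0^inf f(x) x^(s-1) dx with x = e^u."""
    n = int(round((u_max - u_min) / h))
    total = 0.0
    for i in range(n + 1):
        u = u_min + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * f(math.exp(u)) * math.exp(s * u)
    return total * h


if __name__ == "__main__":
    print(mellin_numeric(T, 0.5, -80.0, 80.0), T_star(0.5))                # strip <0, 1>
    print(mellin_numeric(T_minus_2, -0.5, -80.0, 80.0), T_star(-0.5))      # strip <-1, 0>
```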

Remark. The Mellin convolution formula is
$$\mathcal M[F(t)\cdot G(t); s] = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}F^*(\tau)\,G^*(s-\tau)\,d\tau, \qquad (2.32)$$
valid for $c$ and $s - c$ in the fundamental strips of $F^*$ and $G^*$, respectively.
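A quick way to see (2.32) at work is to take $F(t) = G(t) = 1/(1+t)$. Its Mellin transform is $\pi/\sin\pi s$ on $\langle 0,1\rangle$. The left side is then $\mathcal M[(1+t)^{-2}; s] = B(s, 2-s) = (1-s)\pi/\sin\pi s$ for $0 < \Re(s) < 2$. This test pair is our own choice, not one from [8]. The sketch evaluates the contour integral numerically on $\Re(\tau) = c$.

```python
import cmath
import math


def F_star(s: complex) -> complex:
    """Mellin transform of 1/(1+t): pi / sin(pi s), fundamental strip <0, 1>."""
    return cmath.pi / cmath.sin(cmath.pi * s)


def convolution_2_32(s: float, c: float, y_max: float = 20.0, h: float = 0.01) -> float:
    """(1/2 pi i) int F*(tau) G*(s - tau) d tau on Re(tau) = c, with F = G = 1/(1+t).

    With tau = c + i y we have d tau = i dy, so this is (1/2 pi) int ... dy.
    The integrand is conjugate-symmetric in y, so the imaginary parts cancel.
    """
    n = int(round(2 * y_max / h))
    total = 0.0
    for k in range(n + 1):
        tau = complex(c, -y_max + k * h)
        w = 0.5 if k in (0, n) else 1.0
        total += w * (F_star(tau) * F_star(s - tau)).real
    return total * h / (2 * math.pi)


def direct(s: float) -> float:
    """M[1/(1+t)^2; s] = B(s, 2-s) = (1-s) pi / sin(pi s) for 0 < s < 2."""
    return (1 - s) * math.pi / math.sin(math.pi * s)


if __name__ == "__main__":
    print(convolution_2_32(1.2, 0.6), direct(1.2))  # c and s - c both lie in <0, 1>
```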

From (2.32), we obtain, for $j = 0, 1$ respectively,
$$\Upsilon_j^*(s) = \frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}T^*(\tau+j)\cdot 2^{-(s-\tau)}H^*(s-\tau)\,d\tau.$$

First we compute $\Upsilon_0^*(1)$ by splitting
$$T^*(\tau)\,2^{-(1-\tau)}H^*(1-\tau) = \frac{\pi}{\sin\pi\tau}\cdot\frac{2^{\tau-1}}{(1-2^{\tau-1})(1-2^{\tau})}\,I^*(0-\tau) = -T^*(\tau+1)\,I^*(0-\tau) - T^*(\tau)\,I^*(0-\tau).$$
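The splitting rests on the partial-fraction identity $\frac{2^{\tau-1}}{(1-2^{\tau-1})(1-2^{\tau})} = \frac{1}{1-2^{\tau}} - \frac{1}{1-2^{\tau-1}}$, together with $\sin\pi(\tau+1) = -\sin\pi\tau$. The sketch checks the resulting identity at a few complex points away from the poles. It drops the common factor $I^*(0-\tau)$.

```python
import cmath


def T_star(s: complex) -> complex:
    """T*(s) = pi / (sin(pi s) (1 - 2^(s-1))), continued to complex s."""
    return cmath.pi / (cmath.sin(cmath.pi * s) * (1 - 2 ** (s - 1)))


def split_lhs(tau: complex) -> complex:
    """pi/sin(pi tau) * 2^(tau-1) / ((1 - 2^(tau-1)) (1 - 2^tau))."""
    return (cmath.pi / cmath.sin(cmath.pi * tau)
            * 2 ** (tau - 1) / ((1 - 2 ** (tau - 1)) * (1 - 2 ** tau)))


def split_rhs(tau: complex) -> complex:
    """-T*(tau + 1) - T*(tau)."""
    return -T_star(tau + 1) - T_star(tau)


if __name__ == "__main__":
    for tau in (-0.5 + 0.7j, 0.3 + 2j, -0.5 - 3j, 1.7 + 0.1j):
        print(tau, abs(split_lhs(tau) - split_rhs(tau)))
```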

Then the first part is
$$\frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}T^*(\tau+1)\,I^*(0-\tau)\,d\tau = \mathcal M[tT(t)I(t); s=0] = -\frac{1}{b}\,\mathcal M[tI'(t); s=0] = \frac{1}{b},$$
and the second part yields

$$\frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}T^*(\tau)\,I^*(0-\tau)\,d\tau = \mathcal M[(T(t)-2)I(t); s=0] = \lim_{s\to0}\left\{-\frac{1}{b}\,\mathcal M[I'(t); s] - 2I^*(s)\right\} = \frac{1}{b}J'(-1) - 2J'(0) - 2.$$
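The value $1/b$ of the first part rests on $T(t)I(t) = -I'(t)/b$ with $I(t) = \varphi(t)^{-b}$. That identity holds under the assumption used in the earlier sketch, namely $T(t) = \sum_{j\ge0}1/(2^j + t)$. This form is a reconstruction, since the definition of $T$ is lost in this section. It equals $\varphi'(t)/\varphi(t)$, so $I'(t) = -bT(t)I(t)$. The sketch checks this derivative identity by finite differences. It also checks $\int_0^\infty T(t)I(t)\,dt = 1/b$ numerically.

```python
import math


def log_phi(t: float) -> float:
    """log phi(t) = sum_{j>=0} log(1 + 2^{-j} t) for t > 0."""
    total, j = 0.0, 0
    while t / 2.0**j >= 1e-18:           # the neglected tail is below ~2e-18
        total += math.log1p(t / 2.0**j)
        j += 1
    return total


def phi_neg_b(t: float, b: int) -> float:
    """I(t) = phi(t)^(-b), the function whose Mellin transform is I*(s) in (2.24)."""
    return math.exp(-b * log_phi(t))


def T(t: float) -> float:
    """Assumed T(t) = sum_{j>=0} 1/(2^j + t), which equals phi'(t)/phi(t)."""
    return sum(1.0 / (2.0**j + t) for j in range(200))


def integral_T_I(b: int, u_min: float = -40.0, u_max: float = 15.0, h: float = 0.05) -> float:
    """int_0^inf T(t) I(t) dt via t = e^u (so dt = t du) and the trapezoidal rule."""
    n = int(round((u_max - u_min) / h))
    total = 0.0
    for k in range(n + 1):
        t = math.exp(u_min + k * h)
        w = 0.5 if k in (0, n) else 1.0
        total += w * T(t) * phi_neg_b(t, b) * t
    return total * h


if __name__ == "__main__":
    for b in (1, 2, 3):
        print(b, integral_T_I(b), 1.0 / b)
```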

Thus $\Upsilon_0^*(1) = -\frac{1}{b} - \frac{1}{b}J'(-1) + 2J'(0) + 2$. Computing $\Upsilon_1^*(1)$ is more difficult; it can be proved that

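$$\Upsilon_1^*(1) = -\frac{1}{4b}\int_0^\infty\frac{t^{-1}\Lambda(t)}{\varphi(t)^b}\,dt \sim -2 \quad (b \to \infty), \qquad\text{with}\quad \Lambda(t) = 2\sum_{j\ge0}\frac{j\,2^{-j}t}{1+2^{-j}t}.$$
Thus the expansion of $U(z)$ as $z \to 1$ can be treated in the same way as that of $F(z)$, which gives the asymptotics of $u_n$ as $n \to \infty$.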

The asymptotics of $v_n$ are simple. Applying Remark 1 to (2.31b), we obtain $\hat V(t)$ with $\hat P(t) = 2t^{-2}$ and $\Phi_V^*(s) = 2I^*(s-2)/(1-2^{1-s})$. The asymptotics of $v_n$ as $n \to \infty$ then follow immediately from the properties of $I^*(s)$.

Because of the "binomial convolution" (2.31d), it is non-trivial to apply the same method to (2.31c). However, the exponential generating function $\tilde m(z) = \sum_{N\ge0}\tilde m_Nz^N/N!$ satisfies $\tilde m(z) = \tilde f(z/2)^2$. From this one can prove that
$$\hat M^*(s) = 2^{-s}\cdot\frac{1}{2\pi i}\int_{3/2-i\infty}^{3/2+i\infty}\binom{s}{\tau}\hat F^*(\tau)\,\hat F^*(s-\tau)\,d\tau, \qquad (2.33)$$
where $\binom{s}{\tau} = \Gamma(1+s)/\bigl(\Gamma(1+\tau)\Gamma(1+s-\tau)\bigr)$ is the complex binomial coefficient. The singular expansions of $\hat F$ and the Taylor series of complex binomial coefficients then give the asymptotics of $\hat M^*(s)$ as $s \to 2$. The case $s \to 1$ is treated similarly.
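For real arguments away from the poles of $\Gamma$, the complex binomial coefficient can be evaluated directly with `math.gamma`. It reduces to the ordinary binomial coefficient at integers and is symmetric under $\tau \mapsto s - \tau$. This real-only sketch does not cover the complex $\tau$ on the contour in (2.33). That case would need a complex Gamma function, which is not shown here.

```python
import math


def gamma_binomial(s: float, tau: float) -> float:
    """binom(s, tau) = Gamma(1+s) / (Gamma(1+tau) Gamma(1+s-tau)) for real arguments.

    math.gamma is real-valued and raises ValueError at non-positive integers,
    so this covers only real s, tau away from those poles.
    """
    return math.gamma(1 + s) / (math.gamma(1 + tau) * math.gamma(1 + s - tau))


if __name__ == "__main__":
    print(gamma_binomial(5, 2), math.comb(5, 2))                # integer case
    print(gamma_binomial(2.3, 0.7), gamma_binomial(2.3, 1.6))   # symmetry tau <-> s - tau
```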

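From (2.31c) we have $\hat W(t) = \varphi(t/2)^b\,\Phi_W(t)$, where $\Phi_W(t) = 2\sum_{j\ge0}2^j\hat P(2^jt)$ with $\hat P(t) = \hat M(t)I(t)$. Assuming suitable properties of $\hat P$, we have $\Phi_W^*(s) = 2\hat P^*(s)/(1-2^{1-s})$, where
$$\hat P^*(s) = \frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty}I^*(\tau)\,\hat M^*(s-\tau)\,d\tau$$
for $s \in \langle \frac52, 2b + \frac52\rangle$. Shifting the contour to the left passes the poles of $I^*(\tau)$ at $\tau = 0$ and $\tau = -1$, whose residues are $1$ and $-2b$ by the remark above. This yields the analytic continuation
$$\hat P^*(s) = \hat M^*(s) - 2b\,\hat M^*(s+1) + \frac{1}{2\pi i}\int_{-3/2-i\infty}^{-3/2+i\infty}I^*(\tau)\,\hat M^*(s-\tau)\,d\tau$$
in $s \in \langle \frac12, 2b + \frac12\rangle$. This gives the Laurent series of $\hat P^*(s)$ as $s \to 2$ and $s \to 1$. Lengthy calculations then give the asymptotics of $w_n$. Altogether, the following theorem on the variance of the internal path length of a b-DST holds: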

Theorem 14 (Hubalek). The variance of the generalized internal path length of a b-digital search tree built from $n$ records satisfies, as $n \to \infty$,
$$V[L_n] = \bigl(C + \delta(\log n)\bigr)\,n + O(\log^2 n), \qquad (2.34)$$
where the constants and functions are defined in [8].

Remark. Hubalek gives the following values of $C$ for $b = 1, \ldots, 5$.

b    1        2        3        4        5
C    0.26600  0.13285  0.08883  0.07032  0.06109

Later on we will see that most of these digits are incorrect.
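The continuation of $\hat P^*(s)$ above moves the line of integration from $\Re(\tau) = 1/2$ to $\Re(\tau) = -3/2$ and collects the residues of $I^*(\tau)$ at $\tau = 0$ and $\tau = -1$. The sketch is a toy version of the same bookkeeping. It uses $\pi/\sin\pi\tau$, the Mellin transform of $1/(1+x)$, as a stand-in for $I^*(\tau)$; its residues are $1$ at $\tau = 0$ and $-1$ at $\tau = -1$. It uses $x^{-\tau}$ as a stand-in for $\hat M^*(s-\tau)$. Both stand-ins are our own choices, not from [8]. The bookkeeping predicts that the shifted integral equals $1/(1+x) - 1 + x = x^2/(1+x)$.

```python
import cmath
import math


def shifted_remainder(x: float, c: float = -1.5, y_max: float = 30.0, h: float = 0.02) -> float:
    """(1/2 pi i) int pi/sin(pi tau) x^(-tau) d tau along Re(tau) = c (toy stand-in).

    With tau = c + i y, d tau = i dy; the integrand is conjugate-symmetric in y.
    """
    n = int(round(2 * y_max / h))
    total = 0.0
    for k in range(n + 1):
        tau = complex(c, -y_max + k * h)
        w = 0.5 if k in (0, n) else 1.0
        total += w * (cmath.pi / cmath.sin(cmath.pi * tau) * x ** (-tau)).real
    return total * h / (2 * math.pi)


if __name__ == "__main__":
    for x in (0.5, 2.0, 10.0):
        # inverse Mellin at Re(tau) = 1/2 gives 1/(1+x); subtracting residues 1 and -x
        print(x, shifted_remainder(x), x * x / (1 + x))
```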

From the document 數位搜尋樹的機率演算分析 (Probabilistic Analysis of Algorithms on Digital Search Trees), pages 43-51.