
γh + h2− h2

2 − αh − hδ1(log z) − hδ10(log z)

+ O(z). (2.11)

Bernoulli model. The two asymptotic expansions (2.9) and (2.11) satisfy condition (I) of Theorem 7. To verify condition (O), we consider $Y(z) = \tilde X(z)e^z$ and get

\[
Y'(z) = Y(pz)e^{qz} + Y(qz)e^{pz} + ze^z, \qquad Y(0) = 0.
\]

Observe that the above equation can be represented as

\[
Y(z) = \int_0^z \bigl( Y(pw)e^{qw} + Y(qw)e^{pw} + we^w \bigr)\,dw.
\]

We can apply mathematical induction over increasing domains and get a bound for $Y(z) = \tilde X(z)e^z$ (see [11] for more details), as needed to verify condition (O) of Theorem 7. In a similar manner we can handle $\tilde V(z) + \tilde X(z)^2$. Thus we have the following theorem on the mean and the variance of the internal path length (see [10]):

Theorem 12 (Jacquet and Szpankowski). Consider a digital search tree built from n records under the asymmetric DST Bernoulli model. Then asymptotically the average value E[Ln] and the variance V[Ln] of the internal path length of the digital search tree become

\[
\mathbb{E}[L_n] = \frac{n}{h}\Bigl( \log n + \frac{h_2}{2h} + \gamma - 1 - \alpha + \delta_0(\log n) \Bigr) + o(n),
\]

\[
\mathbb{V}[L_n] \sim c_2\, n \log n, \tag{2.12}
\]

where $h = -p\log p - q\log q$ is the entropy of the alphabet, $\gamma = 0.577\ldots$ is the Euler constant, $h_2 = p\log^2 p + q\log^2 q$, $c_2 = (h_2 - h^2)/h^3$, $\alpha$ is defined in (2.10), and $\delta_0(\log n)$ is a fluctuating function of small amplitude when $\log p/\log q$ is rational, and zero otherwise.
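The leading term of Theorem 12 can be checked numerically. The sketch below assumes the standard recurrence for the mean internal path length under the Bernoulli model, $\mathbb{E}[L_{n+1}] = n + \sum_k \binom{n}{k} p^k q^{n-k}\bigl(\mathbb{E}[L_k] + \mathbb{E}[L_{n-k}]\bigr)$ with $\mathbb{E}[L_0] = 0$, which is not stated in this section; the function name `dst_mean_path_length` is hypothetical.

```python
import math

def dst_mean_path_length(n_max, p):
    """Return [E[L_0], ..., E[L_{n_max}]] from the assumed Bernoulli-model recurrence."""
    q = 1.0 - p
    mean = [0.0] * (n_max + 1)
    for n in range(n_max):            # this pass computes mean[n + 1]
        total = float(n)              # the n non-root records each sit one level deeper
        prob = q ** n                 # binomial weight for k = 0
        for k in range(n + 1):
            total += prob * (mean[k] + mean[n - k])
            if k < n:                 # update the binomial weight from k to k + 1
                prob *= (n - k) / (k + 1) * (p / q)
        mean[n + 1] = total
    return mean

if __name__ == "__main__":
    p = 0.3
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    m = dst_mean_path_length(600, p)
    # By Theorem 12, E[L_n]/n grows like (log n)/h, so doubling n should add about log(2)/h.
    print(m[600] / 600 - m[300] / 300, math.log(2) / h)
```

Comparing $\mathbb{E}[L_{2n}]/(2n) - \mathbb{E}[L_n]/n$ with $\log 2 / h$ isolates the $(n/h)\log n$ term, since the constant inside the parentheses of the theorem cancels in the difference.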

2.4 B-DSTs

Now we consider a b-DST, which is similar to the DST but now up to b records are stored in the nodes (the bucket capacity is b). The random model is as before. Flajolet and Richmond [4] devised a method to give the average size of a digital search tree under the symmetric model. Hubalek [8] further developed the approach by Flajolet and Richmond to give the mean and variance of the internal path length of a symmetric b-DST.

From now on we fix the capacity b as an integer and consider a b-DST built from n records (n ≥ 0). Let $L_n$ be the internal path length of a symmetric b-DST built from n records. Since the first b records are stored in the root, the corresponding probability generating functions $F_n(z) = \mathbb{E}[z^{L_n}]$ satisfy, for n ≥ 0,

\[
F_{n+b}(z) = z^n \sum_{k=0}^{n} 2^{-n}\binom{n}{k} F_k(z)\,F_{n-k}(z).
\]

As before, we first investigate the general recurrence:

\[
x_{n+b} = a_{n+b} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k} x_k.
\]

One of the innovations in [4] is to consider the ordinary generating function. If we set the ordinary generating functions $X(z) = \sum_{n\ge 0} x_n z^n$ and $A(z) = \sum_{n\ge 0} a_n z^n$ for the sequences $(x_n)$ and $(a_n)$, we derive the following lemma.

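Lemma 8. The generating function $X(z)$ is given by $X(z) = \frac{1}{1-z}\tilde X\bigl(\frac{z}{1-z}\bigr)$, where $\tilde X(z)$ satisfies

\[
(1+z)^b \tilde X(z) = (1+z)^b \tilde A(z) + 2z^b \tilde X\Bigl(\frac{z}{2}\Bigr) \tag{2.14}
\]

and $\tilde A(z) = \frac{1}{1+z} A\bigl(\frac{z}{1+z}\bigr)$.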

Proof. Consider the Poisson transforms $\tilde x(z)$ and $\tilde a(z)$ of the sequences $(x_n)$ and $(a_n)$, respectively. Then we obtain for the coefficients $\tilde x_n = n![z^n]\tilde x(z)$ and $\tilde a_n = n![z^n]\tilde a(z)$ From the equivalent relations (and similarly for the sequences $(a_n)$ and $(\tilde a_n)$)

xn=

we have summing over n we obtain the relation

\[
(1+z)^b \tilde X(z) = (1+z)^b \tilde A(z) + 2z^b \tilde X(z/2).
\]
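Lemma 8 can be sanity-checked coefficient by coefficient with exact rational arithmetic. The sketch below makes two assumptions that are not spelled out in the text above: the general recurrence is taken to be $x_{n+b} = a_{n+b} + 2^{1-n}\sum_k \binom{n}{k}x_k$ with $x_k = a_k$ for $k < b$, and $\tilde X(z)$ is read as the ordinary generating function of the Poisson coefficients $\tilde x_n = \sum_k \binom{n}{k}(-1)^{n-k}x_k$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def solve_recurrence(a, b, n_max):
    """x_0..x_{n_max} from the assumed recurrence, with x_k = a_k for k < b."""
    x = [Fraction(a(k)) for k in range(b)]
    for n in range(n_max - b + 1):
        s = sum(comb(n, k) * x[k] for k in range(n + 1))
        x.append(Fraction(a(n + b)) + Fraction(2) / Fraction(2) ** n * s)
    return x

def poisson_coefficients(seq):
    """n! [z^n] of e^{-z} * sum_k seq[k] z^k / k!, for every index n of seq."""
    return [sum(comb(n, k) * (-1) ** (n - k) * seq[k] for k in range(n + 1))
            for n in range(len(seq))]

def lemma8_residuals(a, b, n_max):
    """[z^m] of (1+z)^b (X~(z) - A~(z)) - 2 z^b X~(z/2), for m = 0..n_max."""
    xt = poisson_coefficients(solve_recurrence(a, b, n_max))
    at = poisson_coefficients([Fraction(a(k)) for k in range(n_max + 1)])
    residuals = []
    for m in range(n_max + 1):
        lhs = sum(comb(b, j) * (xt[m - j] - at[m - j]) for j in range(min(b, m) + 1))
        rhs = Fraction(2) * xt[m - b] / Fraction(2) ** (m - b) if m >= b else Fraction(0)
        residuals.append(lhs - rhs)
    return residuals

if __name__ == "__main__":
    b = 2
    toll = lambda k: max(k - b, 0)    # a_{n+b} = n, the toll for the mean path length
    print(all(r == 0 for r in lemma8_residuals(toll, b, 12)))
```

With the toll $a_{n+b} = n$ the same computation also reproduces $(1+z)^b\tilde A(z) = z^{b+1}$, which is the inhomogeneous term used for the mean later in this section.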

Thus, we obtain the harmonic sum $\Phi(t) = \sum_{j\ge 0} 2^j \hat P(2^j t)/\varphi(2^j t)^b$, where $\hat P(t) = (1+t)^b \hat A(t)$. Since $\varphi(t/2)^b = 1 + bt + O(t^2)$ (the Taylor expansion at 0), it suffices to know the asymptotic behavior of $\Phi(t)$, whose Mellin transform is given by

\[
\Phi^*(s) = \frac{1}{1 - 2^{1-s}} \cdot \Bigl( \frac{\hat P(t)}{\varphi(t)^b} \Bigr)^{*}(s). \tag{2.19}
\]
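The factor $1/(1-2^{1-s})$ in (2.19) is the Dirichlet series attached to the harmonic sum. As a short worked step, write $g(t) = \hat P(t)/\varphi(t)^b$ (a shorthand introduced only here) and assume that $s$ lies in the fundamental strip of $g$ and that summation and integration may be interchanged:

```latex
\begin{align*}
\mathcal{M}\Bigl[\sum_{j\ge 0} 2^{j} g(2^{j}t);\, s\Bigr]
  &= \sum_{j\ge 0} 2^{j}\int_0^\infty g(2^{j}t)\,t^{s-1}\,dt
   = \sum_{j\ge 0} 2^{j}\,2^{-js}\,g^{*}(s) \\
  &= \frac{g^{*}(s)}{1-2^{1-s}}, \qquad \Re(s) > 1 .
\end{align*}
```

The geometric series converges exactly when $\Re(s) > 1$, which is why the transforms built from this harmonic sum are first defined to the right of that line.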

Now we turn to the mean. From (2.13) and Lemma 8:

\[
(1+z)^b \tilde F(z) = z^{b+1} + 2z^b \tilde F(z/2).
\]

Using Remark 1 one has

F (t) =φˆ

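From the integral relation $\int_0^\infty \log(1+z)\,z^{s-1}\,dz = \frac{\pi}{s\sin \pi s}$ for $\Re(s) \in \langle -1, 0\rangle$, we have

\[
\log\varphi(t) = \sum_{j\ge 0}\log(1 + 2^{-j}t)
= \frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty} \frac{\pi}{(1-2^{s})\,s\sin \pi s}\,t^{-s}\,ds
\sim \frac{\log^2 t}{2\log 2} + \frac{\log t}{2}, \tag{2.21}
\]

uniformly for $|t| \to \infty$ in the linear cone $L_\theta$ for any fixed $\theta \in (0, \pi)$. Thus,

\[
\varphi(t)^{-b} =
\begin{cases}
1 - 2bt + O(t^2), & t \to 0,\\[2pt]
O\bigl(\exp\bigl(-\tfrac{b}{2\log 2}\log^2 t\bigr)\bigr), & t \to \infty,
\end{cases} \tag{2.22}
\]

in the cone. This guarantees the existence of the Mellin transform of $H(t)$, which is

\[
H^*(s) = \frac{1}{1 - 2^{1-s}}\, I^*(s-1) \qquad (\Re(s) > 1), \tag{2.23}
\]

where

\[
I^*(s) = \int_0^\infty \varphi(t)^{-b}\,t^{s-1}\,dt \tag{2.24}
\]

converges in the strip $\langle 0, \infty\rangle$.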
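The growth estimate (2.21) and the small-$t$ expansion in (2.22) are easy to probe numerically on the positive real axis. The sketch below truncates the infinite product $\varphi(t) = \prod_{j\ge 0}(1 + 2^{-j}t)$ after a fixed number of factors, which is an assumption of the illustration; the helper names are hypothetical.

```python
import math

def log_phi(t, terms=400):
    """log phi(t) for phi(t) = prod_{j>=0} (1 + 2^{-j} t), truncated after `terms` factors."""
    return sum(math.log1p(t * 2.0 ** (-j)) for j in range(terms))

def leading_terms(t):
    """Right-hand side of (2.21): log^2 t / (2 log 2) + (log t) / 2."""
    lt = math.log(t)
    return lt * lt / (2.0 * math.log(2.0)) + lt / 2.0

if __name__ == "__main__":
    for m in (10, 20, 40):
        t = 2.0 ** m
        # The ratio tends to 1 while the difference stays bounded.
        print(m, log_phi(t) / leading_terms(t), log_phi(t) - leading_terms(t))
    b, t = 3, 1e-4
    print(math.exp(-b * log_phi(t)), 1 - 2 * b * t)   # small-t expansion of phi(t)^{-b}
```

Working with $\log\varphi$ rather than $\varphi$ avoids floating-point overflow for large $t$, where $\varphi(t)$ grows faster than any power of $t$.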

Remark. $I^*(s)$ is exponentially small as $\Im(s) \to \pm\infty$ for $\Re(s) > 0$ [4]. Moreover, one can prove

\[
I^*(s) = \frac{\pi}{\sin \pi s}\,J(s), \quad\text{with}\quad J(s) = \frac{1}{2\pi i}\int_{\mathcal H} \frac{1}{\varphi(t)^b}\,(-t)^{s-1}\,dt, \tag{2.25}
\]

where $\mathcal H$ is a Hankel-type contour starting at $+\infty - 0\cdot i$, turning around 0 clockwise before returning to $+\infty + 0\cdot i$. Flajolet and Richmond [4] also give the representation

\[
J(s) = A_0(2^s) + (s-1)A_1(2^s) + \cdots + (s-1)(s-2)\cdots(s-b+1)A_{b-1}(2^s), \tag{2.26}
\]

where the $A_k(x)$ are entire functions; thus $J(k) = 0$ for all $k \ge 1$. Furthermore, (2.22) implies that $I^*(s) \sim s^{-1}$ as $s \to 0$ and $I^*(s) \sim -2b(s+1)^{-1}$ as $s \to -1$. Thus we can obtain the singular expansion of $I^*(s)$.

From the above remark and (2.23), we know that $H^*(s)$ has a double pole at $s = 1$ and simple poles at $s = 1 + \chi_k$, where $\chi_k = 2k\pi i/L$ ($k \in \mathbb{Z}\setminus\{0\}$) with $L = \log 2$. Applying the inversion formula

\[
H(t) = \frac{1}{2\pi i}\int_{3/2-i\infty}^{3/2+i\infty} H^*(s)\,t^{-s}\,ds,
\]

we have the asymptotic expansion of H(t) as t → 0 (the remainder term is due to a simple

Remark. First we rewrite J0(0) =

Equations (2.20) and (2.27) give F (t) = −ˆ 1

and by the elementary substitution (2.16) we obtain the asymptotics of F (z). Finally, using Theorem 9 we obtain the following theorem for the mean of symmetric b-DSTs.

Theorem 13 (Hubalek). The expected generalized internal path length of a b-digital search tree built from n records satisfies as n → ∞

E[Ln] =n log2n + are analytic, periodic functions with mean 0 and period 1.

Variance. To compute the variance, we use the formula $\mathbb{V}[L_n] = s_n - f_n^2 + f_n$, where

Applying Lemma 8to (2.30a)–(2.30c) yields (1 + z)bU (z) =4ze b+1Fez Now we again apply Remark 1 to (2.31a) to obtain the expression for ˆU (t) with

P (t) = 4tˆ −1F (2t) − 8 ˆˆ F0(2t) − 8t ˆF0(2t)

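then $T(x)$ is a harmonic sum with Mellin transform

\[
T^*(s) = \frac{1}{1 - 2^{s-1}}\cdot\frac{\pi}{\sin \pi s} \qquad (s \in \langle 0, 1\rangle),
\]

and $\mathcal{M}[T(x) - 2; s] = T^*(s)$ for $s \in \langle -1, 0\rangle$. The Mellin transform of $\Upsilon(t)$ is

\[
\Upsilon^*(s) = s2^{3-s}H^*(s-1) - 4b\,\Upsilon_0^*(s) - b2^{3-s}H^*(s) - 4b\,\Upsilon_1^*(s) + s2^{2-s}H^*(s)
\]

for $s \in \langle 2, \infty\rangle$, where $\Upsilon_0^*(s) = \mathcal{M}[(T(t) - 2)H(2t); s]$ and $\Upsilon_1^*(s) = \mathcal{M}[T(t)H(2t); s]$ exist for $s \in \langle 0, \infty\rangle$. For the asymptotic analysis of $\Phi^*(s)$, we have to take $\Upsilon_0^*(1)$ and $\Upsilon_1^*(1)$ into account. One of the innovations in [8] is the use of the Mellin convolution formula.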

Remark. Mellin's convolution formula is

\[
\mathcal{M}[F(t)\cdot G(t); s] = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} F^*(\tau)\cdot G^*(s-\tau)\,d\tau, \tag{2.32}
\]

valid for $c$ and $s - c$ in the fundamental strips of $F^*$ and $G^*$, respectively.
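As a small illustration of (2.32), chosen here for convenience and not taken from [8], let $F(t) = G(t) = e^{-t}$, so that $F^*(\tau) = G^*(\tau) = \Gamma(\tau)$ with fundamental strip $\langle 0, \infty\rangle$:

```latex
\begin{align*}
\mathcal{M}[e^{-t}\cdot e^{-t};\, s] &= \int_0^\infty e^{-2t}\,t^{s-1}\,dt = 2^{-s}\,\Gamma(s),\\
\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Gamma(\tau)\,\Gamma(s-\tau)\,d\tau &= 2^{-s}\,\Gamma(s),
\qquad 0 < c < \Re(s).
\end{align*}
```

The second line is the classical Mellin-Barnes representation of $\Gamma(s)(1+x)^{-s}$ evaluated at $x = 1$, so the two sides of the convolution formula agree in this case.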

From (2.32), we obtain for $j = 0, 1$, respectively,

\[
\Upsilon_j^*(s) = \frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty} T^*(\tau + j)\cdot 2^{-(s-\tau)}H^*(s-\tau)\,d\tau.
\]

First we compute $\Upsilon_0^*(1)$ by splitting

\[
T^*(\tau)\,2^{-(1-\tau)}H^*(1-\tau)
= \frac{\pi}{\sin \pi\tau}\cdot\frac{2^{\tau-1}}{(1-2^{\tau-1})(1-2^{\tau})}\,I^*(0-\tau)
= -T^*(\tau+1)\,I^*(0-\tau) - T^*(\tau)\,I^*(0-\tau).
\]
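The splitting above is a partial-fraction decomposition in the variable $u = 2^{\tau-1}$ (a shorthand used only in this step), combined with $\sin \pi(\tau+1) = -\sin \pi\tau$:

```latex
\begin{align*}
\frac{u}{(1-u)(1-2u)} &= \frac{1}{1-2u} - \frac{1}{1-u}, \qquad u = 2^{\tau-1},\\
\frac{\pi}{\sin \pi\tau}\cdot\frac{1}{1-2^{\tau}}
  &= -\frac{\pi}{\sin \pi(\tau+1)}\cdot\frac{1}{1-2^{(\tau+1)-1}} = -T^{*}(\tau+1),\\
\frac{\pi}{\sin \pi\tau}\cdot\frac{1}{1-2^{\tau-1}} &= T^{*}(\tau).
\end{align*}
```

Multiplying the first line by $\frac{\pi}{\sin \pi\tau}\,I^*(0-\tau)$ and substituting the last two lines gives the two terms $-T^*(\tau+1)I^*(0-\tau)$ and $-T^*(\tau)I^*(0-\tau)$.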

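Then the first part is

\[
\frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty} T^*(\tau+1)\,I^*(0-\tau)\,d\tau
= \mathcal{M}[tT(t)I(t); s = 0]
= -\frac{1}{b}\,\mathcal{M}[I'(t); s = 0] = \frac{1}{b},
\]

and the second part yields

\begin{align*}
\frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty} T^*(\tau)\,I^*(0-\tau)\,d\tau
&= \mathcal{M}[(T(t) - 2)I(t); s = 0] \\
&= \lim_{s\to 0}\Bigl\{ -\frac{1}{b}\,\mathcal{M}[T(t)I(t); s] - 2I^*(s) \Bigr\} \\
&= \frac{1}{b}J'(-1) - 2J'(0) - 2.
\end{align*}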
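The value $1/b$ of the first part can be confirmed numerically. The sketch below assumes that $I(t) = \varphi(t)^{-b}$ and that $T(t) = \sum_{j\ge 0} 2^{-j}/(1 + 2^{-j}t) = \varphi'(t)/\varphi(t)$; the definition of $T$ is not reproduced in this section, but this choice is consistent with the Mellin transform $T^*(s)$ quoted above and with the identity $T(t)I(t) = -I'(t)/b$ used in the computation. The function names and the truncation at 200 terms are part of the illustration only.

```python
import math

# Truncated weights 2^{-j}; 200 terms is an assumption of this sketch.
POW2 = [2.0 ** (-j) for j in range(200)]

def log_phi(t):
    """log phi(t) with phi(t) = prod_{j>=0} (1 + 2^{-j} t), truncated."""
    return sum(math.log1p(c * t) for c in POW2)

def T(t):
    """Assumed harmonic sum T(t) = sum_j 2^{-j} / (1 + 2^{-j} t) = phi'(t) / phi(t)."""
    return sum(c / (1.0 + c * t) for c in POW2)

def first_part(b, u_min=-40.0, u_max=30.0, step=0.05):
    """Trapezoidal estimate of int_0^inf T(t) phi(t)^(-b) dt after substituting t = e^u."""
    n = int(round((u_max - u_min) / step))
    total = 0.0
    for i in range(n + 1):
        u = u_min + i * step
        t = math.exp(u)
        weight = 0.5 if i in (0, n) else 1.0
        # dt = e^u du, and phi(t)^(-b) = exp(-b log phi(t)) avoids overflow for large t
        total += weight * T(t) * math.exp(u - b * log_phi(t))
    return total * step

if __name__ == "__main__":
    for b in (1, 2, 3):
        print(b, first_part(b), 1.0 / b)
```

The logarithmic substitution is a design choice: the integrand decays like $e^{u}$ as $u \to -\infty$ and faster than any exponential as $u \to +\infty$, so a uniform trapezoidal rule in $u$ is already very accurate.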

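Thus $\Upsilon_0^*(1) = -\frac{1}{b} - \frac{1}{b}J'(-1) + 2J'(0) + 2$. It is more difficult to compute $\Upsilon_1^*(1)$, for which it can be proved that

\[
\Upsilon_1^*(1) = -\frac{1}{4b}\int_0^\infty \frac{t^{-1}\Lambda(t)}{\varphi(t)^b}\,dt \sim -2 \qquad (b \to \infty)
\]

with $\Lambda(t) = 2\sum_{j\ge 0} j\,2^{-j}t/(1 + 2^{-j}t)$. Thus, we can manipulate the expansion for $U(z)$ as $z \to 1$ in the same way as for $F(z)$ and get the asymptotics of $u_n$ as $n \to \infty$.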

The asymptotics of $v_n$ are simple. Again applying Remark 1 to (2.31b), we obtain $\hat V(t)$ with $\hat P(t) = 2t^{-2}$ and $\Phi_V^*(s) = 2I^*(s-2)/(1 - 2^{1-s})$. We immediately get the asymptotics of $v_n$ as $n \to \infty$ from the properties of $I^*(s)$.

Because of the appearance of the “binomial convolution” (2.31d), it is non-trivial to apply the same method to (2.31c). But, since the exponential generating function $\tilde m(z) = \sum_{N\ge 0}\tilde m_N z^N/N!$ satisfies $\tilde m(z) = \tilde f(z/2)^2$, it can be proved that

\[
\hat M^*(s) = 2^{-s}\cdot\frac{1}{2\pi i}\int_{3/2-i\infty}^{3/2+i\infty} \binom{s}{\tau}\,\hat F^*(\tau)\,\hat F^*(s-\tau)\,d\tau, \tag{2.33}
\]

where $\binom{s}{\tau} = \Gamma(1+s)/\bigl(\Gamma(1+\tau)\Gamma(1+s-\tau)\bigr)$ is the complex binomial coefficient. Next, from the singular expansions of $\hat F$ and the Taylor series of complex binomial coefficients, we obtain the asymptotics of $\hat M^*(s)$ as $s \to 2$. Similarly, one treats the case $s \to 1$.

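From (2.31c) we have $\hat W(t) = \varphi(t/2)^b\,\Phi_W(t)$, where $\Phi_W(t) = 2\sum_{j\ge 0} 2^j \hat P(2^j t)$ with $\hat P(t) = \hat M(t)I(t)$. Presupposing some properties of $\hat P$, we have $\Phi_W^*(s) = 2\hat P^*(s)/(1 - 2^{1-s})$, where

\[
\hat P^*(s) = \frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty} I^*(\tau)\,\hat M^*(s-\tau)\,d\tau
\]

for $s \in \langle \tfrac{5}{2}, 2b + \tfrac{5}{2}\rangle$. Shifting the contour to the left picks up the residues of $I^*(\tau)$ at $\tau = 0$ and $\tau = -1$ (where $I^*(\tau) \sim \tau^{-1}$ and $I^*(\tau) \sim -2b(\tau+1)^{-1}$, respectively) and yields the analytic continuation

\[
\hat P^*(s) = \hat M^*(s) - 2b\,\hat M^*(s+1) + \frac{1}{2\pi i}\int_{-3/2-i\infty}^{-3/2+i\infty} I^*(\tau)\,\hat M^*(s-\tau)\,d\tau
\]

in $s \in \langle \tfrac{1}{2}, 2b + \tfrac{1}{2}\rangle$. Thus we get the Laurent series of $\hat P^*(s)$ as $s \to 2$ and $s \to 1$. After a lengthy calculation, we obtain the asymptotics of $w_n$. Overall, the following theorem for the variance of the internal path length of a b-DST holds: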

Theorem 14 (Hubalek). The variance of the generalized internal path length of a b-digital search tree built from n records satisfies, as n → ∞,

\[
\mathbb{V}[L_n] = \bigl( C + \delta(\log n) \bigr)\,n + O(\log^2 n), \tag{2.34}
\]

where

The constants and functions are defined in [8].

Remark. Hubalek gives the following values of C for b = 1, . . . , 5.

b    1        2        3        4        5
C    0.26600  0.13285  0.08883  0.07032  0.06109

Later on we will see that most of the digits are incorrect.
