γh + h2− h2
2 − αh − hδ1(log z) − hδ10(log z)
+ O(z).
(2.11)

Bernoulli model. The two asymptotic expansions (2.9) and (2.11) show that condition (I) of Theorem 7 is satisfied. To verify condition (O), we consider $Y(z) = \tilde X(z)e^z$ and get
\[ Y'(z) = Y(pz)e^{qz} + Y(qz)e^{pz} + ze^z, \qquad Y(0) = 0. \]
This equation can be rewritten in integral form as
\[ Y(z) = \int_0^z \left( Y(pw)e^{qw} + Y(qw)e^{pw} + we^w \right) dw. \]
We can apply mathematical induction over increasing domains to obtain a bound for $Y(z) = \tilde X(z)e^z$ (see [11] for more details), as needed to verify condition (O) of Theorem 7. The function $\tilde V(z) + \tilde X(z)^2$ can be handled in a similar manner. Thus we have the following theorem on the mean and the variance of the internal path length (see [10]):
Theorem 12 (Jacquet and Szpankowski). Consider a digital search tree built from $n$ records under the asymmetric DST Bernoulli model. Then, asymptotically, the average value $E[L_n]$ and the variance $V[L_n]$ of the internal path length of the digital search tree become
\[ E[L_n] = \frac{n}{h}\left( \log n + \frac{h_2}{2h} + \gamma - 1 - \alpha + \delta_0(\log n) \right) + o(n), \]
\[ V[L_n] \sim c_2\, n \log n, \tag{2.12} \]
where $h = -p\log p - q\log q$ is the entropy of the alphabet, $\gamma = 0.577\ldots$ is the Euler constant, $h_2 = p\log^2 p + q\log^2 q$, $c_2 = (h_2 - h^2)/h^3$, $\alpha$ is defined in (2.10), and $\delta_0(\log n)$ is a fluctuating function of small amplitude when $\log p/\log q$ is rational, and zero otherwise.
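To get a feeling for the constants in Theorem 12, the following minimal sketch evaluates $h$, $h_2$ and $c_2$ for a given $p$. The function names `dst_constants` and `leading_mean_term` are ours, and only the leading term $(n/h)\log n$ of the mean is computed, since $\alpha$ and $\delta_0$ are not reproduced here.

```python
import math

def dst_constants(p):
    """Return (h, h2, c2) for a binary memoryless source with P(0) = p."""
    q = 1.0 - p
    h = -p * math.log(p) - q * math.log(q)            # entropy (natural logarithm)
    h2 = p * math.log(p) ** 2 + q * math.log(q) ** 2  # h_2 = p log^2 p + q log^2 q
    c2 = (h2 - h * h) / h ** 3                        # coefficient of n log n in V[L_n]
    return h, h2, c2

def leading_mean_term(n, p):
    """Leading term (n / h) * log n of E[L_n]; lower-order terms are omitted."""
    h, _, _ = dst_constants(p)
    return n * math.log(n) / h

if __name__ == "__main__":
    for p in (0.5, 0.3, 0.1):
        h, h2, c2 = dst_constants(p)
        print(f"p={p}: h={h:.6f}, h2={h2:.6f}, c2={c2:.6f}")
```

For $p = 1/2$ we have $h_2 = h^2$, so $c_2$ vanishes and the $n \log n$ term of the variance disappears in the symmetric case.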
2.4 B-DSTs
Now we consider a b-DST, which is similar to the DST except that up to b records are stored in each node (the bucket capacity is b). The random model is as before. Flajolet and Richmond [4] devised a method to derive the average size of a digital search tree under the symmetric model. Hubalek [8] further developed the approach of Flajolet and Richmond to obtain the mean and variance of the internal path length of a symmetric b-DST.
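The data structure itself can be sketched in a few lines. The following is an illustrative implementation under our own conventions (keys are lists of 0/1 bits, and the names `Node`, `insert`, `internal_path_length` and `random_keys` are hypothetical); it is not taken from [4] or [8].

```python
import random

class Node:
    def __init__(self):
        self.records = []             # at most b keys are stored in this bucket
        self.children = [None, None]  # subtrees for next bit 0 and next bit 1

def insert(root, key_bits, b):
    """Insert a key (sequence of 0/1 bits) and return the depth where it lands."""
    node, depth = root, 0
    while len(node.records) >= b:     # bucket full: branch on the next bit
        bit = key_bits[depth]         # assumes the key is long enough
        if node.children[bit] is None:
            node.children[bit] = Node()
        node = node.children[bit]
        depth += 1
    node.records.append(key_bits)
    return depth

def internal_path_length(keys, b):
    """Sum of the depths of all records in the b-DST built from `keys`."""
    root = Node()
    return sum(insert(root, k, b) for k in keys)

def random_keys(n, length=64, seed=0):
    """n keys of independent fair bits (the symmetric Bernoulli model)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
```

For example, with $b = 1$ the keys 000, 001, 010 land at depths 0, 1 and 2, giving internal path length 3, whereas with $b = 2$ the first two keys share the root and the path length is 1.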
From now on we fix the capacity $b$ as an integer and consider a b-DST built from $n$ records ($n \ge 0$). Let $L_n$ be the internal path length of a symmetric b-DST built from $n$ records. Since the first $b$ records are stored in the root, the corresponding probability generating functions $F_n(z) = E[z^{L_n}]$ satisfy, for $n \ge 0$,
\[ F_{n+b}(z) = z^n \sum_{k=0}^{n} \binom{n}{k} 2^{-n} F_k(z) F_{n-k}(z). \]
As before, we first investigate the general recurrence
\[ x_{n+b} = a_{n+b} + 2^{1-n} \sum_{k=0}^{n} \binom{n}{k} x_k. \]
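For the mean $f_n = E[L_n]$, the recurrence can be iterated exactly. The sketch below assumes the standard binomial-splitting form $f_{n+b} = n + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k} f_k$ with $f_0 = \dots = f_{b-1} = 0$, i.e. the general recurrence with $a_{n+b} = n$; the function name `mean_path_lengths` is ours.

```python
from fractions import Fraction
from math import comb

def mean_path_lengths(n_max, b):
    """Exact f_n = E[L_n] for n = 0..n_max in a symmetric b-DST (assumed recurrence)."""
    f = [Fraction(0)] * (n_max + 1)          # f_0 = ... = f_{b-1} = 0: all records in the root
    for m in range(b, n_max + 1):
        n = m - b                            # records left after the root bucket is full
        s = sum(comb(n, k) * f[k] for k in range(n + 1))
        f[m] = n + Fraction(2, 2 ** n) * s   # a_{n+b} = n: each non-root record gains one level
    return f

if __name__ == "__main__":
    print([str(v) for v in mean_path_lengths(6, 1)])
    print([str(v) for v in mean_path_lengths(6, 2)])
```

A quick sanity check by hand: for $b = 1$ and three records, the third record lands at depth 1 or 2 with probability $1/2$ each, so $f_3 = 1 + 3/2 = 5/2$, which is what the recurrence returns.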
One of the innovations in [4] is to consider the ordinary generating function. If we set the ordinary generating functions $X(z) = \sum_{n \ge 0} x_n z^n$ and $A(z) = \sum_{n \ge 0} a_n z^n$ of the sequences $(x_n)$ and $(a_n)$, we derive the following lemma.
Lemma 8. The generating function $X(z)$ is given by $X(z) = \frac{1}{1-z}\tilde X\!\left(\frac{z}{1-z}\right)$, where $\tilde X(z)$ satisfies
\[ (1+z)^b \tilde X(z) = (1+z)^b \tilde A(z) + 2z^b \tilde X\!\left(\frac{z}{2}\right) \tag{2.14} \]
and $\tilde A(z) = \frac{1}{1+z} A\!\left(\frac{z}{1+z}\right)$.
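Lemma 8 can be checked coefficient by coefficient with exact arithmetic. The sketch below rests on two assumptions of ours: the recurrence has the form $x_{n+b} = a_{n+b} + 2^{1-n}\sum_{k}\binom{n}{k}x_k$ with initial values $x_n = a_n$ for $n < b$, and the coefficients of $\tilde X$ are obtained from those of $X$ by binomial inversion (which is equivalent to $X(z) = \frac{1}{1-z}\tilde X(\frac{z}{1-z})$). All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def solve_recurrence(a, b):
    """x_n = a_n for n < b, then x_{n+b} = a_{n+b} + 2^(1-n) sum_k C(n,k) x_k."""
    x = list(a[:b])
    for m in range(b, len(a)):
        n = m - b
        x.append(a[m] + Fraction(2, 2 ** n) * sum(comb(n, k) * x[k] for k in range(n + 1)))
    return x

def binomial_inverse(x):
    """Coefficients of the transformed series: sum_k C(n,k) (-1)^(n-k) x_k."""
    return [sum(comb(n, k) * (-1) ** (n - k) * x[k] for k in range(n + 1))
            for n in range(len(x))]

def lemma8_residuals(a, b):
    """Coefficients of (1+z)^b X~(z) - (1+z)^b A~(z) - 2 z^b X~(z/2)."""
    xt = binomial_inverse(solve_recurrence(a, b))
    at = binomial_inverse(a)
    res = []
    for n in range(len(a)):
        lhs = sum(comb(b, j) * xt[n - j] for j in range(min(b, n) + 1))
        rhs = sum(comb(b, j) * at[n - j] for j in range(min(b, n) + 1))
        if n >= b:
            rhs += Fraction(2, 2 ** (n - b)) * xt[n - b]   # [z^n] of 2 z^b X~(z/2)
        res.append(lhs - rhs)
    return res

if __name__ == "__main__":
    b = 2
    a = [Fraction(0)] * b + [Fraction(n) for n in range(10)]   # a_{n+b} = n
    print(lemma8_residuals(a, b))   # every entry should be 0
```

All residuals vanish for the toll sequences we tried, which is consistent with (2.14) up to the truncation order.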
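Proof. Consider the Poisson transforms $\tilde x(z)$ and $\tilde a(z)$ of the sequences $(x_n)$ and $(a_n)$, respectively, with coefficients $\tilde x_n = n![z^n]\tilde x(z)$ and $\tilde a_n = n![z^n]\tilde a(z)$. From the equivalent relations (which hold likewise for the sequences $(a_n)$ and $(\tilde a_n)$)
\[ x_n = \sum_{k=0}^{n} \binom{n}{k} \tilde x_k \quad\Longleftrightarrow\quad \tilde x_n = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} x_k, \]
and summing over $n$, we obtain the relation
\[ (1+z)^b \tilde X(z) = (1+z)^b \tilde A(z) + 2z^b \tilde X(z/2). \]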
Thus, we obtain the harmonic sum $\Phi(t) = \sum_{j \ge 0} 2^j \hat P(2^j t)/\phi(2^j t)^b$, where $\hat P(t) = (1+t)^b \hat A(t)$. Since $\phi(t/2)^b = 1 + bt + O(t^2)$ (the Taylor expansion at 0), it suffices to know the asymptotic behavior of $\Phi(t)$, whose Mellin transform is given by
\[ \Phi^*(s) = \frac{1}{1 - 2^{1-s}} \cdot \left( \frac{\hat P(t)}{\phi(t)^b} \right)^{\!*}(s). \tag{2.19} \]
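Now we turn to the mean. From (2.13) and Lemma 8,
\[ (1+z)^b \tilde F(z) = z^{b+1} + 2z^b \tilde F(z/2). \]
Using Remark 1 one has
\[ \hat F(t) = \phi(t/2)^b H(t), \qquad H(t) = \frac{1}{t} \sum_{j \ge 0} \phi(2^j t)^{-b}. \tag{2.20} \]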
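The inhomogeneous term $z^{b+1}$ in the equation for $\tilde F$ can be traced back to Lemma 8 as follows. This is a sketch under the assumption that the mean $f_n = E[L_n]$ satisfies the general recurrence with $a_{n+b} = n$ and $a_n = 0$ for $n < b$ (each of the $n$ records outside the root contributes one extra level).

```latex
\begin{align*}
A(z) &= \sum_{n \ge 0} n\, z^{n+b} = \frac{z^{b+1}}{(1-z)^2}, \\
\tilde A(z) &= \frac{1}{1+z}\, A\!\left(\frac{z}{1+z}\right)
  = \frac{1}{1+z} \cdot \frac{z^{b+1}}{(1+z)^{b+1}} \cdot (1+z)^2
  = \frac{z^{b+1}}{(1+z)^b}, \\
(1+z)^b \tilde A(z) &= z^{b+1}.
\end{align*}
```

Substituting the last line into (2.14) with $\tilde X = \tilde F$ gives the functional equation for $\tilde F$.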
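From the integral relation $\int_0^\infty \log(1+z)\, z^{s-1}\, dz = \frac{\pi}{s \sin \pi s}$ for $\Re(s) \in \langle -1, 0 \rangle$, we have
\[ \log \phi(t) = \sum_{j \ge 0} \log(1 + 2^{-j} t) = \frac{1}{2\pi i} \int_{-1/2 - i\infty}^{-1/2 + i\infty} \frac{\pi}{(1 - 2^s)\, s \sin \pi s}\, t^{-s}\, ds \sim \frac{\log^2 t}{2 \log 2} + \frac{\log t}{2}, \tag{2.21} \]
uniformly for $|t| \to \infty$ in the linear cone $L_\theta$ for any fixed $\theta \in (0, \pi)$. Thus,
\[ \phi(t)^{-b} = \begin{cases} 1 - 2bt + O(t^2), & t \to 0, \\ O\!\left( \exp\!\left( -\frac{b}{2 \log 2} \log^2 t \right) \right), & t \to \infty, \end{cases} \tag{2.22} \]
in the cone. This guarantees the existence of the Mellin transform of $H(t)$, which is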
\[ H^*(s) = \frac{1}{1 - 2^{1-s}}\, I^*(s-1) \qquad (\Re(s) > 1), \tag{2.23} \]
where
\[ I^*(s) = \int_0^\infty \phi(t)^{-b}\, t^{s-1}\, dt \tag{2.24} \]
converges in the strip $\langle 0, \infty \rangle$.
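The two regimes of $\phi$ used above are easy to inspect numerically. The following sketch evaluates $\log\phi(t) = \sum_{j\ge0}\log(1+2^{-j}t)$ by truncating the sum (the truncation length `terms` and the function names are our choices) and compares it with the leading terms of (2.21) and with the small-$t$ expansion in (2.22).

```python
import math

def log_phi(t, terms=400):
    """log phi(t) = sum_{j>=0} log(1 + 2^-j t), truncated after `terms` terms."""
    return sum(math.log1p(t * 2.0 ** (-j)) for j in range(terms))

def log_phi_leading(t):
    """Leading asymptotics log^2 t / (2 log 2) + (log t) / 2 from (2.21)."""
    lt = math.log(t)
    return lt * lt / (2.0 * math.log(2.0)) + lt / 2.0

if __name__ == "__main__":
    for t in (1e3, 1e6, 1e12):
        exact, approx = log_phi(t), log_phi_leading(t)
        print(f"t={t:.0e}: log phi={exact:.4f}, leading={approx:.4f}, "
              f"ratio={exact / approx:.5f}")
    b, t = 3, 1e-4
    # small-t behaviour in (2.22): phi(t)^(-b) versus 1 - 2bt
    print(math.exp(-b * log_phi(t)), 1 - 2 * b * t)
```

The ratio approaches 1 only slowly, because the terms omitted in (2.21) are bounded while the leading term grows like $\log^2 t$.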
Remark. $I^*(s)$ is exponentially small as $\Im(s) \to \pm\infty$ for $\Re(s) > 0$ [4]. Moreover, one can prove
\[ I^*(s) = \frac{\pi}{\sin \pi s}\, J(s), \quad \text{with} \quad J(s) = \frac{1}{2\pi i} \int_{\mathcal H} \frac{1}{\phi(t)^b}\, (-t)^{s-1}\, dt, \tag{2.25} \]
where $\mathcal H$ is a Hankel-type contour starting at $+\infty - 0 \cdot i$, turning around 0 clockwise before returning to $+\infty + 0 \cdot i$. Flajolet and Richmond [4] also give the representation
\[ J(s) = A_0(2^s) + (s-1)A_1(2^s) + \cdots + (s-1)(s-2)\cdots(s-b+1)\, A_{b-1}(2^s), \tag{2.26} \]
where the $A_k(x)$ are entire functions; thus $J(k) = 0$ for all $k \ge 1$. Furthermore, (2.22) implies that $I^*(s) \sim s^{-1}$ as $s \to 0$ and $I^*(s) \sim -2b(s+1)^{-1}$ as $s \to -1$. Thus we can obtain the singular expansion of $I^*(s)$.
From the above remark and (2.23), we know that $H^*(s)$ has a double pole at $s = 1$ and simple poles at $s = 1 + \chi_k$, where $\chi_k = 2k\pi i/L$ ($k \in \mathbb Z \setminus \{0\}$) with $L = \log 2$. Applying the inversion formula
\[ H(t) = \frac{1}{2\pi i} \int_{3/2 - i\infty}^{3/2 + i\infty} H^*(s)\, t^{-s}\, ds, \]
we have the asymptotic expansion of H(t) as t → 0 (the remainder term is due to a simple
Remark. First we rewrite J0(0) =
Equations (2.20) and (2.27) give F (t) = −ˆ 1
and by the elementary substitution (2.16) we obtain the asymptotics of $F(z)$. Finally, using Theorem 9 we obtain the following theorem for the mean of symmetric b-DSTs.
Theorem 13 (Hubalek). The expected generalized internal path length of a b-digital search tree built from n records satisfies as n → ∞
E[Ln] =n log2n + are analytic, periodic functions with mean 0 and period 1.
Variance. To compute the variance, we use the formula $V[L_n] = s_n - f_n^2 + f_n$, where
Applying Lemma 8to (2.30a)–(2.30c) yields (1 + z)bU (z) =4ze b+1Fez Now we again apply Remark 1 to (2.31a) to obtain the expression for ˆU (t) with
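\[ \hat P(t) = 4t^{-1} \hat F(2t) - 8 \hat F'(2t) - 8t \hat F'(2t). \]
With $T(x) = \sum_{j \ge 0} 2^{-j}/(1 + 2^{-j}x)$, the function $T(x)$ is a harmonic sum with Mellin transform
\[ T^*(s) = \frac{1}{1 - 2^{s-1}} \cdot \frac{\pi}{\sin \pi s} \qquad (s \in \langle 0, 1 \rangle), \]
and $\mathcal M[T(x) - 2; s] = T^*(s)$ for $s \in \langle -1, 0 \rangle$. The Mellin transform of $\Upsilon(t)$ is
\[ \Upsilon^*(s) = s 2^{3-s} H^*(s-1) - 4b \Upsilon_0^*(s) - b 2^{3-s} H^*(s) - 4b \Upsilon_1^*(s) + s 2^{2-s} H^*(s) \]
for $s \in \langle 2, \infty \rangle$, where $\Upsilon_0^*(s) = \mathcal M[(T(t) - 2)H(2t); s]$ and $\Upsilon_1^*(s) = \mathcal M[T(t)H(2t); s]$ exist for $s \in \langle 0, \infty \rangle$. For the asymptotic analysis of $\Phi^*(s)$, we have to take $\Upsilon_0^*(1)$ and $\Upsilon_1^*(1)$ into account. One of the innovations in [8] is the use of the Mellin convolution formula.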
Remark. Mellin's convolution formula is
\[ \mathcal M[F(t) \cdot G(t); s] = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} F^*(\tau) \cdot G^*(s - \tau)\, d\tau, \tag{2.32} \]
valid for $c$ and $s - c$ in the fundamental strips of $F^*$ and $G^*$, respectively.
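From (2.32), we obtain for $j = 0, 1$, respectively,
\[ \Upsilon_j^*(s) = \frac{1}{2\pi i} \int_{1/2 - i\infty}^{1/2 + i\infty} T^*(\tau + j) \cdot 2^{-(s - \tau)} H^*(s - \tau)\, d\tau. \]
First we compute $\Upsilon_0^*(1)$ by splitting
\[ T^*(\tau)\, 2^{-(1-\tau)} H^*(1 - \tau) = \frac{\pi}{\sin \pi\tau} \cdot \frac{2^{\tau - 1}}{(1 - 2^{\tau-1})(1 - 2^{\tau})}\, I^*(0 - \tau) = -T^*(\tau + 1) I^*(0 - \tau) - T^*(\tau) I^*(0 - \tau). \]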
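The splitting is a partial-fraction identity in $2^{\tau}$ and can be checked numerically. In the sketch below we use $H^*(1-\tau) = I^*(-\tau)/(1-2^{\tau})$ from (2.23), drop the common factor $I^*(-\tau)$, and treat $T^*$ as the meromorphic function $\pi/((1-2^{s-1})\sin\pi s)$; the function names are ours.

```python
import cmath

def T_star(s):
    """T*(s) = pi / ((1 - 2^(s-1)) sin(pi s)), as a meromorphic function of s."""
    return cmath.pi / ((1 - 2 ** (s - 1)) * cmath.sin(cmath.pi * s))

def split_defect(tau):
    """Left side minus right side of the splitting, without the factor I*(-tau)."""
    lhs = T_star(tau) * 2 ** (tau - 1) / (1 - 2 ** tau)   # T*(tau) 2^-(1-tau) / (1 - 2^tau)
    rhs = -T_star(tau + 1) - T_star(tau)
    return lhs - rhs

if __name__ == "__main__":
    for tau in (0.5 + 1j, 0.5 - 2.5j, 0.3 + 0.1j):
        print(tau, abs(split_defect(tau)))   # should be at rounding-error level
```

The identity rests on $\sin\pi(\tau+1) = -\sin\pi\tau$ together with $\frac{x}{(1-x)(1-2x)} = \frac{1}{1-2x} - \frac{1}{1-x}$ for $x = 2^{\tau-1}$.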
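Then the first part is
\[ \frac{1}{2\pi i} \int_{1/2 - i\infty}^{1/2 + i\infty} T^*(\tau + 1) I^*(0 - \tau)\, d\tau = \mathcal M[tT(t)I(t); s = 0] = -\frac{1}{b} \mathcal M[I'(t); s = 0] = \frac{1}{b}, \]
and the second part yields
\[ \frac{1}{2\pi i} \int_{1/2 - i\infty}^{1/2 + i\infty} T^*(\tau) I^*(0 - \tau)\, d\tau = \mathcal M[(T(t) - 2)I(t); s = 0] = \lim_{s \to 0} \left\{ -\frac{1}{b} \mathcal M[T(t)I(t); s] - 2I^*(s) \right\} = \frac{1}{b} J'(-1) - 2J'(0) - 2. \]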
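The value $1/b$ of the first part can be confirmed numerically. The sketch below assumes $T(t) = \phi'(t)/\phi(t) = \sum_{j\ge0} 2^{-j}/(1+2^{-j}t)$ and $I(t) = \phi(t)^{-b}$, so that $T(t)I(t) = -I'(t)/b$ and $\int_0^\infty T(t)I(t)\,dt = 1/b$. The integral is evaluated by the trapezoidal rule after the substitution $t = e^u$; the integration limits, step count and function names are our choices.

```python
import math

def T(t, terms=120):
    """T(t) = phi'(t)/phi(t) = sum_j 2^-j / (1 + 2^-j t) (assumed form)."""
    return sum(2.0 ** (-j) / (1.0 + t * 2.0 ** (-j)) for j in range(terms))

def I(t, b, terms=120):
    """I(t) = phi(t)^(-b) with phi(t) = prod_j (1 + 2^-j t)."""
    return math.exp(-b * sum(math.log1p(t * 2.0 ** (-j)) for j in range(terms)))

def first_part(b, lo=-40.0, hi=15.0, steps=1100):
    """Trapezoidal estimate of int_0^inf T(t) I(t) dt after substituting t = e^u."""
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = math.exp(lo + i * du)
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += w * T(t) * I(t, b) * t       # dt = t du
    return total * du

if __name__ == "__main__":
    for b in (1, 2, 5):
        print(b, first_part(b), 1.0 / b)
```

The logarithmic substitution is convenient because the integrand decays on both ends in the variable $u$, so a uniform grid resolves it well.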
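Thus $\Upsilon_0^*(1) = -\frac{1}{b} - \frac{1}{b} J'(-1) + 2J'(0) + 2$. It is more difficult to compute $\Upsilon_1^*(1)$, for which it can be proved that
\[ \Upsilon_1^*(1) = -\frac{1}{4b} \int_0^\infty \frac{t^{-1} \Lambda(t)}{\phi(t)^b}\, dt \sim -2 \qquad (b \to \infty), \]
with $\Lambda(t) = 2 \sum_{j \ge 0} j\, 2^{-j} t/(1 + 2^{-j} t)$. Thus, we can manipulate the expansion for $U(z)$ as $z \to 1$ in the same way as for $F(z)$ and obtain the asymptotics of $u_n$ as $n \to \infty$.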
The asymptotics of $v_n$ are simple. Again applying Remark 1 to (2.31b), we obtain $\hat V(t)$ with $\hat P(t) = 2t^{-2}$ and $\Phi_V^*(s) = 2I^*(s-2)/(1 - 2^{1-s})$. We immediately get the asymptotics of $v_n$ as $n \to \infty$ from the properties of $I^*(s)$.
Because of the appearance of the “binomial convolution” (2.31d), it is non-trivial to apply the same method to (2.31c). But since the exponential generating function $\tilde m(z) = \sum_{N \ge 0} \tilde m_N z^N/N!$ satisfies $\tilde m(z) = \tilde f(z/2)^2$, it can be proved that
\[ \hat M^*(s) = 2^{-s} \cdot \frac{1}{2\pi i} \int_{3/2 - i\infty}^{3/2 + i\infty} \binom{s}{\tau} \hat F^*(\tau)\, \hat F^*(s - \tau)\, d\tau, \tag{2.33} \]
where $\binom{s}{\tau} = \Gamma(1+s)/(\Gamma(1+\tau)\Gamma(1+s-\tau))$ is the complex binomial coefficient. Next, from the singular expansions of $\hat F$ and the Taylor series of the complex binomial coefficients, we obtain the asymptotics of $\hat M^*(s)$ as $s \to 2$. Similarly, one treats the case $s \to 1$.
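From (2.31c) we have $\hat W(t) = \phi(t/2)^b \Phi_W(t)$, where $\Phi_W(t) = 2 \sum_{j \ge 0} 2^j \hat P(2^j t)$ with $\hat P(t) = \hat M(t) I(t)$. Presupposing some properties of $\hat P$, we have $\Phi_W^*(s) = 2 \hat P^*(s)/(1 - 2^{1-s})$, where
\[ \hat P^*(s) = \frac{1}{2\pi i} \int_{1/2 - i\infty}^{1/2 + i\infty} I^*(\tau)\, \hat M^*(s - \tau)\, d\tau \]
for $s \in \langle \tfrac{5}{2}, 2b + \tfrac{5}{2} \rangle$. Shifting the contour to the left yields the analytic continuation
\[ \hat P^*(s) = \hat M^*(s) - 2b \hat M^*(s+1) + \frac{1}{2\pi i} \int_{-3/2 - i\infty}^{-3/2 + i\infty} I^*(\tau)\, \hat M^*(s - \tau)\, d\tau \]
in $s \in \langle \tfrac{1}{2}, 2b + \tfrac{1}{2} \rangle$; the two extra terms are the residues at the poles $\tau = 0$ and $\tau = -1$ of $I^*(\tau)$. Thus we get the Laurent series of $\hat P^*(s)$ as $s \to 2$ and $s \to 1$. After lengthy calculations, we obtain the asymptotics of $w_n$. Overall, the following theorem for the variance of the internal path length of a b-DST holds: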
Theorem 14 (Hubalek). The variance of the generalized internal path length of a b-digital search tree built from $n$ records satisfies, as $n \to \infty$,
\[ V[L_n] = \left( C + \delta(\log n) \right) n + O(\log^2 n), \tag{2.34} \]
where the constants and functions are defined in [8].
Remark. Hubalek gives the following values of $C$ for $b = 1, \ldots, 5$:

b    1        2        3        4        5
C    0.26600  0.13285  0.08883  0.07032  0.06109

Later on we will see that most of the digits are incorrect.