Now we introduce results on other parameters that have been studied for digital search trees of size $n$. Our main emphasis will again be on the mean and the variance. First let us fix some notation: set $L = \log 2$; $\alpha$ is given in Theorem 10; $\beta = \sum_{k\ge 1}(2^k-1)^{-2} = 1.137339\cdots$; $\gamma$ is Euler's constant; the constants $h$, $h_2$ and $c_2$ are given in Theorem 12; and $\hat h_2 = p^2\log p + q^2\log q$.
Depth. The depth of a node is the number of nodes on the path from the root to that node. Let $D_n$ be the depth of a randomly selected node in a digital search tree.
Knuth [15] gave the first approach to the mean depth in symmetric DSTs, which was later improved by Flajolet and Sedgewick [5]. Kirschenhofer and Prodinger used Flajolet and Sedgewick's approach to derive the variance in symmetric DSTs [12]. Szpankowski used a method similar to that of Flajolet and Sedgewick to obtain all moments in asymmetric DSTs [22].
Theorem 15. 1. The mean $E[D_n]$ and the variance $V[D_n]$ of the depth in a symmetric digital search tree built over $n$ records satisfy, asymptotically,
\[ E[D_n] = \log_2 n + \frac{\gamma-1}{L} + \frac{1}{2} - \alpha + \sigma_1(n) + O(n^{-1/2}), \]
\[ V[D_n] = \frac{1}{12} + \frac{\pi^2}{6L^2} + \frac{1}{L^2} - \alpha - \beta + \sigma_2(n), \]
where $\sigma_1(x)$ and $\sigma_2(x)$ are small fluctuating functions given in [12].
2. Under the asymmetric Bernoulli model, the mean and the variance become
\[ E[D_n] = \frac{1}{h}\Big\{\log n + \gamma - 1 + \frac{h_2}{2h} - \theta + \sigma_3(n)\Big\} + O(n^{-1/2}), \]
\[ V[D_n] = c_2\log n + C + \sigma_4(n) + O(n^{-1}\log^2 n), \]
where $C$ is a constant, and $\sigma_3(x)$ and $\sigma_4(x)$ are small fluctuating functions given in [22].
As for limit results, Louchard used a probabilistic technique to obtain the asymptotic distribution of the depth in a symmetric DST [17]. Moreover, Louchard and Szpankowski proved that the depth has a normal limiting distribution in asymmetric DSTs [19]. Finally, for the generalized b-DSTs, Louchard, Szpankowski and Tang derived the mean, the variance, and the limiting distribution in both the symmetric and the asymmetric case [20].
Distance. The distance between two nodes is the number of nodes on the path connecting them. Let $d_n$ be the distance between two randomly selected nodes in a digital search tree. Aguech, Lasmar and Mahmoud used the methods developed for the depth to determine the moments and to obtain the limit law of the distance in a DST [1]. We give only their results concerning the mean and the variance.
Theorem 16. 1. Consider an asymmetric digital search tree built from $n$ records. Then, asymptotically, the mean $E[d_n]$ and the variance $V[d_n]$ of the distance between two random nodes in the digital search tree become
\[ E[d_n] = \frac{2}{h}\log n + \frac{1}{h}\Big(\frac{\hat h_2}{pq} + \frac{h_2}{h} - 2(1-\gamma) + \log(pq) - 2L\alpha\Big) + 2 - 2\delta_q(n) + O(n^{-0.49999}), \]
\[ V[d_n] = 2c_2\log n + O(1), \]
where $\delta_q(n)$ is a small fluctuating function given in [1].
2. Now, consider the digital search tree under the symmetric Bernoulli model. Then
\[ E[d_n] = 2\log_2 n - 1 + \frac{2(\gamma-1)}{L} - \alpha - 2\delta_{1/2}(n) + O(n^{-0.49999}), \]
\[ V[d_n] = \frac{6+\pi^2}{3L^3} + \frac{22}{3} - 2(\alpha+\beta) + \frac{4(\gamma-1)}{L}\,\delta_{1/2}(n) - 2\delta_{1/2}^2(n) + \frac{4}{L}\,\hat\delta(n) + O(n^{-0.49999}), \]
where $\hat\delta(x)$ is another small fluctuating function given in [1].
External-internal nodes. A node with both links null is called an external-internal node. In [15], Knuth posed the analysis of the number of such nodes in random DSTs as an open question (in [21], Prodinger showed that Knuth could have solved it himself). The question was solved by Flajolet and Sedgewick [5], who gave the mean value in a symmetric digital search tree. The variance in symmetric DSTs was later derived by Kirschenhofer and Prodinger [13]. Since the latter result is very messy, we give only the result for the mean.
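Theorem 17. The average number of external-internal nodes in a symmetric digital search tree built from $n$ records is
\[ n\Big(\beta + 1 - \frac{1}{Q_\infty}\Big(\frac{1}{L} + \alpha^2 - \alpha\Big) + \delta(n)\Big) + O(n^{1/2}), \]
where, in this theorem,
\[ Q_\infty = \prod_{j\ge 1}(1-2^{-j}) = 0.288788\ldots, \qquad \beta = \sum_{k=1}^{\infty} \frac{k\cdot 2^{k(k-1)/2}}{1\cdot 3\cdot 7\cdots(2^k-1)} \sum_{j=1}^{k}\frac{1}{2^j-1} = 7.74313\cdots, \]
and $\delta(x)$ is a small fluctuating function given in [5].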
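The constants in Theorem 17 can be checked numerically. The following is a minimal sketch, not part of the original analysis. It assumes that $\alpha = \sum_{k\ge1} 1/(2^k-1)$ (Theorem 10 is not reproduced in this section) and reads the denominator of $\beta$ as the product $\prod_{j=1}^{k}(2^j-1)$; the function names are illustrative.

```python
import math


def q_infinity(terms: int = 200) -> float:
    """Q_inf = prod_{j>=1} (1 - 2^{-j}), truncated after `terms` factors."""
    prod = 1.0
    for j in range(1, terms + 1):
        prod *= 1.0 - 2.0 ** (-j)
    return prod


def alpha(terms: int = 200) -> float:
    """ASSUMPTION: alpha = sum_{k>=1} 1 / (2^k - 1); Theorem 10 is not shown here."""
    return sum(1.0 / (2.0 ** k - 1.0) for k in range(1, terms + 1))


def beta_thm17(terms: int = 200) -> float:
    """beta of Theorem 17, with the denominator read as prod_{j<=k} (2^j - 1)."""
    total = 0.0
    ratio = 1.0  # 2^{k(k-1)/2} / prod_{j<=k} (2^j - 1), updated incrementally
    inner = 0.0  # sum_{j<=k} 1 / (2^j - 1)
    for k in range(1, terms + 1):
        ratio *= 2.0 ** (k - 1) / (2.0 ** k - 1.0)
        inner += 1.0 / (2.0 ** k - 1.0)
        total += k * ratio * inner
    return total


def leading_constant() -> float:
    """Non-fluctuating part of the coefficient of n in Theorem 17."""
    L = math.log(2.0)
    a = alpha()
    return beta_thm17() + 1.0 - (1.0 / L + a * a - a) / q_infinity()


if __name__ == "__main__":
    print(q_infinity(), beta_thm17(), leading_constant())
```

Under these assumptions the computed values agree with the stated $Q_\infty = 0.288788\ldots$ and $\beta = 7.74313\cdots$, and the leading constant comes out at about $0.372$, so roughly 37% of the nodes are external-internal on average.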
As for b-DSTs, mean, variance and limit laws in the symmetric b-DSTs were derived in Hubalek, Hwang, et al. [9].
The Size. The size of a tree is the number of nonempty nodes. For a classical DST, the size is equal to the number of nodes, but this does not hold for the b-DSTs. Flajolet and Richmond gave the expected value of the size of a symmetric b-DST [4]. Moreover, variance and limit distributions were derived in Hubalek, Hwang, et al. [9]. Again we just give the result for the mean.
Theorem 18. The expected number of nonempty nodes in a symmetric b-DST built from $n$ records satisfies
\[ n\,(q_0 + S(n)) + O(n^{1/2}), \qquad\text{where}\qquad q_0 = \frac{1}{L}\int_0^\infty \frac{(1+t)^{b-1}}{\varphi(t)^b}\,dt, \]
$\varphi(t)$ is defined in (2.17), and $S(x)$ is a periodic function with mean 0. The following are a few values of the leading constant $q_0$:

b      2       3       4       5       10
q_0    0.5747  0.4069  0.3159  0.2585  0.1360
Chapter 3
New Method for Internal Path Length
In this chapter, we explain a new method which will appear in a forthcoming paper of Fuchs, Hwang, and Zacharovas to improve the analysis of the internal path length of symmetric b-DSTs. Moreover, we will use the method to derive some exact and asymptotic results.
3.1 Introduction
Let $L_n$ be the internal path length of a b-DST built from $n$ records. Let $P_n(y) = E[e^{L_n y}]$ be the moment generating function of $L_n$. Then $P_j(y) = 1$ for $j \le b$ and
\[ P_{n+b}(y) = e^{ny}\,2^{-n}\sum_{j=0}^{n}\binom{n}{j} P_j(y)\,P_{n-j}(y) \qquad (n \ge 1). \]
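Differentiating this recurrence with respect to $y$ at $y = 0$ gives a recurrence for the means $\mu_n = E[L_n]$, namely $\mu_{n+b} = n + 2^{1-n}\sum_{j=0}^{n}\binom{n}{j}\mu_j$ with $\mu_j = 0$ for $j \le b$. The following minimal sketch evaluates it in exact rational arithmetic; the function name is illustrative, and the recurrence is applied for every $n \ge 1$, which is an assumption about its intended range.

```python
from fractions import Fraction
from math import comb


def mean_path_length(n_max: int, b: int = 1) -> list[Fraction]:
    """Return [E[L_0], ..., E[L_{n_max}]] for a symmetric b-DST.

    Uses mu_{n+b} = n + 2^{1-n} * sum_j C(n, j) * mu_j, obtained by
    differentiating the recurrence for P_{n+b}(y) at y = 0.
    """
    mu = [Fraction(0)] * (n_max + 1)  # E[L_j] = 0 for j <= b
    for m in range(b + 1, n_max + 1):
        n = m - b  # number of keys that do not fit into the root
        s = sum(comb(n, j) * mu[j] for j in range(n + 1))
        mu[m] = n + Fraction(2, 2 ** n) * s
    return mu


if __name__ == "__main__":
    print(mean_path_length(4, b=1))
    print(mean_path_length(5, b=2))
```

For $b = 1$ this gives $\mu_2 = 1$, $\mu_3 = 5/2$ and $\mu_4 = 35/8$. The value $\mu_3 = 5/2$ can be checked by hand: the second key sits at depth 1, and the third key sits at depth 1 or 2 with probability $1/2$ each.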
Next, we define $P(z, y) = \sum_{n\ge 0} P_n(y)\,\frac{z^n}{n!}$. This gives
\[ \frac{\partial^b}{\partial z^b} P(z, y) = P\Big(\frac{e^y z}{2},\, y\Big)^2. \]
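Now we consider the Poisson generating function $\tilde P(z, y) = e^{-z} P(z, y)$. Since $\frac{\partial^b}{\partial z^b}\big(e^{z}\tilde P\big) = e^{z}\sum_{j=0}^{b}\binom{b}{j}\frac{\partial^j}{\partial z^j}\tilde P$, we obtain
\[ \sum_{j=0}^{b}\binom{b}{j}\frac{\partial^j}{\partial z^j}\tilde P(z, y) = e^{(e^y-1)z}\,\tilde P\Big(\frac{e^y z}{2},\, y\Big)^2. \]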
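Thus, if we set $\tilde P(z, y) = \sum_{m\ge 0} \tilde f_m(z)\,\frac{y^m}{m!}$, then we obtain the following relations for the Poisson transforms of the first two moments:
\[ \sum_{j=0}^{b}\binom{b}{j}\tilde f_1^{(j)}(z) = 2\tilde f_1(z/2) + z, \tag{3.1} \]
\[ \sum_{j=0}^{b}\binom{b}{j}\tilde f_2^{(j)}(z) = 2\tilde f_2(z/2) + 2\tilde f_1(z/2)^2 + 4z\,\tilde f_1(z/2) + 2z\,\tilde f_1'(z/2) + z + z^2, \tag{3.2} \]
with initial conditions $\tilde f_k^{(j)}(0) = 0$ for $0 \le j \le b$ and $k = 1, 2$.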
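Relation (3.1) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not the derivation used in the text: it takes $\tilde f_1(z) = e^{-z}\sum_n E[L_n]\,z^n/n!$, uses the identity $\sum_{j}\binom{b}{j}\tilde f_1^{(j)}(z) = e^{-z}\frac{d^b}{dz^b}\big(e^z\tilde f_1(z)\big) = e^{-z}\sum_n E[L_{n+b}]\,z^n/n!$ for the left-hand side, and truncates all series at a hypothetical cutoff `n_max`.

```python
import math


def means(n_max: int, b: int) -> list[float]:
    """E[L_0..L_{n_max}] from mu_{n+b} = n + 2^{1-n} sum_j C(n,j) mu_j (floats)."""
    mu = [0.0] * (n_max + 1)
    for m in range(b + 1, n_max + 1):
        n = m - b
        mu[m] = n + 2.0 ** (1 - n) * sum(
            math.comb(n, j) * mu[j] for j in range(n + 1)
        )
    return mu


def poisson_transform(coeffs: list[float], z: float) -> float:
    """e^{-z} * sum_n coeffs[n] * z^n / n!, truncated at len(coeffs)."""
    term = math.exp(-z)  # e^{-z} z^n / n! for n = 0
    total = 0.0
    for n, c in enumerate(coeffs):
        total += c * term
        term *= z / (n + 1)
    return total


def check_3_1(z: float, b: int, n_max: int = 120) -> tuple[float, float]:
    """Return (left-hand side, right-hand side) of (3.1) at the point z."""
    mu = means(n_max + b, b)
    lhs = poisson_transform(mu[b:], z)  # e^{-z} sum_n mu_{n+b} z^n / n!
    rhs = 2.0 * poisson_transform(mu[: n_max + 1], z / 2.0) + z
    return lhs, rhs


if __name__ == "__main__":
    for b in (1, 2, 3):
        print(b, check_3_1(5.0, b))
```

For moderate $z$ the two sides agree up to floating-point error, which is consistent with $g(z) = z$ being the inhomogeneous term for the mean.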
New method. First, we consider a recurrence of the general type
\[ \sum_{j=0}^{b}\binom{b}{j}\tilde f^{(j)}(z) = 2\tilde f(z/2) + g(z), \tag{3.3} \]
with initial conditions $\tilde f^{(j)}(0) = 0$ for $0 \le j \le b$. Using the Laplace transform, we can deduce a simpler recurrence. More precisely, we denote the Laplace transform of $\tilde f(z)$ by $F(s)$ and obtain
\[ \sum_{j=0}^{b}\binom{b}{j} s^j F(s) = 4F(2s) + G(s), \tag{3.4} \]
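where $G(s)$ is the Laplace transform of $g(z)$. Define
\[ \phi(s) = \prod_{j\ge 1}(1 + 2^{-j}s)^b \tag{3.5} \]
and write
\[ \hat F(s) = \frac{F(s)}{\phi(s)}. \]
Since $\sum_{j=0}^{b}\binom{b}{j}s^j = (1+s)^b$ and $\phi(2s) = (1+s)^b\phi(s)$, we have
\[ \hat F(s) = 4\hat F(2s) + \frac{G(s)}{\phi(2s)}, \]
and by iteration
\[ \hat F(s) = \sum_{j\ge 0} 4^j\,\frac{G(2^j s)}{\phi(2^{j+1}s)}. \tag{3.6} \]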
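The normalization step can be illustrated numerically. The sketch below checks the product identity $\phi(2s) = (1+s)^b\phi(s)$ and verifies that the truncated series (3.6) satisfies $\hat F(s) = 4\hat F(2s) + G(s)/\phi(2s)$. As an example it uses $G(s) = 1/s^2$, the Laplace transform of $g(z) = z$ (the case of the mean). The truncation lengths and function names are assumptions made for illustration; $\phi$ is handled through its logarithm to avoid overflow for large arguments.

```python
import math


def log_phi(s: float, b: int, terms: int = 80) -> float:
    """log of the truncated product phi(s) = prod_{j>=1} (1 + 2^{-j} s)^b from (3.5)."""
    return b * sum(math.log1p(s * 2.0 ** (-j)) for j in range(1, terms + 1))


def f_hat(s: float, b: int, G, terms: int = 60) -> float:
    """Truncated series (3.6): sum_{j>=0} 4^j G(2^j s) / phi(2^{j+1} s)."""
    total = 0.0
    for j in range(terms):
        inv_phi = math.exp(-log_phi(2.0 ** (j + 1) * s, b))  # underflows to 0.0 safely
        total += 4.0 ** j * G(2.0 ** j * s) * inv_phi
    return total


def g_mean(s: float) -> float:
    """Laplace transform of g(z) = z, i.e. G(s) = 1 / s^2."""
    return 1.0 / (s * s)


if __name__ == "__main__":
    s, b = 1.5, 2
    # Product identity phi(2s) = (1 + s)^b * phi(s), compared on the log scale.
    print(log_phi(2 * s, b), b * math.log1p(s) + log_phi(s, b))
    # Functional equation F_hat(s) - 4 F_hat(2s) = G(s) / phi(2s).
    print(f_hat(s, b, g_mean) - 4.0 * f_hat(2 * s, b, g_mean),
          g_mean(s) * math.exp(-log_phi(2 * s, b)))
```

Because $1/\phi$ decays very quickly, only the first few terms of (3.6) contribute noticeably for $s$ of moderate size, which is why a short truncation suffices here.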
Obviously, this is a harmonic sum. Therefore, we use the Mellin transform and get
From (2.22) ($\varphi(s)^b = \phi(2s)$), we know that $1/\phi(s)$ is very small at infinity. Thus, the Mellin transform $G^*(w)$ is well-defined for $\Re(w) > \max\{\beta + 1, 0\}$, and from the inverse Mellin transform we have
as |s| → 0. Finally, by the inverse Laplace transform, we get
f (z) =e 1 From this we obtain the asymptotics of fn via de-poissonization.
Mean. From (3.1), we know that for the expected internal path length we have g(z) = z, and by the process above we get
G∗(w) =
Thus, using Theorem 7 we obtain $E[L_n] = \tilde f_1(n) + O(1)$
This coincides with the expansion obtained in Theorem 13.
Variance. To compute the variance, we consider the function
\[ \tilde V(z) := \tilde f_2(z) - \tilde f_1(z)^2 - z\,\tilde f_1'(z)^2. \tag{3.10} \]
The advantage of using this function is that
\[ \tilde V(n) = \tilde f_2(n) - \tilde f_1(n)^2 - n\,\tilde f_1'(n)^2, \]
whose right-hand side corresponds to the right-hand side of (1.23) in Theorem 7. From (3.2), $\tilde V(z)$ satisfies a recurrence of the general type (3.3)
b
Remark. (1) Note that from (3.9), we have g(z) = b Now, by the same process as for the mean, we get
$V[L_n] = \tilde V(n) + O(1)$, which is the same as (2.34), where
G∗(2)
Note that the last expression is much easier than the corresponding expression in Theorem 14. In particular, we can use Maple to obtain the values for small b (see Table 3.1). We will do this in Section 3.3.
Table 3.1: Some values of the leading constant $G^*(2)/\log 2$.

b                 1        2        3        4        5
G^*(2)/log 2      0.26600  0.13260  0.09004  0.06958  0.05781