Probabilistic Analysis of Digital Search Trees - Old and New Results

National Chiao Tung University

Department of Applied Mathematics

Master's Thesis

Student: Po-Han Tseng

Advisor: Prof. Michael Fuchs

Probabilistic Analysis of Digital Search Trees - Old and New Results

Student: Po-Han Tseng

Advisor: Michael Fuchs

National Chiao Tung University

Department of Applied Mathematics

Master's Thesis

A Thesis

Submitted to the Department of Applied Mathematics, College of Electrical Engineering and Computer Science,

National Chiao Tung University, in partial fulfillment of the requirements

for the Degree of Master

in

Applied Mathematics

June 2009

Hsinchu, Taiwan, Republic of China


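Abstract

Digital search trees (DSTs for short) and bucket digital search trees (b-DSTs for short, in which each node can store up to b records) are fundamental data structures in computer science. Both data structures are built from records whose keys are 0-1 sequences. In this thesis, we consider randomly generated DSTs.

Over the past decade, results have appeared for almost all important parameters of random DSTs, for example the depth, the distance, the number of external-internal nodes, the internal path length, and the size. These results use many analytic methods, most of which belong to analytic combinatorics.

In this thesis, we focus mainly on the internal path length of DSTs. We introduce the results developed in recent years together with the analytic techniques they use. In addition, we introduce a completely new method, which will appear in forthcoming work of Fuchs, Hwang, and Zacharovas. This method improves the analysis of the internal path length of b-DSTs.

The thesis has two main purposes. First, we present the methods and results of recent years on the internal path length of DSTs, together with a survey of results on other parameters; we also give some technical improvements. Second, we present a completely new analytic method and obtain a simpler result for the internal path length of b-DSTs.

Chapter 2 surveys the results on the mean and variance of the internal path length and of other parameters. Chapter 3 introduces the new method and gives our main results.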


Preface

Digital search trees (DSTs for short) and their generalizations such as bucket digital search trees (b-DSTs) are fundamental data structures in computer science. These trees are built from records whose keys consist of 0-1 strings. In this thesis, we will consider random DSTs, which are obtained by assuming that the bits of the keys are randomly generated. Characteristic parameters of random DSTs are random variables, and their analysis has attracted a lot of attention in recent decades. Examples of parameters considered in previous works include: the depth of a random node [5, 12, 15, 17, 18, 19, 20, 22], the distance of two random nodes [1], the number of external-internal nodes [5, 9, 13, 15, 21], the internal path length [5, 8, 10, 14], and the size of the tree [4, 9]. For the analysis, several interesting methods have been proposed, most of them belonging to the field of analytic combinatorics.

In this thesis, we focus on the internal path length of DSTs. We will introduce the techniques which have been devised for the analysis of the internal path length. Moreover, we will give a new method, which will appear in a forthcoming work of Fuchs, Hwang, and Zacharovas, to improve the analysis of the internal path length of b-DSTs.

The purpose of this thesis is twofold. First, we want to give a self-contained survey of the techniques used in the analysis of DSTs and the results achieved. Here, we will mainly follow previous works, but also introduce some technical improvements. Secondly, we are going to use the new approach of Fuchs, Hwang, and Zacharovas mentioned above to obtain exact and numerical results concerning the leading constant in the asymptotic expansion of the variance. In particular, our results will simplify and improve previous results.

This thesis is organized as follows: in Chapter 1, we introduce the techniques which are of importance in the analysis of DSTs. In Chapter 2, we present results concerning mean value and variance of the internal path length and explain how they can be proved with the methods from Chapter 1. Moreover, we also give a short survey of results concerning other parameters. In Chapter 3, we introduce the new method and explain our new findings concerning the leading constant of the variance.

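Acknowledgements

First and foremost, I would like to thank my advisor, Dr. Michael Fuchs. Before work on this thesis began, Michael and I spent a great deal of time reading many papers on digital search trees, gradually working out the results obtained so far for DSTs and the methods used to obtain them. In addition, Michael gave me considerable help and guidance while I was writing this thesis, so that it can serve as a systematic introduction for readers who are new to DSTs or to this field.

I am also grateful to the two members of my oral examination committee, Professor 陳秋媛 of National Chiao Tung University and Professor 程華淮 of National Taiwan Ocean University. Both gave me many comments on this thesis, which made it more complete and more rigorous.

Of course I have not forgotten my classmates of year 96 at the Institute of Applied Mathematics, haha. The days of graduate school were hard, but we made it through together. Those who should graduate will graduate in the end; for those who have not graduated yet, it is only because the boss has not nodded yet. Keep going!

And, indispensably, there is my family, who have given me the most and the richest things in my life. While I was working hard in Hsinchu (although sometimes slacking off), they always gave me their greatest support. Oh, and the two lovely little dogs at home, 奶雞 and 都胖 (although they understand nothing and only ever think about eating and going to the park).

Finally, never forget the beautiful things that have appeared in life.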

List of Figures

2.1 Examples of generalized digital search trees for b = 1, 2, 3 built from 12 records.

List of Tables

1.1 Some common Mellin transforms.

1.2 Functional properties of Mellin transform.

1.3 Some Poisson transforms and their properties.

1.4 Some common functions and the asymptotic forms of their coefficients.

Chapter 1 Some techniques

In this chapter, we collect some analytic techniques: the Rice method [6] (Section 1.1), the Mellin transform [2] (Section 1.2), the Poisson transform [11] (Section 1.3), and singularity analysis [3] (Section 1.4). These methods will be the main tools for deriving our results in Chapter 2 and Chapter 3.

1.1 Rice Method

The Rice method is useful for finding the asymptotic expansion of sums of the form

\[ \sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k). \tag{1.1} \]

The starting point is the integral representation:

Lemma 1. Let $C$ be a positively oriented closed curve encircling the points $0, 1, \ldots, n$, and let $f(z)$ be a function which is analytic inside and on $C$. Then, we have

\[ \sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k) = \frac{(-1)^n}{2\pi i} \int_C f(z)\, \frac{n!}{z(z-1)\cdots(z-n)}\, dz. \]

Proof. This follows by an application of the residue theorem: the integral equals $2\pi i$ times the sum of the residues at the simple poles $0, 1, \ldots, n$. For each $k$, we have

\[ \operatorname{Res}_{z=k}\, f(z)\, \frac{n!}{z(z-1)\cdots(z-n)} = (-1)^{n-k}\, \frac{n!}{k!\,(n-k)!}\, f(k). \]

Remark. The kernel of the integral can be written as

\[ \frac{n!}{z(z-1)\cdots(z-n)} = \frac{\Gamma(n+1)\,\Gamma(z-n)}{\Gamma(z+1)} = (-1)^{n-1} B(n+1, -z), \]

where $B(x, y)$ is the classical Beta function.

Remark. Sometimes the sum is taken only over the integers $n_0, \ldots, n$. Then Lemma 1 still holds when $C$ is changed to enclose just those points.
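The three forms of the kernel in the first remark can be compared numerically. The following is a minimal sketch, assuming a real non-integer argument z so that all Gamma values are finite; the function names are illustrative and not part of the thesis.

```python
import math


def kernel_direct(n: int, z: float) -> float:
    """Evaluate n! / (z (z-1) ... (z-n)) directly from the product."""
    denominator = 1.0
    for j in range(n + 1):
        denominator *= z - j
    return math.factorial(n) / denominator


def kernel_via_gamma(n: int, z: float) -> float:
    """Evaluate Gamma(n+1) Gamma(z-n) / Gamma(z+1)."""
    return math.gamma(n + 1) * math.gamma(z - n) / math.gamma(z + 1)


def kernel_via_beta(n: int, z: float) -> float:
    """Evaluate (-1)^(n-1) B(n+1, -z), with B written through Gamma functions."""
    beta = math.gamma(n + 1) * math.gamma(-z) / math.gamma(n + 1 - z)
    return (-1) ** (n - 1) * beta


if __name__ == "__main__":
    for n, z in [(4, 0.3), (4, 6.5), (7, 2.25)]:
        print(n, z, kernel_direct(n, z), kernel_via_gamma(n, z), kernel_via_beta(n, z))
```

The three columns printed for each pair (n, z) should agree up to floating-point rounding.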

Rice method. Suppose we have an explicit sum of type (1.1). Then the Rice method allows us to compute an asymptotic expansion by using the following steps:

Step 1. Extend $f(k)$, which is defined only on the integers, to an appropriate meromorphic function $f(z)$ which is analytic at the points $0, 1, \ldots, n$.

Step 2. Choose a suitable contour $C$ which encircles the points $0, 1, \ldots, n$ and consider the integral

\[ \Delta = \frac{(-1)^n}{2\pi i} \int_C f(z)\, \frac{n!}{z(z-1)\cdots(z-n)}\, dz. \]

Step 3. By the residue theorem we obtain

\[ \Delta = \sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k) + \{\text{contributions from the other poles inside the contour } C\}. \]

Step 4. Estimate $\Delta$.

To carry out Step 4 one often needs growth properties of $f(z)$. Therefore, we give the following definition:

Definition 1. A function $f(z)$ is said to be of polynomial growth in an unbounded domain $\Omega$ if it is analytic in $\Omega$ and satisfies

\[ |f(z)| = O(|z|^r), \tag{1.2} \]

for some non-negative integer $r$ as $z \to \infty$ in $\Omega$.

Remark. Suppose $f(z)$ is of polynomial growth. Then, the integral

\[ \int_C f(z)\, \frac{n!}{z(z-1)\cdots(z-n)}\, dz \to 0 \]

as $C$ becomes large.

The following are two examples to demonstrate the Rice method.

Example 1. Consider the sum

\[ S_n = \sum_{k=1}^{n} \binom{n}{k} \frac{(-1)^k}{k}. \]

Step 1. $f(z) = 1/z$ is obviously a suitable extension of the sequence $1/k$.

Step 2. We choose the curve $C$ to be a circle centered at 0 with radius larger than $n$.

Step 3. The kernel of $\Delta$ has a double pole at 0, simple poles at $1, 2, \ldots, n$, and is analytic everywhere else. Thus

\[ \Delta = S_n + (-1)^n \operatorname{Res}_{z=0} \Bigl[ \frac{n!}{z^2 (z-1)\cdots(z-n)} \Bigr] = S_n + (-1)^n \Bigl[ -\sum_{k=1}^{n} \frac{n!}{(z-1)\cdots(z-k)^2\cdots(z-n)} \Bigr]_{z=0} = S_n + \sum_{k=1}^{n} \frac{1}{k}. \]

Step 4. Clearly, $f(z)$ is of polynomial growth, thus $\Delta$ tends to 0 as $C$ becomes large. Hence, we have

\[ -S_n = \sum_{k=1}^{n} \frac{1}{k} = H_n = \log n + \gamma + O(n^{-1}), \]

where $H_n$ are the harmonic numbers and $\gamma = 0.57721\cdots$ is Euler's constant. The asymptotics of the harmonic numbers is well known (see Example 5 in Section 1.2 for a proof).
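The identity $-S_n = H_n$ obtained in Example 1 can be checked with exact rational arithmetic. This is only a sanity check under the stated definitions; the helper names and the hard-coded value of Euler's constant are assumptions of the sketch.

```python
from fractions import Fraction
from math import comb, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded for the comparison


def rice_sum(n: int) -> Fraction:
    """S_n = sum_{k=1}^{n} binom(n, k) (-1)^k / k, computed exactly."""
    return sum(Fraction((-1) ** k * comb(n, k), k) for k in range(1, n + 1))


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (5, 20, 100):
        assert rice_sum(n) == -harmonic(n)
        # The remaining difference should shrink roughly like 1/(2n).
        print(n, float(harmonic(n)) - (log(n) + EULER_GAMMA))
```

Exact fractions are used because the alternating binomial sum suffers from severe cancellation in floating-point arithmetic.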

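Example 2. Consider the sum

\[ A_n = \sum_{k=2}^{n} \binom{n}{k} (-1)^k Q_{k-2}, \qquad n > 1, \]

where $Q_n = \prod_{1 \le j \le n} (1 - 2^{-j})$.

Step 1. We introduce the function

\[ Q(x) = \Bigl(1 - \frac{x}{2}\Bigr)\Bigl(1 - \frac{x}{4}\Bigr)\Bigl(1 - \frac{x}{8}\Bigr)\cdots. \]

Note that $Q_n = Q_\infty / Q(2^{-n})$, where $Q_\infty := Q(1) = \lim_{n\to\infty} Q_n = 0.288788\cdots$, and $Q_\infty / Q(2^{-z+2})$ is analytic on $[2, \infty)$, which gives the appropriate extension.

Step 2. Take as $C$ a large segment of the line $\Re(z) = 1/2$, closed to the right by a large semi-circle which encloses the points $2, 3, \ldots, n$.

Step 3. Note that the zeros of $Q(2^{-z+2})$ all satisfy $2^{-z+j} = 1$ with $j \le 1$. Thus, inside $C$ the kernel of $\Delta$ has poles at $1 \pm 2\pi i k/\log 2$ (a double pole for $k = 0$ and simple poles for all $k \ne 0$). To find the contribution at 1 we use Taylor expansion. Here the following fact will turn out to be useful:

If $G(z) = \prod_{k \in R} g_k(z)$, then $G'(z)/G(z) = \sum_{k \in R} g_k'(z)/g_k(z)$. From this it follows that if $F(z) = \prod_{j \in R} \bigl(1 - f_j(z)\bigr)^{-1}$ for some index set $R$, then the Taylor series expansion of $F$ at $a$, if it exists, is given by

\[ F(z) = F(a) \Bigl[ 1 + \sum_{j \in R} \frac{f_j'(a)}{1 - f_j(a)} (z - a) + O\bigl((z - a)^2\bigr) \Bigr]. \]

Consequently we obtain the series expansions

\[ (-1)^{n-1} \frac{n!}{z(z-1)\cdots(z-n)} = \frac{1}{z(z-1)} \prod_{2 \le j \le n} (1 - z/j)^{-1} = \frac{n}{z-1} \Bigl[ 1 + (H_{n-1} - 1)(z-1) + O\bigl((z-1)^2\bigr) \Bigr] = \frac{n}{z-1} + n(H_{n-1} - 1) + O(z-1), \]

and

\[ \frac{Q_\infty}{Q(2^{-z+1})} = Q_\infty \prod_{j<1} (1 - 2^{-z+j})^{-1} = 1 - \log 2 \sum_{j<1} \frac{2^{j-1}}{1 - 2^{j-1}} (z-1) + O\bigl((z-1)^2\bigr) = 1 - \alpha \log 2\,(z-1) + O\bigl((z-1)^2\bigr), \]

where $\alpha = 1 + \frac{1}{3} + \frac{1}{7} + \cdots$. Thus, we obtain

\[ (-1)^{n-1} \frac{Q_\infty}{Q(2^{-z+2})} \frac{n!}{z(z-1)\cdots(z-n)} = \frac{1}{1 - 2^{-z+1}} \cdot \frac{Q_\infty}{Q(2^{-z+1})} \cdot (-1)^{n-1} \frac{n!}{z(z-1)\cdots(z-n)} \]
\[ = \Bigl( \frac{1}{(z-1)\log 2} + \frac{1}{2} + O(z-1) \Bigr) \times \Bigl( 1 - \alpha \log 2\,(z-1) + O\bigl((z-1)^2\bigr) \Bigr) \times \Bigl( \frac{n}{z-1} + n(H_{n-1} - 1) + O(z-1) \Bigr). \]

The residue of this product at $z = 1$ is the coefficient of $1/(z-1)$:

\[ \frac{n}{\log 2} (H_{n-1} - 1) - n\Bigl(\alpha - \frac{1}{2}\Bigr) = n \log_2 n + n \Bigl( \frac{\gamma - 1}{\log 2} - \alpha + \frac{1}{2} \Bigr) + O(1). \]

Since the integrand of $\Delta$ carries the factor $(-1)^n$ instead of $(-1)^{n-1}$, this quantity is minus the residue of the integrand at $z = 1$, and hence it is exactly the contribution of the pole at $z = 1$ to $A_n$. The poles at $1 \pm 2\pi i k/\log 2$ with $k \ne 0$ add a small contribution $\delta(n)$ to the linear term [5], where

\[ \delta(n) = \frac{1}{\log 2} \sum_{k \ne 0} \Gamma\Bigl( -1 - \frac{2k\pi i}{\log 2} \Bigr) e^{2k\pi i \log_2 n}. \]

Step 4. On the right semi-circle, $\Delta$ converges to 0 as $C$ becomes large since

\[ |Q^{-1}(2^{-z+2})| = \prod_{j \le 1} (1 - 2^{-|z|+j})^{-1} = O(|z|^0) \]

as $|z| \to \infty$. On the left segment we have the bound

\[ O\Bigl( \int_{-\infty}^{\infty} \Bigl| \frac{\Gamma(n+1)}{\Gamma(n + 1/2 - iy)} \Bigr|\, dy \Bigr) = O(n^{1/2}). \]

Thus we have

\[ A_n = n \log_2 n + n \Bigl( \frac{\gamma - 1}{\log 2} - \alpha + \frac{1}{2} + \delta(n) \Bigr) + O(n^{1/2}). \tag{1.3} \]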
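The constants in (1.3) can be compared against exact values of $A_n$. The sketch below assumes truncation of the infinite product and series after 200 terms and ignores the oscillating term $\delta(n)$, which is numerically small; all function names are illustrative.

```python
from fractions import Fraction
from math import comb, log, log2

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded


def a_exact(n: int) -> Fraction:
    """A_n = sum_{k=2}^{n} binom(n, k) (-1)^k Q_{k-2}, computed exactly."""
    q = Fraction(1)  # Q_0 = 1 (empty product)
    total = Fraction(0)
    for k in range(2, n + 1):
        total += (-1) ** k * comb(n, k) * q  # here q equals Q_{k-2}
        q *= 1 - Fraction(1, 2 ** (k - 1))   # advance q to Q_{k-1}
    return total


def q_infinity(terms: int = 200) -> float:
    """Truncated product for Q_infinity = prod_{j>=1} (1 - 2^{-j})."""
    product = 1.0
    for j in range(1, terms + 1):
        product *= 1.0 - 2.0 ** (-j)
    return product


def alpha_constant(terms: int = 200) -> float:
    """Truncated series for alpha = 1 + 1/3 + 1/7 + ... = sum 1/(2^j - 1)."""
    return sum(1.0 / (2.0 ** j - 1.0) for j in range(1, terms + 1))


def a_main_terms(n: int) -> float:
    """Non-oscillating main terms of (1.3), without delta(n) and the error term."""
    constant = (EULER_GAMMA - 1.0) / log(2.0) - alpha_constant() + 0.5
    return n * log2(n) + n * constant


if __name__ == "__main__":
    print("Q_infinity ~", q_infinity(), " alpha ~", alpha_constant())
    for n in (10, 50, 200):
        print(n, float(a_exact(n)) / n, a_main_terms(n) / n)
```

The two printed columns are $A_n/n$ and the main terms of (1.3) divided by $n$; their difference should shrink as $n$ grows.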

1.2 Mellin Transform

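The Mellin transform (Hjalmar Mellin, 1854-1933, Finnish mathematician) is among the most widely used transforms in the analysis of algorithms.

Definition 2. Let $f(x)$ be a continuous function over $(0, \infty)$. Its Mellin transform $f^*(s)$ is defined by

\[ f^*(s) = \mathcal{M}[f(x); s] = \int_0^\infty f(x)\, x^{s-1}\, dx. \]

Table 1.1: Some common Mellin transforms.

\[
\begin{array}{lll}
f(x) & f^*(s) & \langle \alpha, \beta \rangle \\
\hline
e^{-x} & \Gamma(s) & \langle 0, +\infty \rangle \\
e^{-x^2} & \tfrac{1}{2}\Gamma(\tfrac{1}{2}s) & \langle 0, +\infty \rangle \\
\dfrac{1}{1+x} & \dfrac{\pi}{\sin \pi s} & \langle 0, 1 \rangle \\
\log(1+x) & \dfrac{\pi}{s \sin \pi s} & \langle -1, 0 \rangle \\
H(x) \equiv \mathbf{1}_{0<x<1} & \dfrac{1}{s} & \langle 0, +\infty \rangle \\
x^{\alpha} (\log x)^k H(x) & \dfrac{(-1)^k k!}{(s+\alpha)^{k+1}} & \langle -\alpha, +\infty \rangle,\ k \text{ integer}
\end{array}
\]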

Basic properties. The following lemma gives the conditions for the existence of the Mellin transform of a given function $f(x)$.

Lemma 2. The conditions

\[ f(x) = O(x^u) \quad (x \to 0^+), \qquad f(x) = O(x^v) \quad (x \to +\infty), \]

with $u > v$, guarantee that $f^*(s)$ exists in the strip $-u < \Re(s) < -v$.

Proof. This follows from the decomposition

\[ \Bigl| \int_0^\infty f(x) x^{s-1}\, dx \Bigr| \le \int_0^1 |f(x)| x^{\Re(s)-1}\, dx + \int_1^\infty |f(x)| x^{\Re(s)-1}\, dx \le \alpha \int_0^1 x^{u+\Re(s)-1}\, dx + \beta \int_1^\infty x^{v+\Re(s)-1}\, dx, \]

where $\alpha, \beta$ are some constants. The first integral exists for $u + \Re(s) > 0$ and the second for $v + \Re(s) < 0$. Thus $f^*(s)$ exists in the strip $-u < \Re(s) < -v$.

Remark. From the above lemma we see that the domain of existence of a Mellin transform is a complex strip, and the largest one is called the fundamental strip. We introduce the notation $\langle \alpha, \beta \rangle$ for the open strip of complex numbers $s$ such that $\alpha < \Re(s) < \beta$.

Table 1.1 presents some common Mellin transforms with their corresponding fundamental strips. These formulas are simple and easy to check.
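As a numerical illustration, several entries of Table 1.1 can be checked by quadrature. The sketch below assumes the substitution $x = e^u$ and a plain trapezoidal rule on a truncated $u$-interval; the function name `mellin` and its parameters are hypothetical choices, not part of the thesis.

```python
import math


def mellin(f, s: float, lo: float = -60.0, hi: float = 60.0, steps: int = 24000) -> float:
    """Approximate int_0^inf f(x) x^(s-1) dx for real s inside the fundamental strip.

    With x = e^u the integral becomes int f(e^u) e^(s u) du over the real line,
    which is truncated to [lo, hi] and evaluated by the trapezoidal rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(math.exp(u)) * math.exp(s * u)
    return total * h


if __name__ == "__main__":
    print(mellin(lambda x: math.exp(-x), 1.5), math.gamma(1.5))
    print(mellin(lambda x: math.exp(-x * x), 3.0), 0.5 * math.gamma(1.5))
    print(mellin(lambda x: 1.0 / (1.0 + x), 0.5), math.pi / math.sin(0.5 * math.pi))
    print(mellin(lambda x: math.log1p(x), -0.5), math.pi / (-0.5 * math.sin(-0.5 * math.pi)))
```

Each line prints the numerical value next to the closed form from the table; the chosen values of $s$ lie inside the respective fundamental strips.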

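Moreover, some basic transformation rules are given in Table 1.2. These rules are also easy to verify.

Table 1.2: Functional properties of Mellin transform.

\[
\begin{array}{lllll}
 & f(x) & f^*(s) & \langle \alpha, \beta \rangle & \\
\hline
F_1 & x^{\nu} f(x) & f^*(s + \nu) & \langle \alpha - \nu, \beta - \nu \rangle & \text{Shift} \\
F_2 & f(x^{\rho}) & \frac{1}{\rho} f^*\bigl(\frac{s}{\rho}\bigr) & \langle \rho\alpha, \rho\beta \rangle,\ \rho > 0 & \text{Multiple} \\
 & f(1/x) & f^*(-s) & \langle -\beta, -\alpha \rangle & \\
F_3 & f(\mu x) & \mu^{-s} f^*(s) & \langle \alpha, \beta \rangle,\ \mu > 0 & \\
 & \sum_k \lambda_k f(\mu_k x) & \bigl( \sum_k \lambda_k \mu_k^{-s} \bigr) \cdot f^*(s) & & \text{By linearity} \\
F_4 & f(x) \log x & \frac{d}{ds} f^*(s) & \langle \alpha, \beta \rangle & \text{Differential} \\
F_5 & \Theta f(x) & -s f^*(s) & \langle \alpha', \beta' \rangle & \Theta = x \frac{d}{dx} \\
 & \frac{d}{dx} f(x) & -(s-1) f^*(s-1) & \langle \alpha' + 1, \beta' + 1 \rangle & \\
 & \int_0^x f(t)\, dt & -\frac{1}{s} f^*(s+1) & &
\end{array}
\]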

Inversion. The Mellin transform is closely related to the Fourier transform (as well as the Laplace transform): setting $x = e^{-y}$ and $s = \sigma + it$, we obtain

\[ f^*(s) = \int_0^\infty f(x) x^{s-1}\, dx = \int_{-\infty}^{\infty} f(e^{-y}) e^{-\sigma y} e^{-ity}\, dy. \]

Thus the Mellin transform turns into a Fourier transform, and the inversion theorem for the Mellin transform follows from that for the Fourier transform.

Theorem 1. Let $f(x)$ be continuous on $(0, \infty)$ and assume that its Mellin transform has fundamental strip $\langle a, b \rangle$. Then

\[ f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} f^*(s) x^{-s}\, ds, \tag{1.4} \]

where $a < c < b$.

Asymptotic properties. The usefulness of the Mellin transform comes from its asymptotic properties, as we will see below. In particular we have two important results, namely, the direct and the converse mapping theorem.

Before we can give these results, we introduce the notion of a singular expansion: for a meromorphic function $\phi(s)$ with set of poles $\Omega$, the singular expansion is

\[ \phi(s) \asymp \sum_{k \in \Omega} \Delta_k(s), \]

where $\Delta_k(s)$ is the Laurent expansion of $\phi$ around $s = k$ up to at most the $O(1)$ term. For example, since

\[ \frac{1}{s(s-1)} = -\frac{1}{s} - 1 + O(s) \quad (s \to 0), \qquad \frac{1}{s(s-1)} = \frac{1}{s-1} - 1 + O(s-1) \quad (s \to 1), \]

we write

\[ \frac{1}{s(s-1)} \asymp \Bigl[ -\frac{1}{s} - 1 \Bigr]_{s=0} + \Bigl[ \frac{1}{s-1} - 1 \Bigr]_{s=1} \]

for the singular expansion of $1/(s(s-1))$.

The prototype of the direct mapping is the function $e^{-x}$: we know that its Taylor expansion at 0 is

\[ e^{-x} = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} x^k, \]

and its Mellin transform is

\[ \int_0^\infty e^{-x} x^{s-1}\, dx = \Gamma(s) = \frac{\Gamma(s + k + 1)}{s(s+1)(s+2)\cdots(s+k)}. \]

That means $\Gamma(s)$ has poles at the points $s = -k$ with non-negative integer $k$, and hence we have the singular expansion

\[ \Gamma(s) \asymp \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \frac{1}{s+k} \qquad (s \in \mathbb{C}). \]

We observe that the Taylor expansion is mapped to the singular expansion by the rule

\[ x^k \mapsto \frac{1}{s+k}. \]

In fact, this is a general phenomenon.

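Theorem 2. Let $f(x)$ be continuous with its Mellin transform $f^*(s)$ having nonempty fundamental strip $\langle \alpha, \beta \rangle$.

(i) [Asymptotics for $x \to 0$] Assume that $f(x)$ has the following asymptotic expansion as $x \to 0$:

\[ f(x) = \sum_{\xi, k} c_{\xi,k}\, x^{\xi} (\log x)^k + O(x^{\gamma}), \tag{1.5} \]

where $-\gamma < -\xi \le \alpha$ and $k$ is non-negative. Then $f^*(s)$ is continuable to the strip $\langle -\gamma, \beta \rangle$ and

\[ f^*(s) \asymp \sum_{\xi, k} c_{\xi,k} \frac{(-1)^k k!}{(s + \xi)^{k+1}} \qquad (s \in \langle -\gamma, \beta \rangle). \tag{1.6} \]

(ii) [Asymptotics for $x \to \infty$] Assume that $f(x)$ has an asymptotic expansion of the form (1.5) as $x \to \infty$, where now $\beta \le -\xi < -\gamma$. Then $f^*(s)$ is continuable to the strip $\langle \alpha, -\gamma \rangle$ and

\[ f^*(s) \asymp -\sum_{\xi, k} c_{\xi,k} \frac{(-1)^k k!}{(s + \xi)^{k+1}} \qquad (s \in \langle \alpha, -\gamma \rangle). \tag{1.7} \]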

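Proof. Since $\mathcal{M}[f(1/x); s] = \mathcal{M}[f(x); -s]$, the case $x \to \infty$ reduces to the case $x \to 0$, so we only need to prove the latter. By assumption, the function

\[ g(x) = f(x) - \sum_{\xi, k} c_{\xi,k}\, x^{\xi} (\log x)^k \]

is $O(x^{\gamma})$ as $x \to 0$. In the fundamental strip we also have

\[ f^*(s) = \int_0^1 g(x) x^{s-1}\, dx + \int_0^1 \sum_{\xi, k} c_{\xi,k}\, x^{s+\xi-1} (\log x)^k\, dx + \int_1^\infty f(x) x^{s-1}\, dx. \]

The first integral is analytic in $\langle -\gamma, \infty \rangle$ and the third one in $\langle -\infty, \beta \rangle$. Thus the sum of those two is analytic in the strip $\langle -\gamma, \beta \rangle$. After integrating, the second integral becomes

\[ \sum_{\xi, k} c_{\xi,k} \frac{(-1)^k k!}{(s + \xi)^{k+1}}. \]

Hence, $f^*(s)$ is continuable to $\langle -\gamma, \beta \rangle$ and has the singular expansion of the form (1.6).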

Remark. From the proof of Theorem 2, we can see that there is a principle: let $g(x)$ be a truncated asymptotic expansion of a given function $f(x)$ at either 0 or $\infty$. Then the Mellin transform of $f(x) - g(x)$ is given by the same expression as that of $f(x)$; only the fundamental strip gets shifted. For example, $\mathcal{M}[e^{-x} - 1; s] = \Gamma(s)$ with the fundamental strip $\langle -1, 0 \rangle$, and $\mathcal{M}[e^{-x} - 1 + x; s] = \Gamma(s)$ with the fundamental strip $\langle -2, -1 \rangle$.

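Example 3. The function $f(x) = (1 + x)^{-1}$ has fundamental strip $\langle 0, 1 \rangle$ and its Mellin transform is

\[ f^*(s) = \int_0^\infty (1 + x)^{-1} x^{s-1}\, dx = \Gamma(1 - s)\Gamma(s) = \frac{\pi}{\sin \pi s}. \]

Then the two expansions

\[ \frac{1}{1+x} = \sum_{n=0}^{\infty} (-1)^n x^n \quad (x \to 0), \qquad \frac{1}{1+x} = \sum_{n=1}^{\infty} (-1)^{n-1} x^{-n} \quad (x \to +\infty), \]

translate by (1.6) and (1.7) into

\[ f^*(s) \asymp \sum_{n=0}^{\infty} \frac{(-1)^n}{s + n} \quad (s \in \langle -\infty, 1 \rangle), \qquad f^*(s) \asymp -\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{s - n} \quad (s \in \langle 0, \infty \rangle). \]

This is consistent with the known form

\[ f^*(s) = \frac{\pi}{\sin \pi s} \asymp \sum_{n \in \mathbb{Z}} \frac{(-1)^n}{s + n} \qquad (s \in \mathbb{C}). \tag{1.8} \]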
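The expansion (1.8) is in fact the classical partial fraction expansion of $\pi / \sin \pi s$ when the terms $n$ and $-n$ are summed together. The sketch below checks this numerically for real non-integer $s$; the symmetric truncation and the function name are assumptions of the illustration.

```python
import math


def partial_fraction_sum(s: float, terms: int = 2000) -> float:
    """Symmetric partial sum of sum_{n in Z} (-1)^n / (s + n) for non-integer s."""
    total = 1.0 / s  # the n = 0 term
    for n in range(1, terms + 1):
        # Pairing n with -n gives an absolutely convergent series.
        total += (-1) ** n * (1.0 / (s + n) + 1.0 / (s - n))
    return total


if __name__ == "__main__":
    for s in (0.3, 0.75, 2.5):
        print(s, partial_fraction_sum(s), math.pi / math.sin(math.pi * s))
```

Each term carries the pole of $\pi/\sin \pi s$ at $s = -n$ with residue $(-1)^n$, which is exactly the information recorded by the singular expansion.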

The next question that arises is whether or not a converse of the direct mapping theorem holds. Under some conditions the answer is yes, as the following theorem demonstrates:

Theorem 3. Let $f(x)$ be continuous with its Mellin transform $f^*(s)$ having nonempty fundamental strip $\langle \alpha, \beta \rangle$.

(i) [Asymptotics for $x \to 0$] Assume that $f^*(s)$ admits a meromorphic continuation to the strip $\langle \gamma, \beta \rangle$ for some $\gamma < \alpha$ with a finite number of poles there, and is analytic on $\Re(s) = \gamma$. Assume also that there exists a real number $\eta \in (\alpha, \beta)$ such that with $r > 1$,

\[ f^*(s) = O(|s|^{-r}), \tag{1.9} \]

when $|s| \to \infty$ in $\gamma \le \Re(s) \le \eta$. If $f^*(s)$ admits the singular expansion for $s \in \langle \gamma, \alpha \rangle$,

\[ f^*(s) \asymp \sum_{\xi, k} d_{\xi,k} \frac{1}{(s - \xi)^{k+1}}, \tag{1.10} \]

then an asymptotic expansion of $f(x)$ at 0 is

\[ f(x) = \sum_{\xi, k} d_{\xi,k} \frac{(-1)^k}{k!} x^{-\xi} (\log x)^k + O(x^{-\gamma}). \tag{1.11} \]

(ii) [Asymptotics for $x \to \infty$] Similarly, assume that $f^*(s)$ admits a meromorphic continuation to the strip $\langle \alpha, \gamma \rangle$ for some $\gamma > \beta$ and is analytic on $\Re(s) = \gamma$. Assume also that the growth condition (1.9) holds in $\langle \eta, \gamma \rangle$ for some $\eta \in (\alpha, \beta)$. If $f^*(s)$ admits the singular expansion (1.10) for $s \in \langle \beta, \gamma \rangle$, then an asymptotic expansion of $f(x)$ at $\infty$ is

\[ f(x) = -\sum_{\xi, k} d_{\xi,k} \frac{(-1)^k}{k!} x^{-\xi} (\log x)^k + O(x^{-\gamma}). \tag{1.12} \]

Proof. As above, it suffices to prove the case $x \to 0$. Let $\Omega$ be the set of poles in $\langle \gamma, \beta \rangle$, and let $R(T)$ be the rectangle with corners at the four points $\eta \pm iT$, $\gamma \pm iT$, oriented counter-clockwise. Assume that $T$ is large enough such that $R(T)$ contains all poles in $\Omega$. Consider the integral

\[ J(T) = \frac{1}{2\pi i} \int_{R(T)} f^*(s) x^{-s}\, ds. \]

By the residue theorem, $J(T)$ is equal to the sum of the residues, which is

\[ J(T) = \sum_{\xi, k} d_{\xi,k} \operatorname{Res}_{s=\xi} \frac{x^{-s}}{(s - \xi)^{k+1}} = \sum_{\xi, k} d_{\xi,k} \frac{(-1)^k}{k!} x^{-\xi} (\log x)^k. \]

Now let $T$ tend to $+\infty$. By assumption, the contribution to $J(T)$ of the top and bottom sides of $R(T)$ is bounded by $O(T^{-r})$, which vanishes as $T \to \infty$. On the left side we have a bound of the form

\[ \Bigl| \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} f^*(s) x^{-s}\, ds \Bigr| \le O(1) \int_0^\infty \frac{x^{-\gamma}}{(1 + t)^r}\, dt = O(x^{-\gamma}). \]

On the right side the integral converges to $f(x)$ by the inversion theorem (1.4), since $f(x)$ is continuous. This proves the claim.

From Theorem 2 and Theorem 3 we know that the poles of $f^*(s)$ are in a one-to-one correspondence with the terms of the asymptotic expansion of $f(x)$.

Example 4. The function

\[ f^*(s) = \Gamma(1 - s) \frac{\pi}{\sin \pi s} \]

is analytic in the strip $\langle 0, 1 \rangle$. Note that $\pi / \sin \pi s = O(e^{-\pi |\Im(s)|})$ as $|s| \to \infty$, and a similar exponential decay holds for $\Gamma(1 - s)$ by the complex version of Stirling's formula:

\[ |\Gamma(\sigma + it)| \sim \sqrt{2\pi}\, |t|^{\sigma - 1/2} e^{-\pi |t|/2} \qquad (|t| \to \infty). \]

The singular expansion of $\pi / \sin \pi s$ was already considered in (1.8). Thus for $\Re(s) < 1$, we have the singular expansion

\[ f^*(s) \asymp \sum_{n=0}^{\infty} \frac{(-1)^n n!}{s + n}. \]

Then the asymptotic expansion of the original function is

\[ f(x) \sim \sum_{n=0}^{\infty} (-1)^n n!\, x^n \qquad (x \to 0). \]

Sometimes $f^*(s)$ has a vertical line of regularly spaced poles. In this case, we need the following weaker form of the growth condition (1.9).

Corollary 1. The conclusions of Theorem 3 remain valid assuming only a weaker form of the growth condition (1.9) along a countable set of horizontal segments $|\Im(s)| = T_j$, where $T_j \to +\infty$.

Proof. In the proof of Theorem 3, restrict $T$ to the discrete set $\{T_j\}$, which must avoid the poles of $f^*(s)$.

Applications. The Mellin transform is effective in the asymptotic analysis of harmonic sums.

Definition 3. 1. A harmonic sum $F(x)$ is a sum of the form

\[ F(x) = \sum_k \lambda_k\, g(\mu_k x), \tag{1.13} \]

where the $\lambda_k$ are called "amplitudes", the $\mu_k$ are called "frequencies", and $g(x)$ is called the "base function".

2. The Dirichlet series of the harmonic sum is the sum

\[ \Lambda(s) = \sum_k \lambda_k\, \mu_k^{-s}. \tag{1.14} \]

Remark. A Dirichlet series (1.14) has a half-plane of absolute convergence $\langle \sigma_a, \infty \rangle$ and a half-plane of simple convergence $\langle \sigma_c, \infty \rangle$, where $\sigma_a - \sigma_c \ge 0$.

Remark. The property of polynomial growth (1.2) in a closed strip holds for many Dirichlet series.

From $F_3$ in Table 1.2, we have

\[ \mathcal{M}\Bigl[ \sum_{k \in K} \lambda_k\, g(\mu_k x); s \Bigr] = \Bigl( \sum_{k \in K} \lambda_k\, \mu_k^{-s} \Bigr) \cdot g^*(s), \]

where $K$ is a finite set. This formula can be extended to the harmonic sums (infinite sums) as defined above:

Lemma 3. The Mellin transform of the harmonic sum (1.13) is defined in the intersection of the fundamental strip of the transform of the base function and the domain of absolute convergence of the Dirichlet series, and it is given by

\[ F^*(s) = \Lambda(s) \cdot g^*(s). \tag{1.15} \]

Proof. Since both $g^*(s)$ and the Dirichlet series are analytic in the corresponding convergence regions, the interchange of summation and integration is valid by Fubini's theorem.

To apply the converse mapping theorem to harmonic sums (1.13), we need another notion of controlled growth (polynomial growth was already introduced in Definition 1).

Definition 4. A function $\phi(s)$ is said to be of exponential decrease in a closed strip if for any $r > 0$,

\[ \phi(s) = O(|s|^{-r}), \tag{1.16} \]

as $|s| \to \infty$ in the strip.

Now we suppose that the Mellin transform of the base function is of exponential decrease and the Dirichlet series of the harmonic sum is of polynomial growth in an extended region of the complex plane.

Theorem 4. Consider the harmonic sum $F(x)$. Let the transform of the base function have the fundamental strip $\langle \alpha, \beta \rangle$, and let the domain of simple convergence of the Dirichlet series be $\langle \sigma_c, \infty \rangle$. Assume that

(ii) $g^*(s)$ and $\Lambda(s)$ admit a meromorphic continuation in $\langle \gamma, \beta \rangle$ and are analytic on $\Re(s) = \gamma$, for some $\gamma < \alpha$;

(iii) on the closed strip $\langle \gamma, (\alpha' + \beta)/2 \rangle$, $g^*(s)$ is of exponential decrease and $\Lambda(s)$ is of polynomial growth.

Then $F(x)$ converges for all $x$ in $(0, \infty)$. An asymptotic expansion of $F(x)$ as $x \to 0$ up to an error term $O(x^{-\gamma})$ is obtained by termwise translation of the singular expansion of $F^*(s) = \Lambda(s) g^*(s)$ according to the rule

\[ \frac{C}{(s - \xi)^{k+1}} \mapsto C\, \frac{(-1)^k}{k!}\, x^{-\xi} (\log x)^k. \]

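Proof. By Theorem 3 it suffices to establish the fundamental relation $F^*(s) = \Lambda(s) g^*(s)$. First we select an arbitrary $\sigma$ in $(\alpha', \beta)$ and take $\sigma'$ such that $\alpha' < \sigma' < \sigma$. Then the inversion theorem provides

\[ \sum_{n=1}^{N} \lambda_n g(\mu_n x) = \frac{1}{2\pi i} \int_{\sigma' - i\infty}^{\sigma' + i\infty} \Bigl( \sum_{n=1}^{N} \frac{\lambda_n}{\mu_n^s} \Bigr) g^*(s) x^{-s}\, ds. \]

Since $|\Lambda(s)| \le C(|s| + 1)$ for some constant $C$ (see [2]), we have

\[ \Bigl| \Bigl( \sum_{n=1}^{N} \frac{\lambda_n}{\mu_n^s} \Bigr) g^*(s) x^{-s} \Bigr| \le C(|s| + 1) \cdot |g^*(s)| \cdot x^{-\Re(s)} = O(x^{-\Re(s)}), \]

which permits us to apply the dominated convergence theorem, and we obtain

\[ G(x) = \frac{1}{2\pi i} \int_{\sigma' - i\infty}^{\sigma' + i\infty} \Lambda(s) g^*(s) x^{-s}\, ds. \]

Thus, the strip $\langle \alpha', \beta \rangle$ is included in the fundamental strip of $G(x)$. On the other hand, since

\[ \Bigl| \sum_{n=1}^{N} \lambda_n g(\mu_n x) \Bigr| \le \frac{1}{2\pi} \int_{\sigma' - i\infty}^{\sigma' + i\infty} \Bigl| \Bigl( \sum_{n=1}^{N} \frac{\lambda_n}{\mu_n^s} \Bigr) g^*(s) x^{-s} \Bigr|\, |ds| = O(x^{-\Re(s)}), \]

the dominated convergence theorem applies once more and gives

\[ F^*(s) = \lim_{N \to \infty} \int_0^\infty \sum_{n=1}^{N} \lambda_n g(\mu_n x)\, x^{s-1}\, dx = \Lambda(s) \cdot g^*(s). \]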

Remark. Similarly, a symmetric result holds for $x \to \infty$. Thus under the conditions of Theorem 4,

\[ \sum_k \lambda_k\, g(\mu_k x) \sim \pm \sum_p \operatorname{Res}_{s=p}\, g^*(s) \Lambda(s) x^{-s}. \]

As $x \to 0$ the sum is over the poles to the left of the fundamental strip and the sign is $+$; as $x \to \infty$ the sum is over the poles to the right of the fundamental strip and the sign is $-$.

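Example 5. The harmonic number $H_n$ is

\[ H_n = \sum_{k=1}^{n} \frac{1}{k} = \sum_{k=1}^{\infty} \Bigl[ \frac{1}{k} - \frac{1}{k + n} \Bigr]. \]

Thus the function

\[ h(x) = \sum_{k=1}^{\infty} \Bigl[ \frac{1}{k} - \frac{1}{k + x} \Bigr] = \sum_{k=1}^{\infty} \frac{1}{k} \cdot \frac{x/k}{1 + x/k} \]

satisfies $h(n) = H_n$ and is a harmonic sum with $\lambda_k = \mu_k = 1/k$ and $g(x) = x/(1 + x)$. Its Mellin transform is

\[ h^*(s) = \mathcal{M}\Bigl[ x \frac{d}{dx} \log(1 + x); s \Bigr] \cdot \sum_{k=1}^{\infty} k^{s-1} = -\frac{\pi}{\sin \pi s}\, \zeta(1 - s), \]

with fundamental strip $\langle -1, 0 \rangle$. Note that for fixed $\sigma < 0$, one has $\zeta(\sigma + it) = O(|t|^{1/2 - \sigma})$, see [24, p. 95], and the exponential decay holds for $\pi / \sin \pi s$ (see Example 4). The singular expansion to the right of this fundamental strip is

\[ h^*(s) \asymp \Bigl[ \frac{1}{s^2} - \frac{\gamma}{s} \Bigr] - \sum_{k=1}^{\infty} \frac{(-1)^k \zeta(1 - k)}{s - k}. \]

Thus we have the expansion at $\infty$:

\[ H_n = \log n + \gamma + O(n^{-1}). \]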
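The translation rule of Theorem 3 (ii) turns each pole of $h^*(s)$ at $s = k \ge 1$ into the term $(-1)^k \zeta(1-k)\, n^{-k}$. The sketch below compares the resulting expansion with $H_n$ numerically; the dictionary of zeta values holds standard values entered by hand, and the names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

# Standard values zeta(0), zeta(-1), zeta(-2), zeta(-3), indexed by k with zeta(1 - k).
ZETA_AT_ONE_MINUS_K = {1: -1.0 / 2.0, 2: -1.0 / 12.0, 3: 0.0, 4: 1.0 / 120.0}


def harmonic(n: int) -> float:
    """H_n by direct summation."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def harmonic_from_poles(n: int, max_pole: int = 4) -> float:
    """log n + gamma plus the terms coming from the poles s = 1, ..., max_pole."""
    value = math.log(n) + EULER_GAMMA  # contribution of the double pole at s = 0
    for k in range(1, max_pole + 1):
        value += (-1) ** k * ZETA_AT_ONE_MINUS_K[k] / n ** k
    return value


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, harmonic(n), harmonic_from_poles(n, 0), harmonic_from_poles(n, 4))
```

With `max_pole=0` only the terms $\log n + \gamma$ are used, so the error is of order $1/n$; including the first four poles reproduces the refined expansion listed for $(1-z)^{-1}\log\frac{1}{1-z}$ in Table 1.4.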

1.3 Poissonization and De-poissonization

The Poisson transform was introduced by Kac (1949). Sometimes a Poisson version of a problem (called the Poisson model) is easier to solve than the original one (called the Bernoulli model). The purpose of this section is to introduce the basics of this important method.

Table 1.3: Some Poisson transforms and their properties.

\[
\begin{array}{ll}
g_n & \tilde{G}(z) \\
\hline
\text{Constant} & \text{Constant} \\
(-1)^n & e^{-2z} \\
\alpha^n & e^{(\alpha - 1)z} \\
\dfrac{n!}{(n-k)!},\ n \ge k & z^k \\
n! & \dfrac{1}{1 - z}\, e^{-z} \\
g_n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} (f_k + h_{n-k}),\ p + q = 1 & \tilde{F}(pz) + \tilde{H}(qz) \\
g_n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} f_k h_{n-k},\ p + q = 1 & \tilde{F}(pz)\, \tilde{H}(qz)
\end{array}
\]

Poisson transform. We define the Poisson transform (or Poissonization) of a sequence as follows:

Definition 5. Let $(g_n)$ be a sequence. Then the Poisson transform $\tilde{G}(z)$ of $(g_n)$ is defined as

\[ \tilde{G}(z) = \sum_{n \ge 0} g_n \frac{z^n}{n!} e^{-z} \tag{1.17} \]

for arbitrary complex $z$.
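A few entries of Table 1.3 can be confirmed by summing the defining series (1.17) numerically. This is a minimal sketch for real $z$, assuming truncation of the series after a fixed number of terms; the function name and the default of 150 terms are illustrative choices.

```python
import math


def poisson_transform(g, z: float, terms: int = 150) -> float:
    """Truncated Poisson transform e^{-z} * sum_{n < terms} g(n) z^n / n!."""
    total = 0.0
    weight = 1.0  # holds z^n / n!, starting with n = 0
    for n in range(terms):
        total += g(n) * weight
        weight *= z / (n + 1)
    return math.exp(-z) * total


if __name__ == "__main__":
    z = 2.0
    print(poisson_transform(lambda n: (-1) ** n, z), math.exp(-2 * z))
    print(poisson_transform(lambda n: 0.5 ** n, z), math.exp((0.5 - 1) * z))
    # math.perm(n, 3) equals n!/(n-3)! for n >= 3 and 0 otherwise.
    print(poisson_transform(lambda n: math.perm(n, 3), z), z ** 3)
```

Each line prints the truncated series next to the closed form from the table.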

Some Poisson transforms and their properties are presented in Table 1.3. Next, we give an example that is important in applications.

Example 6. Consider the recurrence

\[ g_n = a_n + \beta \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} (g_k + g_{n-k}), \qquad n > 1, \]

with initial value $g_0$. Then, we find

\[ \tilde{G}(z) = \tilde{A}(z) + \beta \bigl( \tilde{G}(pz) + \tilde{G}(qz) \bigr) - g_0 e^{-z}, \]

where $\tilde{G}$ and $\tilde{A}$ are the Poisson transforms of $g_n$ and $a_n$, respectively.

General de-poissonization theorems. Now we consider a sequence $(g_n)$ and its Poisson transform $\tilde{G}(z)$. In principle, we can extract the coefficient $g_n = n! [z^n] (\tilde{G}(z) e^z)$ directly. Our aim is to extract $g_n$ asymptotically from $\tilde{G}(z)$. Our starting point for this will be Cauchy's formula:

\[ g_n = \frac{n!}{2\pi i} \oint \frac{\tilde{G}(z) e^z}{z^{n+1}}\, dz = \frac{n!}{n^n\, 2\pi} \int_{-\pi}^{\pi} \tilde{G}(n e^{it}) \exp(n e^{it})\, e^{-nit}\, dt. \tag{1.18} \]

Next, we give the definition of a linear cone:

Definition 6. The region in the complex plane

\[ L_\theta = \{ z : |\arg z| \le \theta \}, \]

where $|\theta| < \pi/2$, is called a linear cone.

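Moreover, we need the following two lemmas. The first one is well known, and the second one is a simple extension of the Cauchy estimate.

Lemma 4. The following identities are true:

\[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^k e^{-\alpha x^2}\, dx = \begin{cases} 0, & k = 1, 3, 5, \ldots \\ \alpha^{-1/2 - k/2}\, \dfrac{k!}{(k/2)!\, 2^{k + 1/2}}, & k = 0, 2, 4, \ldots \end{cases} \]

and

\[ \int_{\theta}^{\infty} x^k e^{-\alpha x^2}\, dx = O\bigl( e^{-(1/2)\alpha \theta^2} \bigr), \]

where $\theta$ is a positive number.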
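The closed form in the first identity of Lemma 4 can be compared with a direct numerical integration. The sketch below assumes $\alpha > 0$ and uses a trapezoidal rule on a truncated interval; the function names and the truncation width are illustrative.

```python
import math


def gaussian_moment_numeric(k: int, alpha: float, steps: int = 40000) -> float:
    """Trapezoidal value of (1/sqrt(2 pi)) * int x^k exp(-alpha x^2) dx over the real line."""
    half_width = 12.0 / math.sqrt(alpha)  # exp(-alpha * half_width^2) = exp(-144)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** k * math.exp(-alpha * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


def gaussian_moment_closed_form(k: int, alpha: float) -> float:
    """Right-hand side of the first identity in Lemma 4."""
    if k % 2 == 1:
        return 0.0
    return alpha ** (-0.5 - k / 2) * math.factorial(k) / (math.factorial(k // 2) * 2 ** (k + 0.5))


if __name__ == "__main__":
    for k in range(0, 7):
        print(k, gaussian_moment_numeric(k, 0.7), gaussian_moment_closed_form(k, 0.7))
```

For odd $k$ the numerical value is zero up to rounding, by symmetry of the integrand.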

Lemma 5. Let $\theta_0 < \pi/2$ and $\xi > 0$. Moreover, let $\Psi(z)$ be a slowly varying function (that is, for fixed $t$, $\lim_{x \to \infty} \Psi(tx)/\Psi(x) = 1$) and assume that

\[ |z| > \xi \ \Rightarrow\ |G(z)| \le B |z|^{\beta} \Psi(|z|) \tag{1.19} \]

for all $z \in L_{\theta_0}$, where $\beta$ is a real constant. Then, for all $\theta < \theta_0$ there exist $B'$ and $\xi' > \xi$ such that for all positive integers $k$ the following holds in $L_\theta$:

\[ |z| > \xi' \ \Rightarrow\ |G^{\langle k \rangle}(z)| \le k!\, (B')^k\, |z|^{\beta - k}\, \Psi(|z|). \tag{1.20} \]

Proof. See [11].

Now, we first give a basic de-poissonization result that holds for $\tilde{G}(z)$ with a polynomial bound in a linear cone:

Theorem 5. Let $\tilde{G}(z)$ be the Poisson transform of a sequence $(g_n)$ that is assumed to be entire. Suppose that in a linear cone $L_\theta$ ($\theta < \pi/2$) both of the following two conditions hold for some real numbers $A, B, R > 0$, $\beta$ and $\alpha < 1$:

(I) For $z \in L_\theta$,

\[ |z| > R \ \Rightarrow\ |\tilde{G}(z)| \le B |z|^{\beta}; \]

(O) For $z \notin L_\theta$,

\[ |z| > R \ \Rightarrow\ |\tilde{G}(z) e^z| \le A e^{\alpha |z|}. \]

Then

\[ g_n = \tilde{G}(n) + O(n^{\beta - 1}) \]

for large $n$.

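Proof. The proof relies on equation (1.18). By Stirling's approximation $n! = n^n e^{-n} \sqrt{2\pi n}\, \bigl(1 + O(n^{-1})\bigr)$, we have

\[ g_n = \bigl(1 + O(n^{-1})\bigr) \sqrt{\frac{n}{2\pi}} \int_{-\pi}^{\pi} \tilde{G}(n e^{it}) \exp\bigl( n(e^{it} - 1 - it) \bigr)\, dt = \bigl(1 + O(n^{-1})\bigr)(I_n + E_n), \]

where, with $z = n e^{it}$,

\[ E_n = \sqrt{\frac{n}{2\pi}} \int_{|t| \in [\theta, \pi]} \tilde{G}(n e^{it}) \exp\bigl( n(e^{it} - 1 - it) \bigr)\, dt = \frac{n^n e^{-n} \sqrt{2\pi n}}{2\pi i} \int_{|t| \in [\theta, \pi]} \frac{\tilde{G}(z) e^z}{z^{n+1}}\, dz, \]

\[ I_n = \sqrt{\frac{n}{2\pi}} \int_{-\theta}^{\theta} \tilde{G}(n e^{it}) \exp\bigl( n(e^{it} - 1 - it) \bigr)\, dt. \]

By condition (O) we obtain that $E_n$ decays exponentially to zero since $\alpha < 1$. Now, we turn to $I_n$. First we replace $t$ by $t/\sqrt{n}$ and let $h_n(t) = \exp\bigl( n(e^{it/\sqrt{n}} - 1 - it/\sqrt{n}) \bigr)$. Next, we split $I_n$ into two parts, $I_n'$ and $I_n''$ (in order to use the Taylor expansion of $h_n(t)$), such that

\[ I_n' = \frac{1}{\sqrt{2\pi}} \int_{-\log n}^{\log n} \tilde{G}\bigl( n e^{it/\sqrt{n}} \bigr) h_n(t)\, dt, \]

\[ I_n'' = \frac{1}{\sqrt{2\pi}} \int_{t \in [-\theta\sqrt{n}, -\log n]} \tilde{G}\bigl( n e^{it/\sqrt{n}} \bigr) h_n(t)\, dt + \frac{1}{\sqrt{2\pi}} \int_{t \in [\log n, \theta\sqrt{n}]} \tilde{G}\bigl( n e^{it/\sqrt{n}} \bigr) h_n(t)\, dt. \]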

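Observe that $|h_n(t)| \le e^{-\mu t^2}$ for $t \in [-\theta\sqrt{n}, \theta\sqrt{n}]$, where $\mu$ is a constant. Then by condition (I) and Lemma 4 we obtain

\[ I_n'' = O\bigl( n^{\beta} e^{-\mu \log^2 n} \bigr). \]

Now, we estimate $I_n'$. For $t \in [-\log n, \log n]$ we have the Taylor expansion

\[ h_n(t) = e^{-t^2/2} \Bigl( 1 - \frac{i t^3}{6\sqrt{n}} + \frac{t^4}{24 n} - \frac{t^6}{72 n} + O\Bigl( \frac{\log^9 n}{n\sqrt{n}} \Bigr) \Bigr). \]

Using condition (I) and Lemma 5 for $|z| > C\xi$ with a constant $C$ and $z \in L_{\theta'}$ for $\theta' < \theta$, we have $|\tilde{G}'(z)| \le C_1 |z|^{\beta - 1}$ and $|\tilde{G}''(z)| \le C_2 |z|^{\beta - 2}$ for some constants $C_1$ and $C_2$. Thus we can expand $\tilde{G}\bigl( n e^{it/\sqrt{n}} \bigr)$ around $t = 0$ as

\[ \tilde{G}\bigl( n e^{it/\sqrt{n}} \bigr) = \tilde{G}(n) + i t \sqrt{n}\, \tilde{G}'(n) + \Delta_n(t)\, t^2, \]

where $|\Delta_n(t)| \le (C_1 + C_2) n^{\beta - 1}$. Finally, the integral $I_n'$ becomes

\[ I_n' = \frac{1}{\sqrt{2\pi}} \int_{-\log n}^{\log n} e^{-t^2/2} \bigl( \tilde{G}(n) + i t \sqrt{n}\, \tilde{G}'(n) \bigr) \Bigl( 1 - \frac{i t^3}{6\sqrt{n}} + \frac{t^4}{24 n} - \frac{t^6}{72 n} \Bigr)\, dt \]
\[ \qquad + \frac{1}{\sqrt{2\pi}} \int_{-\log n}^{\log n} \Delta_n(t)\, t^2\, h_n(t)\, dt + \frac{1}{\sqrt{2\pi}} \int_{-\log n}^{\log n} e^{-t^2/2} \bigl( \tilde{G}(n) + i t \sqrt{n}\, \tilde{G}'(n) \bigr)\, O\Bigl( \frac{\log^9 n}{n\sqrt{n}} \Bigr)\, dt. \]

By Lemma 4 and Lemma 5, the first integral is equal to $\tilde{G}(n) + O(n^{\beta - 1})$: the terms that are odd in $t$ vanish by symmetry, and the remaining correction terms are $O(n^{\beta - 1})$. By the above estimate on $\Delta_n(t)$ and the bound $|h_n(t)| \le e^{-\mu t^2}$, the second integral is $O(n^{\beta - 1})$. Finally, the third integral is $O(n^{\beta - 3/2} \log^9 n)$. Thus we have $I_n' = \tilde{G}(n) + O(n^{\beta - 1})$ as desired.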

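The next theorem extends the above one to a full asymptotic expansion of $g_n$:

Theorem 6. Consider a linear cone $L_\theta$ ($\theta < \pi/2$). Let the following two conditions hold for some numbers $A, B, R > 0$, and $\alpha > 0$, $\beta$, and $\gamma$:

(I) For $z \in L_\theta$,

\[ |z| > R \ \Rightarrow\ |\tilde{G}(z)| \le B |z|^{\beta} \Psi(|z|), \]

where $\Psi(x)$ is a slowly varying function;

(O) For all $z = \rho e^{i\theta}$ with $\theta \le \pi$ such that $z \notin L_\theta$,

Then, for every non-negative integer $m$,

\[ g_n = \sum_{i=0}^{m} \sum_{j=0}^{i+m} b_{i,j}\, n^i\, \tilde{G}^{\langle j \rangle}(n) + O\bigl( n^{\beta - (m+1)} \Psi(n) \bigr) = \tilde{G}(n) + \sum_{k=1}^{m} \sum_{i=1}^{k} b_{i,k+i}\, n^i\, \tilde{G}^{\langle k+i \rangle}(n) + O\bigl( n^{\beta - (m+1)} \Psi(n) \bigr), \tag{1.21} \]

where $b_{i,j} = [x^i][y^j] \exp\bigl( x \log(1 + y) - xy \bigr)$. Note that $b_{i,j} = 0$ for $j < 2i$.

Proof. The proof can be found in [11].

Remark. We present the expansion (1.21) above for $m = 3$:

\[ g_n = \tilde{G}(n) - \frac{1}{2} n\, \tilde{G}^{\langle 2 \rangle}(n) + \frac{1}{3} n\, \tilde{G}^{\langle 3 \rangle}(n) + \frac{1}{8} n^2\, \tilde{G}^{\langle 4 \rangle}(n) - \frac{1}{4} n\, \tilde{G}^{\langle 4 \rangle}(n) - \frac{1}{6} n^2\, \tilde{G}^{\langle 5 \rangle}(n) - \frac{1}{48} n^3\, \tilde{G}^{\langle 6 \rangle}(n) + O\bigl( n^{\beta - 4} \Psi(n) \bigr). \]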
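The coefficients $b_{i,j}$ can be generated mechanically from their definition, and for a polynomial $\tilde{G}(z)$ the expansion terminates, so it can be checked exactly. The sketch below does both with rational arithmetic; the function names are illustrative, and the polynomial test case uses the entry $n!/(n-k)! \leftrightarrow z^k$ of Table 1.3.

```python
from fractions import Fraction
from math import factorial, perm


def depoissonization_coefficients(max_j: int):
    """Return b with b[i][j] = [x^i][y^j] exp(x * (log(1 + y) - y)) for 0 <= j <= max_j."""
    # u(y) = log(1 + y) - y = sum_{m >= 2} (-1)^(m+1) y^m / m
    u = [Fraction(0)] * (max_j + 1)
    for m in range(2, max_j + 1):
        u[m] = Fraction((-1) ** (m + 1), m)
    b = []
    power = [Fraction(1)] + [Fraction(0)] * max_j  # coefficients of u(y)^0
    i = 0
    while any(power):  # u(y)^i vanishes modulo y^(max_j + 1) once 2 i > max_j
        b.append([c / factorial(i) for c in power])
        product = [Fraction(0)] * (max_j + 1)
        for a, coeff in enumerate(power):
            if coeff:
                for m in range(2, max_j + 1 - a):
                    product[a + m] += coeff * u[m]
        power = product
        i += 1
    return b


def depoissonize_polynomial(coeffs, n: int) -> Fraction:
    """Exact g_n for G~(z) = sum_d coeffs[d] z^d via g_n = sum_{i,j} b_{i,j} n^i G~^{<j>}(n)."""
    degree = len(coeffs) - 1
    b = depoissonization_coefficients(degree)
    total = Fraction(0)
    for j in range(degree + 1):
        # j-th derivative of the polynomial, evaluated at n
        derivative = sum(coeffs[d] * perm(d, j) * n ** (d - j) for d in range(j, degree + 1))
        total += sum(row[j] * n ** i for i, row in enumerate(b)) * derivative
    return total


if __name__ == "__main__":
    b = depoissonization_coefficients(6)
    print(b[1][2], b[1][3], b[2][4], b[1][4], b[2][5], b[3][6])
    # G~(z) = z^4 is the Poisson transform of g_n = n (n-1) (n-2) (n-3).
    print(depoissonize_polynomial([0, 0, 0, 0, 1], 7), 7 * 6 * 5 * 4)
```

The first line printed lists the six coefficients that appear in the remark for $m = 3$; the second compares the de-Poissonized value with the exact falling factorial.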

Mean and variance. Let $(X_n)$ be a sequence of integer random variables, and denote by $F_n(y) = E[y^{X_n}]$ the probability generating function. Let

\[ \tilde{L}(z, y) = \sum_{n=0}^{\infty} F_n(y) \frac{z^n}{n!} e^{-z} \]

be the Poisson transform of the probability generating function. We introduce the Poisson mean $\tilde{X}(z)$ and the Poisson variance $\tilde{V}(z)$ as

\[ \tilde{X}(z) = \tilde{L}_y(z, 1), \]

\[ \tilde{V}(z) = \tilde{L}_{yy}(z, 1) + \tilde{X}(z) - \tilde{X}(z)^2, \]

where $\tilde{L}_y(z, 1)$ and $\tilde{L}_{yy}(z, 1)$ denote respectively the first and the second derivative of $\tilde{L}(z, y)$ with respect to $y$ at $y = 1$.

There is the following relationship between the Poisson mean $\tilde{X}(z)$ and variance $\tilde{V}(z)$ of $X_n$, and the Bernoulli mean $E[X_n]$ and variance $V[X_n]$.

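Theorem 7. Let $\tilde{X}(z)$ and $\tilde{V}(z) + \tilde{X}(z)^2$ satisfy condition (O), and let $\tilde{X}(z)$ and $\tilde{V}(z)$ satisfy condition (I) of Theorem 6 with $\beta \le 1$, i.e., $\tilde{X}(z) = O\bigl( |z|^{\beta} \Psi(|z|) \bigr)$ and $\tilde{V}(z) = O\bigl( |z|^{\beta} \Psi(|z|) \bigr)$ in a linear cone $L_\theta$ and appropriate conditions (O) outside the cone, where $\Psi(z)$ is a slowly varying function. Then, the following holds:

\[ E[X_n] = \tilde{X}(n) - \frac{n}{2} \tilde{X}^{\langle 2 \rangle}(n) + O\bigl( n^{\beta - 2} \Psi(n) \bigr), \tag{1.22} \]

\[ V[X_n] = \tilde{V}(n) - n \bigl[ \tilde{X}'(n) \bigr]^2 + O\bigl( \max\{ n^{\beta - 1} \Psi(n);\ n^{2\beta - 2} \Psi^2(n) \} \bigr), \tag{1.23} \]

for large $n$.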

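Proof. From Theorem 6 with $m = 1$, we directly obtain (1.22). Since $V[X_n] = E[X_n^2] - E[X_n]^2$, we observe that the Poisson transform of $E[X_n^2]$ is $\tilde{V}(z) + \tilde{X}(z)^2$. Thus by Theorem 6 again

\[ E[X_n^2] = \tilde{V}(n) + \tilde{X}(n)^2 - \frac{n}{2} \Bigl( \tilde{V}^{\langle 2 \rangle}(n) + 2 \bigl[ \tilde{X}'(n) \bigr]^2 + 2 \tilde{X}(n) \tilde{X}^{\langle 2 \rangle}(n) \Bigr) + O\bigl( n^{2\beta - 2} \Psi^2(n) \bigr) \]
\[ = \tilde{V}(n) + \tilde{X}(n)^2 - n \bigl[ \tilde{X}'(n) \bigr]^2 - n \tilde{X}(n) \tilde{X}^{\langle 2 \rangle}(n) + O\bigl( n^{\beta - 1} \Psi(n) \bigr) + O\bigl( n^{2\beta - 2} \Psi^2(n) \bigr), \]

where the last error term is a consequence of $n \tilde{V}^{\langle 2 \rangle}(n) = O(n^{\beta - 1} \Psi(n))$ (see Lemma 5). Thus the result follows from $V[X_n] = E[X_n^2] - (E[X_n])^2$.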

1.4 Singularity Analysis

In this section, we restrict our attention to functions with a unique dominant singularity. By the scaling rule g(z) = f (zξ) if f (z) has singular at z = ξ, we may always assume that the sole singularity occurs at z = 1, and we consider functions f (z) of the form

f (z) = (1 − z)−αlog 1

1 − z γ

, (1.24)

with non-negative real numbers α and γ. Our general objective is to translate an approxi-mation of a function near a singularity into an asymptotic approxiapproxi-mation of its coefficients. More precisely, when all h0(z), · · · , hk(z), g(z) are as (1.24), then

f (z) = h0(z) + h1(z) + · · · + hk(z) + O g(z)

(1.25) with h0(z) · · · hk(z) g(z) for z → 1, will imply

[zn]f (z) = h0,n+ h1,n+ · · · + hk,n+ O gn

with h0,n · · · hk,n gn for n → ∞. We omit all the proofs in this section since they

can be found in [3].

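From the binomial expansion, we have, with $\alpha \ne 0$,

$$[z^n](1-z)^{-\alpha} = \binom{n+\alpha-1}{n} = \frac{\Gamma(n+\alpha)}{\Gamma(\alpha)\Gamma(n+1)}.$$

Then, from Stirling's formula, $[z^n](1-z)^{-\alpha}$ has the asymptotic expansion, as $n \to \infty$,

$$[z^n](1-z)^{-\alpha} \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}\Bigl(1 + \sum_{k\ge1}\frac{e_k}{n^k}\Bigr). \tag{1.26}$$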
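The expansion (1.26) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the first correction coefficient $e_1 = \alpha(\alpha-1)/2$ is taken from the standard form of this expansion.

```python
import math


def exact_coefficient(alpha: float, n: int) -> float:
    """Exact [z^n](1-z)^(-alpha) = Gamma(n+alpha) / (Gamma(alpha) * Gamma(n+1))."""
    log_value = math.lgamma(n + alpha) - math.lgamma(alpha) - math.lgamma(n + 1)
    return math.exp(log_value)


def leading_term(alpha: float, n: int) -> float:
    """First term n^(alpha-1) / Gamma(alpha) of the expansion (1.26)."""
    return n ** (alpha - 1) / math.gamma(alpha)


def two_term(alpha: float, n: int) -> float:
    """Leading term multiplied by (1 + e_1/n) with e_1 = alpha(alpha-1)/2."""
    return leading_term(alpha, n) * (1 + alpha * (alpha - 1) / (2 * n))


def relative_error(approx: float, exact: float) -> float:
    return abs(approx / exact - 1.0)
```

For $\alpha = 3/2$ and $n = 100$, the one-term approximation is off by roughly $e_1/n$, while the two-term approximation is better by about two orders of magnitude.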


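Table 1.4: Some common functions and the asymptotic forms of their coefficients.

$f(z)$ | $[z^n]f(z)$
$1$ | $0$
$\log(1-z)^{-1}$ | $\frac{1}{n}$
$(1-z)^{-1}$ | $1$
$(1-z)^{-1}\log\frac{1}{1-z}$ | $\log n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O(n^{-6})$
$(1-z)^{-1}\log^2\frac{1}{1-z}$ | $\log^2 n + 2\gamma\log n + \gamma^2 - \frac{\pi^2}{6} + O\bigl(\frac{\log n}{n}\bigr)$
$(1-z)^{-2}$ | $n+1$

Remark. In particular:

$$[z^n](1-z)^{-\alpha} \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}\Bigl(1 + \frac{\alpha(\alpha-1)}{2n} + \frac{\alpha(\alpha-1)(\alpha-2)(3\alpha-1)}{24n^2} + \frac{\alpha^2(\alpha-1)^2(\alpha-2)(\alpha-3)}{48n^3} + O\Bigl(\frac{1}{n^4}\Bigr)\Bigr).$$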

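Next, we consider logarithmic factors, that is, $f(z) = (1-z)^{-\alpha}\bigl(\log(1-z)^{-1}\bigr)^{\gamma}$ with $\alpha \ne 0$. Similarly, we have the asymptotic expansion

$$[z^n]f(z) \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}(\log n)^{\gamma}\Bigl(1 + \sum_{k\ge1}\frac{C_k}{\log^k n}\Bigr), \qquad\text{where}\quad C_k = \binom{\gamma}{k}\Gamma(\alpha)\,\frac{d^k}{ds^k}\frac{1}{\Gamma(s)}\Big|_{s=\alpha}.$$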

Next, we want to establish our claim in (1.25). Therefore, we have to give conditions under which the following holds:

$$f(z) = O(g(z)) \ \Rightarrow\ [z^n]f(z) = O([z^n]g(z)).$$

We first need a definition.

Definition 7. Let $\Delta := \Delta(\varphi, \eta)$ denote the closed domain

$$\Delta(\varphi, \eta) = \{z : |z| < \eta,\ z \ne 1,\ |\arg(z-1)| \ge \varphi\},$$

where $\eta > 1$ and $0 < \varphi < \pi/2$.


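Theorem 8. Assume that $f(z)$ is analytic in $\Delta = \Delta(\varphi, \eta)$, where $\eta > 1$ and $0 < \varphi < \pi/2$, and that as $z \to 1$ in $\Delta$,

$$f(z) = O\Bigl((1-z)^{-\alpha}\Bigl(\log\frac{1}{1-z}\Bigr)^{\gamma}\Bigr),$$

for some non-negative integers $\alpha, \gamma$ with $\alpha \ne 0$. Then one has

$$[z^n]f(z) = O\bigl(n^{\alpha-1}(\log n)^{\gamma}\bigr).$$

Finally, by linearity,

$$f(z) = f_1(z) + f_2(z) \ \Rightarrow\ [z^n]f(z) = [z^n]f_1(z) + [z^n]f_2(z),$$

and we have the following theorem:

Theorem 9. Assume that $f(z)$ is analytic in $\Delta = \Delta(\varphi, \eta)$, where $\eta > 1$ and $0 < \varphi < \pi/2$, and that as $z \to 1$ in $\Delta$,

$$f(z) = (1-z)^{-\alpha}\Bigl(\log\frac{1}{1-z}\Bigr)^{\gamma}\Bigl(\sum_{j=0}^{m-1} c_j\Bigl(\log\frac{1}{1-z}\Bigr)^{-j} + O\Bigl(\Bigl(\log\frac{1}{1-z}\Bigr)^{-m}\Bigr)\Bigr),$$

for non-negative real numbers $\alpha, \gamma$ with $\alpha \ne 0$ and $\gamma \ge m$. Then, as $n \to \infty$,

$$[z^n]f(z) = \frac{n^{\alpha-1}}{\Gamma(\alpha)}\log^{\gamma} n\Bigl(\sum_{j=0}^{m-1} c'_j\log^{-j} n + O(\log^{-m} n)\Bigr).$$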


Chapter 2 Results for Digital Search Trees

In this chapter, we first introduce digital search trees and their generalizations such as bucket digital search trees. Next, we present the results concerning the internal path length and explain how the results are proved. We also present results concerning other parameters of DSTs in Section 2.5.

2.1 Digital Search Trees

Digital trees are a general data structure to manipulate sequences which are built over a binary alphabet {0, 1}. There are three kinds of digital trees: “tries”, “Patricia tries” and “digital search trees”. In this thesis we only consider digital search trees and omit the others.

Suppose now we have an ordered set of $n$ records, each of which has a key that is an infinite sequence over {0, 1}. These records are stored in a digital search tree in the following way. Set k to 1. If n = 1, then the only record is put in a node and we are finished. If n > 1, then

• The first record is saved in a node (which becomes the root of the tree).
• The records in the remaining set are split according to their kth bit:
  0: The record goes to the left subtree, which is linked as the left child of the root.
  1: The record goes to the right subtree, which becomes the right child of the root.
  In this way the remaining set is split into two subsets, one for each subtree.
• Finally, the two subtrees are constructed recursively by the same process with k replaced by k + 1.
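The construction above is equivalent to inserting the records one at a time, following one bit per level until a free position is found. A minimal sketch of this insertion, under the assumption that keys are given as finite strings of '0' and '1' long enough to resolve every comparison (the names `Node`, `insert`, `build` and `depth_of` are illustrative and not from the thesis):

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    key: str                         # record key as a string of '0'/'1' characters
    left: Optional["Node"] = None    # subtree for keys whose next bit is 0
    right: Optional["Node"] = None   # subtree for keys whose next bit is 1


def insert(root: Optional[Node], key: str) -> Node:
    """Insert `key` and return the root; the bit inspected at depth d is key[d]."""
    if root is None:
        return Node(key)
    node, depth = root, 0
    while True:
        branch = "left" if key[depth] == "0" else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(key))
            return root
        node, depth = child, depth + 1


def build(keys: Iterable[str]) -> Optional[Node]:
    root = None
    for key in keys:                 # the insertion order determines the tree
        root = insert(root, key)
    return root


def depth_of(root: Optional[Node], key: str) -> int:
    """Number of edges from the root to the node storing `key`, or -1 if absent."""
    node, depth = root, 0
    while node is not None:
        if node.key == key:
            return depth
        node = node.left if key[depth] == "0" else node.right
        depth += 1
    return -1
```

For the keys 0110, 1010, 0011, 1100, 0100 inserted in this order, the first key becomes the root, the next two become its children, and the last two end up at depth 2.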


Figure 2.1: Examples of generalized digital search trees for b = 1, 2, 3 built from 12 records.

Thus we can see that digital search trees are built up of nodes, where each node has a record containing a key and 2 links which point to subtrees. Obviously, the order in which the keys are inserted is relevant.

Next we equip the set of all digital search trees with a random model. To this end, we assume that the bits of the keys are generated independently, the two values 0 and 1 occurring with probabilities p and q = 1 − p, respectively. For p ≠ q this leads to the asymmetric (biased) DST, whereas for p = q = 1/2 we obtain the symmetric (unbiased) DST.

Many generalizations of digital search trees have been considered. One of them is the so-called bucket digital search tree, where every node can hold up to b records.

The internal path length of a tree is the sum of the lengths of the paths to every node. More precisely, it is the sum of the number of edges on the path from the root to each node. In this work we denote by $L_n$ the internal path length of a DST built from n (sufficiently long) records comprised of random digits.
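The internal path length can be computed without building an explicit tree, since every node is identified by the bit sequence leading to it. The following sketch rests on that observation; the helper names and the finite key length are assumptions made for illustration only.

```python
import random
from typing import Iterable, List


def internal_path_length(keys: Iterable[str]) -> int:
    """Sum of node depths of the DST obtained by inserting `keys` in order."""
    occupied = set()   # bit-paths (key prefixes) of nodes that already hold a record
    total = 0
    for key in keys:
        depth = 0
        # Walk down along the key until the first free position is found.
        while key[:depth] in occupied:
            if depth == len(key):
                raise ValueError("key too short or duplicated: " + key)
            depth += 1
        occupied.add(key[:depth])
        total += depth
    return total


def random_keys(n: int, bits: int, p: float, rng: random.Random) -> List[str]:
    """n keys of `bits` independent bits each; a bit equals 0 with probability p."""
    return ["".join("0" if rng.random() < p else "1" for _ in range(bits))
            for _ in range(n)]
```

With three records the value is 2 when the second and third key differ in their first bit and 3 otherwise, so a symmetric DST with n = 3 has mean internal path length 5/2.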

Digital search trees have been quite thoroughly investigated in recent decades. Knuth

[15] and Flajolet and Sedgewick [5] introduced analytical methods for the analysis of


[17], Szpankowski [22], Jacquet [10], Kirschenhofer and Prodinger [12] and others.

2.2 Internal Path Length for Symmetric DSTs

We now discuss the internal path length of a symmetric DST. Let $\pi(n, k)$ be the splitting probability, i.e., the probability that the left subtree holds $k$ records (and the right subtree holds $n-1-k$ records). Clearly $\pi(n+1, k) = \binom{n}{k}/2^{n}$. Conditioning on the split described by $\{\pi(n+1, k)\}$, we have the recurrence $L_{n+1} \stackrel{d}{=} L_k + L_{n-k} + n$, which implies that the corresponding probability generating functions $F_n(z) = \mathbb{E}[z^{L_n}]$ satisfy, for $n \ge 0$,

$$F_{n+1}(z) = z^n 2^{-n}\sum_{k=0}^{n}\binom{n}{k}F_k(z)F_{n-k}(z), \qquad F_0(z) = 1. \tag{2.1}$$

Mean. Knuth [15] first used an approach suggested by Konheim and Newman [16] to derive the mean, but his approach is not useful for the analysis of other parameters. Flajolet and Sedgewick [5] gave another approach to analyze the mean, which we will discuss here.

The expectation $f_n = \mathbb{E}[L_n]$ can be obtained from the probability generating functions (2.1) by $f_n = F_n'(1)$. Consequently,

$$f_{n+1} = n + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}f_k \quad (n \ge 0), \qquad f_0 = 0.$$
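The recurrence for the mean can be iterated directly. A minimal sketch in exact rational arithmetic (the function name is illustrative; the recurrence is applied for all n ≥ 0, as required by (2.1)):

```python
from fractions import Fraction
from math import comb
from typing import List


def mean_path_length(n_max: int) -> List[Fraction]:
    """Return [f_0, ..., f_{n_max}] with f_{n+1} = n + 2^(1-n) * sum_k C(n,k) f_k."""
    f = [Fraction(0)]
    for n in range(n_max):
        binomial_sum = sum(comb(n, k) * f[k] for k in range(n + 1))
        f.append(n + Fraction(2, 2**n) * binomial_sum)
    return f
```

The first values are $f_2 = 1$, $f_3 = 5/2$ and $f_4 = 35/8$, in agreement with a direct enumeration of the possible tree shapes.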

The above recurrence falls into the general type discussed in the following lemma:

Lemma 6. Let $(x_n)$ be a sequence of numbers satisfying $x_0 = x_1 = 0$ and

$$x_{n+1} = a_{n+1} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}x_k \quad (n \ge 1),$$

where $(a_n)$ is any sequence of numbers with $a_0 = a_1 = 0$. We define the binomial inverse relations

$$\hat a_n = \sum_{k=0}^{n}(-1)^k\binom{n}{k}a_k \qquad\text{and}\qquad a_n = \sum_{k=0}^{n}(-1)^k\binom{n}{k}\hat a_k. \tag{2.2}$$

Then the solution is given by

$$x_n = -\sum_{k=2}^{n}(-1)^k\binom{n}{k}\hat x_{k-2},$$

where $Q_n = \prod_{1\le j\le n}(1 - 2^{-j})$ and

$$\hat x_n = Q_n\sum_{i=1}^{n+1}\frac{\hat a_i - \hat a_{i+1}}{Q_{i-1}}.$$

Proof. See [14].

Thus we obtain an explicit formula for $f_n$:

$$f_n = \sum_{k=2}^{n}(-1)^k\binom{n}{k}Q_{k-2}.$$
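The explicit formula can be evaluated exactly for small n. The sketch below (illustrative names, exact rational arithmetic) reproduces the values obtained by iterating the recurrence for the mean; in floating-point arithmetic the alternating sum suffers from severe cancellation for larger n, which is one reason why its asymptotics are obtained by analytic means rather than by direct evaluation.

```python
from fractions import Fraction
from math import comb


def q_product(n: int) -> Fraction:
    """Q_n = prod_{j=1}^{n} (1 - 2^(-j)), with Q_0 = 1."""
    value = Fraction(1)
    for j in range(1, n + 1):
        value *= 1 - Fraction(1, 2**j)
    return value


def mean_explicit(n: int) -> Fraction:
    """f_n = sum_{k=2}^{n} (-1)^k C(n,k) Q_{k-2}."""
    return sum(((-1) ** k * comb(n, k) * q_product(k - 2) for k in range(2, n + 1)),
               Fraction(0))
```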

This is exactly Example 2 discussed in Section 1.1. Thus, we have the following theorem.

Theorem 10 (Flajolet and Sedgewick). The average internal path length of a symmetric digital search tree built from n records is

$$\mathbb{E}[L_n] = n\log_2 n + n\Bigl(\frac{\gamma-1}{\log 2} + \frac12 - \alpha + \delta_1(\log_2 n)\Bigr) + \log_2 n + \frac{2\gamma-1}{2\log 2} + \frac52 - \alpha + \delta_2(\log_2 n) + O(\log n/n),$$

where $\gamma = 0.577216\cdots$ is Euler's constant, $\alpha = 1 + \frac13 + \frac17 + \cdots = 1.606695\cdots$, and $\delta_1(x)$ and $\delta_2(x)$ are continuous periodic functions of period 1, mean 0, and very small amplitude ($< 10^{-6}$). With these values of $\gamma$ and $\alpha$, the coefficient of the linear term evaluates to approximately $-1.7166\cdots$.

Proof. Collecting all contributions as in Section 1.1 gives the expansion. The pole at $z = 0$ yields a contribution of $\log_2 n + \frac{\gamma}{\log 2} + \frac52 - \alpha$, the poles at $z = \frac{2k\pi i}{\log 2}$ yield a periodic contribution of order $n^0$, and so on.
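The quality of the expansion in Theorem 10 can be examined numerically by comparing it with the exact mean obtained from the recurrence for $f_n$. The following is a sketch under stated assumptions: the fluctuating terms $\delta_1, \delta_2$ are dropped (they are tiny), floating-point arithmetic is used, and all names are illustrative.

```python
import math
from typing import List

EULER_GAMMA = 0.5772156649015329


def alpha_constant(terms: int = 200) -> float:
    """alpha = sum_{k>=1} 1/(2^k - 1) = 1 + 1/3 + 1/7 + ..."""
    return sum(1.0 / (2.0**k - 1.0) for k in range(1, terms + 1))


def exact_mean(n_max: int) -> List[float]:
    """[f_0, ..., f_{n_max}] from f_{n+1} = n + 2^(1-n) sum_k C(n,k) f_k."""
    f = [0.0]
    for n in range(n_max):
        weighted = sum(math.comb(n, k) / 2**n * f[k] for k in range(n + 1))
        f.append(n + 2.0 * weighted)
    return f


def asymptotic_mean(n: int) -> float:
    """Expansion of Theorem 10 without the periodic terms and the O-term."""
    log2 = math.log(2.0)
    a = alpha_constant()
    linear = (EULER_GAMMA - 1.0) / log2 + 0.5 - a
    constant = (2.0 * EULER_GAMMA - 1.0) / (2.0 * log2) + 2.5 - a
    return n * math.log2(n) + n * linear + math.log2(n) + constant
```

Evaluating both functions for increasing n shows the difference shrinking, as the error term $O(\log n/n)$ suggests.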

Variance. By applying the same technique, Kirschenhofer, Prodinger and Szpankowski [14] derived the variance of the internal path length. More precisely, they used that the variance satisfies $\mathbb{V}[L_n] = s_n + f_n - f_n^2$ with $s_n = F_n''(1)$. From (2.1) we get the following recurrence for $n \ge 0$,

$$s_{n+1} = n2^{2-n}\sum_{k=0}^{n}\binom{n}{k}f_k + n(n-1) + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}f_kf_{n-k} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}s_k$$

and $s_0 = 0$. We split it into three parts. Let $s_n = u_n + v_n + w_n$, where

$$u_{n+1} = 2n(f_{n+1} - n) + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}u_k \quad (n \ge 0), \qquad u_0 = 0, \tag{2.3a}$$

$$v_{n+1} = n(n-1) + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}v_k \quad (n \ge 0), \qquad v_0 = 0, \tag{2.3b}$$

$$w_{n+1} = 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}f_kf_{n-k} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}w_k \quad (n \ge 0), \qquad w_0 = 0. \tag{2.3c}$$
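The recurrences for $f_n$ and $s_n$ can be iterated together to obtain exact variances for small n. A minimal sketch (illustrative names, exact rational arithmetic):

```python
from fractions import Fraction
from math import comb
from typing import List, Tuple


def mean_and_variance(n_max: int) -> Tuple[List[Fraction], List[Fraction]]:
    """Exact E[L_n] and V[L_n] for n = 0..n_max in the symmetric DST model."""
    f = [Fraction(0)]   # f_n = F_n'(1)
    s = [Fraction(0)]   # s_n = F_n''(1)
    for n in range(n_max):
        w = [Fraction(comb(n, k), 2**n) for k in range(n + 1)]   # C(n,k) / 2^n
        sum_f = sum(w[k] * f[k] for k in range(n + 1))
        sum_ff = sum(w[k] * f[k] * f[n - k] for k in range(n + 1))
        sum_s = sum(w[k] * s[k] for k in range(n + 1))
        f_next = n + 2 * sum_f
        # n 2^(2-n) sum C f_k = 4 n sum_f; the factors 2^(1-n) become 2 * (...)
        s_next = 4 * n * sum_f + n * (n - 1) + 2 * sum_ff + 2 * sum_s
        f.append(f_next)
        s.append(s_next)
    variance = [s[n] + f[n] - f[n] ** 2 for n in range(n_max + 1)]
    return f, variance
```

For n = 4 the internal path length takes the values 4, 5, 6 with probabilities 3/4, 1/8, 1/8, which gives variance 31/64, the value the recurrence produces.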

All of the above three recurrences are of the type discussed in Lemma 6. Thus, the solutions of (2.3a)–(2.3c) follow from the binomial relations (2.2), where

$$\hat u_k = 2Q_{k-2}\Bigl(4 + \sum_{j=1}^{k-2}\frac{1}{2^j-1} - \sum_{j=1}^{k-2}\frac{j}{2^j-1} - \frac{2k}{2^{k-2}-1}\Bigr) \quad (k \ge 3), \qquad \hat u_0 = \hat u_1 = \hat u_2 = 0; \tag{2.4a}$$

$$\hat v_k = -4Q_{k-2} \quad (k \ge 3), \qquad \hat v_0 = \hat v_1 = \hat v_2 = 0; \tag{2.4b}$$

$$\hat w_k = -Q_{k-2}\sum_{j=4}^{k-1}\frac{2^{1-j}}{Q_{j-1}}\sum_{i=2}^{j-2}\binom{j}{i}Q_{i-2}Q_{j-i-2} \quad (k \ge 5), \qquad \hat w_0 = \cdots = \hat w_4 = 0. \tag{2.4c}$$

Next we focus on the asymptotics of $u_n$. In order to find an appropriate analytic continuation of $\hat u_k$, we can rewrite the sums appearing in (2.4a) as follows:

$$\sum_{j=1}^{k-2}\frac{1}{2^j-1} = \alpha - \sum_{j\ge1}\frac{1}{2^{k-2+j}-1}; \qquad \sum_{j=1}^{k-2}\frac{j}{2^j-1} = \sum_{j\ge1}\frac{j}{2^j-1} - \sum_{j\ge1}\frac{k-2+j}{2^{k-2+j}-1},$$

where $\alpha$ is as defined in Theorem 10. Thus we may continue $\hat u_k$ via the function

$$\hat u(z) = \frac{2Q_\infty}{Q(2^{2-z})}\Bigl(4 + \alpha - \sum_{j\ge1}\frac{1}{2^{z-2+j}-1} - \sum_{j\ge1}\frac{j}{2^j-1} + \sum_{j\ge1}\frac{z-2+j}{2^{z-2+j}-1} - \frac{2z}{2^{z-2}-1}\Bigr),$$

where $Q_\infty = 0.28878809\cdots$ and $Q(t) = \prod_{j\ge1}(1 - t/2^j)$. Now, we can apply the Rice method


Next, the recurrence for $v_n$ is easier. After simple algebra one proves

$$v_n = 4\binom{n}{2} - 4f_n,$$

and it is easy to get the asymptotics of $v_n$.
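The closed form for $v_n$ can be confirmed against the recurrence (2.3b) for small n. A minimal sketch in exact arithmetic (function names are illustrative):

```python
from fractions import Fraction
from math import comb
from typing import Callable, List


def solve_recurrence(toll: Callable[[int], int], n_max: int) -> List[Fraction]:
    """x_0 = 0 and x_{n+1} = toll(n) + 2^(1-n) sum_k C(n,k) x_k for n >= 0."""
    x = [Fraction(0)]
    for n in range(n_max):
        binomial_sum = sum(comb(n, k) * x[k] for k in range(n + 1))
        x.append(toll(n) + Fraction(2, 2**n) * binomial_sum)
    return x


def closed_form_holds(n_max: int) -> bool:
    f = solve_recurrence(lambda n: n, n_max)             # mean, toll n
    v = solve_recurrence(lambda n: n * (n - 1), n_max)   # (2.3b), toll n(n-1)
    return all(v[n] == 4 * comb(n, 2) - 4 * f[n] for n in range(n_max + 1))
```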

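The appropriate extension of $\hat w_n$ is intricate. From (2.4c) we have

$$\hat w_{k+1} = -Q_{k-1}\sum_{j=4}^{k}\frac{\xi(j+1)}{2^{j-1}Q_{j-1}} \qquad\text{with}\qquad \xi(j+1) = \sum_{i=2}^{j-2}\binom{j}{i}Q_{i-2}Q_{j-2-i}.$$

Since $\xi(j+1) \sim 2^jQ_\infty^2$, let $\eta(j+1) = \xi(j+1) - 2^jQ_\infty^2$. Then

$$\begin{aligned}
\hat w_{k+1} &= -Q_{k-1}\sum_{j=4}^{k}\frac{\eta(j+1) + 2^jQ_\infty^2}{2^{j-1}Q_{j-1}}\\
&= Q_{k-1}\Bigl(-2Q_\infty(k-3) - \sum_{j\ge3}\frac{\eta(j+2)}{2^jQ_j} + \sum_{j\ge0}\frac{\eta(k+j+2)}{2^{k+j}Q_{k+j}} + 2Q_\infty^2\Bigl(\sum_{j\ge0}\Bigl(\frac{1}{Q_{k+j}} - \frac{1}{Q_\infty}\Bigr) - \sum_{j\ge3}\Bigl(\frac{1}{Q_j} - \frac{1}{Q_\infty}\Bigr)\Bigr)\Bigr).
\end{aligned}$$

Since all series are absolutely convergent, we may sum them up term by term and get

$$\hat w_{k+1} = Q_{k-1}\Bigl(-2Q_\infty k + \frac{\xi(k+2)}{2^kQ_k} + \frac{\xi(k+3)}{2^{k+1}Q_{k+1}} + \sum_{j\ge2}\Bigl(\frac{\xi(k+j+2)}{2^{k+j}Q_{k+j}} - \frac{\xi(j+2)}{2^jQ_j}\Bigr)\Bigr).$$

From an appropriate interpretation for $\xi(z+1)$ (see [14]),

$$\xi(z+1) = \sum_{r\ge0}\frac{(-1)^r2^{-\binom{r+1}{2}}}{Q_r}\cdot\frac{Q_\infty}{Q(2^{3-z-r})}\cdot\Bigl(2^z - \frac{2}{1-2^{1-z-r}} - \frac{2z}{1-2^{2-z-r}} + 2\sum_{k\ge2}\binom{z}{k}\frac{1}{2^{r+k-1}-1}\Bigr),$$

we immediately obtain the representation for $\hat w(z)$:

$$\hat w(z+1) = Q_{z-1}\Bigl(-2Q_\infty z + \frac{\xi(z+2)}{2^zQ_z} + \frac{\xi(z+3)}{2^{z+1}Q_{z+1}} + \sum_{j\ge2}\Bigl(\frac{\xi(z+j+2)}{2^{z+j}Q_{z+j}} - \frac{\xi(j+2)}{2^jQ_j}\Bigr)\Bigr)$$

with $Q_z = Q_\infty/Q(2^{-z})$, where $Q(z)$ and $Q_\infty$ are defined as above. Then we can again obtain the asymptotics of $w_n$ by the Rice method.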


Theorem 11 (Kirschenhofer, Prodinger and Szpankowski). The variance of the internal path length of symmetric digital search trees built from n records is

$$\mathbb{V}[L_n] = n\cdot\bigl(C + \delta(\log_2 n)\bigr) + O(\log^2 n/n),$$

where $C$ is a constant with $C = 0.2660\ldots$ and all four digits after the decimal point are significant. The explicit form of $C$ is

$$\begin{aligned}
C = {} & -\frac{28}{3L} - \frac{39}{4} - 2\sum_{n\ge1}\frac{n2^n}{(2^n-1)^2} + \frac{2\alpha}{L} + \frac{\pi^2}{2L^2} + \frac{2}{L^2} - \frac{2}{L}\sum_{k\ge3}\frac{(-1)^{k+1}(k-5)}{(k+1)k(k-1)(2^k-1)}\\
& + \frac{2}{L}\sum_{r\ge1}(-1)^r2^{-\binom{r+1}{2}}\Bigl(\frac{L(1-2^{-r+1})/2 - 1}{1-2^{-r}} - \sum_{k\ge2}\frac{(-1)^{k+1}}{k(k-1)(2^{r+k}-1)}\Bigr) + \frac{2}{L}\hat w'(3) - 2\delta_0 - \delta_1
\end{aligned} \tag{2.5}$$

with $L = \log 2$. The fluctuating function $\delta(x)$ is continuous with period 1, mean zero, and $|\delta(x)| \le 10^{-6}$; $\delta_0, \delta_1$ are two non-zero numbers with $|\delta_0| \le 10^{-10}$ and $|\delta_1| \le 10^{-10}$; and $\hat w(z)$ is defined above.

2.3 Internal Path Length for Asymmetric DSTs

From the last section, we know that Kirschenhofer et al. [14] obtained an asymptotic expression for the variance of the internal path length in the symmetric DST model. However, they did not extend their results to the asymmetric model. Jacquet and Szpankowski devised another approach to give the mean and variance of the internal path length of a DST in the asymmetric model [10]. We will introduce this method in this section.

We therefore suppose that the binary digital search tree model is asymmetric with probabilities p, q (p + q = 1). Similarly as in the last section, we have $P(\pi(n+1) = k) = \binom{n}{k}p^kq^{n-k}$, and the probability generating functions $F_n(y) = \mathbb{E}[y^{L_n}]$ of $L_n$ satisfy, for $n \ge 0$,

$$F_{n+1}(y) = y^n\sum_{k=0}^{n}\binom{n}{k}p^kq^{n-k}F_k(y)F_{n-k}(y), \qquad F_0(y) = 1.$$
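Differentiating the recurrence for $F_n(y)$ at $y = 1$ gives $f_{n+1} = n + \sum_{k}\binom{n}{k}p^kq^{n-k}(f_k + f_{n-k})$ for the mean $f_n = \mathbb{E}[L_n]$; this step is not spelled out in the text and is stated here as a direct consequence. A minimal floating-point sketch (the function name is illustrative) that iterates it:

```python
from math import comb
from typing import List


def asymmetric_mean(n_max: int, p: float) -> List[float]:
    """[f_0, ..., f_{n_max}] for the DST model with split probabilities p and q = 1 - p."""
    q = 1.0 - p
    f = [0.0]
    for n in range(n_max):
        total = float(n)   # every record outside the root adds one edge
        for k in range(n + 1):
            weight = comb(n, k) * p**k * q ** (n - k)   # P(left subtree holds k records)
            total += weight * (f[k] + f[n - k])
        f.append(total)
    return f
```

Two sanity checks are available: p = 1/2 reproduces the symmetric values, and the degenerate case p = 1 turns the tree into a path, for which the internal path length is $n(n-1)/2$.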

Now, define $L(z, y) = \sum_{n\ge0}F_n(y)z^n/n!$. Then one has

$$\frac{\partial}{\partial z}L(z, y) = L(pzy, y)L(qzy, y), \qquad L(0, y) = 1.$$

Finally, we consider the Poisson generating function $\tilde L(z, y) = L(z, y)e^{-z}$ and obtain

$$\tilde L(z, y) + \frac{\partial}{\partial z}\tilde L(z, y) = \tilde L(pzy, y)\tilde L(qzy, y)e^{(y-1)z}. \tag{2.6}$$

Next, denote by $\tilde X(z) = \tilde L_y(z, 1)$ and $\tilde V(z) = \tilde L_{yy}(z, 1) + \tilde X(z) - \tilde X(z)^2$ the Poisson mean and Poisson variance as already defined in Section 1.3.

Poisson model. Consider first $\tilde X(z)$. From (2.6) we obtain the following recurrence

$$\tilde X(z) + \tilde X'(z) = \tilde X(pz) + \tilde X(qz) + z, \qquad \tilde X(0) = 0. \tag{2.7}$$

Let $X^*(s)$ denote the Mellin transform of $\tilde X(z)$. Note that $\tilde X(z) = O(z^2)$ as $z \to 0$ and $\tilde X(z) = O(|z|\log|z|)$ as $z \to \infty$ in a linear cone (see the appendix in [10]). Thus the fundamental strip of $X^*(s)$ is $\langle -2, -1\rangle$ and the Mellin transform of $\tilde X'(z) - z$ is also defined in the same strip. Then (2.7) translates into

$$X^*(s) - (s-1)X^*(s-1) = (p^{-s} + q^{-s})X^*(s) \tag{2.8}$$

in terms of the Mellin transform. Next, we set $X^*(s) = \xi(s)\Gamma(s)$, where $\Gamma(s)$ is the gamma function, and $\xi(s)$ satisfies the following recurrence:

$$\xi(s) - \xi(s-1) = (p^{-s} + q^{-s})\xi(s).$$

After some algebra one obtains

$$\xi(s) = \prod_{k=0}^{\infty}\frac{1 - p^{k+2} - q^{k+2}}{1 - p^{-s+k} - q^{-s+k}} = \frac{Q(-2)}{Q(s)}$$

for $s \in \langle -2, -1\rangle$, where $Q(s) = \prod_{k\ge0}(1 - p^{-s+k} - q^{-s+k})$. We need a lemma to find the singularities:

Lemma 7. Let $s_k$ for $k \in \mathbb{Z}$ be solutions of

$$p^{-s+r} + q^{-s+r} = 1,$$

where $p + q = 1$ and $s$ is complex.

(i) For all $k \in \mathbb{Z}$,

$$-1 + r \le \Re(s_k) \le \sigma_0 + r,$$

where $\sigma_0$ is a positive solution of $1 + q^{-s} = p^{-s}$. Furthermore,

$$\frac{(2k-1)\pi}{\log p} \le \Im(s_k) \le \frac{(2k+1)\pi}{\log p}.$$

(ii) If $\Re(s_k) = -1 + r$ and $\Im(s_k) \ne 0$, then $\log p/\log q$ must be rational. More precisely, if $\frac{\log p}{\log q} = \frac{w}{t}$, where $\gcd(w, t) = 1$ for $w, t \in \mathbb{Z}$, then

$$-1 + r + \frac{2mw\pi i}{\log p}, \qquad m \in \mathbb{Z},$$

are all zeros with $\Re(s_k) = -1 + r$.

Inverting the Mellin transform then yields the following asymptotic expansion of the Poisson mean:

$$\tilde X(z) = \frac{z}{h}\Bigl(\log z + \gamma - 1 + \frac{h_2}{2h} - \alpha - \delta_1(\log z)\Bigr) + o(z) \qquad (z \to \infty), \tag{2.9}$$

where $h = -p\log p - q\log q$ is the entropy of the alphabet, $\gamma = 0.577\ldots$ is the Euler constant, $h_2 = p\log^2 p + q\log^2 q$,

$$\alpha = -\sum_{k=1}^{\infty}\frac{p^{k+1}\log p + q^{k+1}\log q}{1 - p^{k+1} - q^{k+1}}, \tag{2.10}$$

and $\delta_1(\log z)$ is a fluctuating function with small amplitude for $\log p/\log q$ rational, and zero otherwise.
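The constants entering (2.9) are easy to evaluate numerically. The sketch below (illustrative names; the series (2.10) is truncated after a fixed number of terms) also checks that for p = q = 1/2 the linear coefficient reduces to the one in Theorem 10, where $\alpha$ of (2.10) equals $\log 2$ times the constant $\alpha = 1 + \frac13 + \frac17 + \cdots$ used there.

```python
import math
from typing import Tuple

EULER_GAMMA = 0.5772156649015329


def entropy_constants(p: float, terms: int = 400) -> Tuple[float, float, float]:
    """Return (h, h2, alpha) of (2.9)-(2.10) for 0 < p < 1."""
    q = 1.0 - p
    lp, lq = math.log(p), math.log(q)
    h = -p * lp - q * lq
    h2 = p * lp**2 + q * lq**2
    alpha = -sum((p ** (k + 1) * lp + q ** (k + 1) * lq)
                 / (1.0 - p ** (k + 1) - q ** (k + 1))
                 for k in range(1, terms + 1))
    return h, h2, alpha


def mean_linear_constant(p: float) -> float:
    """Coefficient of z in (2.9) apart from (z log z)/h and the fluctuation."""
    h, h2, alpha = entropy_constants(p)
    return (h2 / (2.0 * h) + EULER_GAMMA - 1.0 - alpha) / h
```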

The variance is more intricate. Let $\tilde W(z) = \tilde V(z) - \tilde X(z)$. From (2.6) we observe that $\tilde W(z)$ satisfies the recurrence

$$\tilde W(z) + \tilde W'(z) = \tilde W(pz) + \tilde W(qz) + 2pz\tilde X'(pz) + 2qz\tilde X'(qz) + \tilde X'(z)^2, \qquad \tilde W(0) = 0.$$

This functional equation is harder to solve due to the last term, for which there is no closed-form expression for the Mellin transform, but it can be proved that the last term only contributes $O(z)$. Let $\tilde W(z) = \tilde W_1(z) + \tilde W_2(z)$, where

$$\tilde W_1(z) + \tilde W_1'(z) = \tilde W_1(pz) + \tilde W_1(qz) + 2pz\tilde X'(pz) + 2qz\tilde X'(qz), \qquad \tilde W_1(0) = 0,$$

$$\tilde W_2(z) + \tilde W_2'(z) = \tilde W_2(pz) + \tilde W_2(qz) + \tilde X'(z)^2, \qquad \tilde W_2(0) = 0.$$

Then, it was shown in [10] that $\tilde W_2(z)$ satisfies $\tilde W_2(z) = O(z)$ as $z$ tends to infinity. Note that $\tilde W_1(z) = O(z^3)$ as $z \to 0$ and $\tilde W_1(z) = O(|z|\log|z|)$ as $z \to \infty$ in a linear cone. Hence the fundamental strip of $W_1^*(s)$ is $\langle -3, -1\rangle$ and the Mellin transform of $\tilde W_1'(z)$ is defined in $\langle -2, 0\rangle$. For $s \in \langle -2, -1\rangle$, the equation for the Mellin transform $W_1^*(s)$ becomes

$$W_1^*(s) + g^*(s) = (p^{-s} + q^{-s})W_1^*(s) - 2(p^{-s} + q^{-s})sX^*(s),$$

where $g^*(s) = \mathcal{M}[\tilde W_1'(z); s]$. Solving it, we obtain

$$W_1^*(s) = \frac{-g^*(s)}{1 - p^{-s} - q^{-s}} - \frac{2(p^{-s} + q^{-s})sX^*(s)}{1 - p^{-s} - q^{-s}}.$$

Since $g^*(s)$ is analytic on $\langle -2, 0\rangle$, $g^*(s)/(1 - p^{-s} - q^{-s})$ only contributes terms up to $O(z)$. Next, we can treat $\tilde W_1(z)$ similarly to the Poisson mean and get the asymptotic expansion of the Poisson variance

$$\tilde V(z) = \frac{z\log^2 z}{h^2} + \frac{2z\log z}{h^3}\Bigl(\gamma h + h_2 - \frac{h^2}{2} - \alpha h - h\delta_1(\log z) - h\delta_1'(\log z)\Bigr) + O(z). \tag{2.11}$$

Bernoulli model. From the two asymptotic expansions (2.9) and (2.11), we can observe that they satisfy condition (I) of Theorem 7. To verify condition (O), we consider $Y(z) = \tilde X(z)e^z$ and get

$$Y'(z) = Y(pz)e^{qz} + Y(qz)e^{pz} + ze^z, \qquad Y(0) = 0.$$

Observe that the above equation can be represented as

$$Y(z) = \int_0^z\bigl(Y(pw)e^{qw} + Y(qw)e^{pw} + we^w\bigr)\,dw.$$

We can apply mathematical induction over increasing domains and get a bound for $Y(z) = \tilde X(z)e^z$ (see [11] for more details), as needed to verify condition (O) of Theorem 7. In a similar manner we can handle $\tilde V(z) + \tilde X(z)^2$. Thus we have the following theorem on the mean and the variance of the internal path length (see [10]):

Theorem 12 (Jacquet and Szpankowski). Consider a digital search tree built from n records under the asymmetric DST Bernoulli model. Then asymptotically the average value $\mathbb{E}[L_n]$ and the variance $\mathbb{V}[L_n]$ of the internal path length of the digital search tree become

$$\mathbb{E}[L_n] = \frac{n}{h}\Bigl(\log n + \frac{h_2}{2h} + \gamma - 1 - \alpha + \delta_0(\log n)\Bigr) + o(n), \qquad \mathbb{V}[L_n] \sim c_2\,n\log n, \tag{2.12}$$

where $h = -p\log p - q\log q$ is the entropy of the alphabet, $\gamma = 0.577\ldots$ is the Euler constant, $h_2 = p\log^2 p + q\log^2 q$, $c_2 = (h_2 - h^2)/h^3$, $\alpha$ is defined in (2.10), and $\delta_0(\log n)$ is a fluctuating function with small amplitude for $\log p/\log q$ rational, and zero otherwise.
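The constant $c_2$ in (2.12) is $h^{-3}$ times the variance of $-\log P(\text{bit})$ for a single random bit, so it is non-negative and vanishes exactly when p = q = 1/2; in that case (2.12) gives no information on the order of the variance, which is linear by Theorem 11. A small sketch (the function name is illustrative) makes this visible:

```python
import math


def variance_constant(p: float) -> float:
    """c_2 = (h_2 - h^2) / h^3 from Theorem 12, for 0 < p < 1."""
    q = 1.0 - p
    lp, lq = math.log(p), math.log(q)
    h = -p * lp - q * lq            # entropy of the alphabet
    h2 = p * lp**2 + q * lq**2      # second moment of -log P(bit)
    return (h2 - h**2) / h**3
```

For example, `variance_constant(0.3)` is positive, while `variance_constant(0.5)` is zero up to rounding.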

2.4 B-DSTs

Now we consider a b-DST, which is similar to the DST except that up to b records are stored in each node (the bucket capacity is b). The random model is as before. Flajolet and Richmond [4] devised a method to give the average size of a digital search tree under the symmetric model. Hubalek [8] further developed the approach by Flajolet and Richmond to give the mean and variance of the internal path length of a symmetric b-DST.

From now on we fix the capacity b as an integer, and consider a b-DST built from n records (n ≥ 0). Let $L_n$ be the internal path length of a symmetric b-DST built from n records. Since the first b records are stored in the root, the corresponding probability generating functions $F_n(z) = \mathbb{E}[z^{L_n}]$ satisfy, for $n \ge 0$,

$$F_{n+b}(z) = z^n\sum_{k=0}^{n}2^{-n}\binom{n}{k}F_k(z)F_{n-k}(z), \qquad F_0(z) = \cdots = F_{b-1}(z) = 1.$$

Mean. As before, the expectation is $f_n = \mathbb{E}[L_n] = F_n'(1)$. Hence,

$$f_{n+b} = n + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}f_k, \qquad f_0 = f_1 = \cdots = f_{b-1} = 0. \tag{2.13}$$

Again, similarly as before, we first investigate the general recurrence:

$$x_{n+b} = a_{n+b} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}x_k, \qquad x_0 = a_0,\ x_1 = a_1,\ \ldots,\ x_{b-1} = a_{b-1}.$$
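The recurrence (2.13) can be iterated just like its counterpart for b = 1. A minimal sketch in exact arithmetic (the function name is illustrative):

```python
from fractions import Fraction
from math import comb
from typing import List


def bucket_mean(b: int, n_max: int) -> List[Fraction]:
    """[f_0, ..., f_{n_max}] from (2.13); requires b >= 1 and n_max >= b - 1."""
    f = [Fraction(0)] * b                      # f_0 = ... = f_{b-1} = 0
    for n in range(n_max - b + 1):
        binomial_sum = sum(comb(n, k) * f[k] for k in range(n + 1))
        f.append(n + Fraction(2, 2**n) * binomial_sum)   # this is f_{n+b}
    return f
```

For b = 2 and five records, the root holds two records and the other three are split between its children: with probability 3/4 they split 2 and 1 (path length 3), and with probability 1/4 they all go to the same child (path length 4), which gives $f_5 = 13/4$.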

One of the innovations in [4] is to consider the ordinary generating function. If we introduce the ordinary generating functions $X(z) = \sum_{n\ge0}x_nz^n$ and $A(z) = \sum_{n\ge0}a_nz^n$ of the sequences $(x_n)$ and $(a_n)$, we derive the following lemma.

Lemma 8. The generating function $X(z)$ is given by $X(z) = \frac{1}{1-z}\tilde X\bigl(\frac{z}{1-z}\bigr)$, where $\tilde X(z)$ satisfies

$$(1+z)^b\tilde X(z) = (1+z)^b\tilde A(z) + 2z^b\tilde X\Bigl(\frac{z}{2}\Bigr) \tag{2.14}$$

and $\tilde A(z) = \frac{1}{1+z}A\bigl(\frac{z}{1+z}\bigr)$.

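Proof. Consider the Poisson transforms $\tilde x(z)$ and $\tilde a(z)$ of the sequences $(x_n)$ and $(a_n)$, respectively. Then, we obtain for the coefficients $\tilde x_n = n![z^n]\tilde x(z)$ and $\tilde a_n = n![z^n]\tilde a(z)$

$$\sum_{j=0}^{b}\binom{b}{j}\tilde x_{n+j} = \sum_{j=0}^{b}\binom{b}{j}\tilde a_{n+j} + 2^{1-n}\tilde x_n, \qquad \tilde x_0 = \tilde x_1 = \cdots = \tilde x_{b-1} = 0. \tag{2.15}$$

From the equivalent relations (and similarly for the sequences $(a_n)$ and $(\tilde a_n)$)

$$x_n = \sum_{k=0}^{n}\binom{n}{k}\tilde x_k \iff \tilde x_n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}x_k,$$

we have

$$X(z) = \frac{1}{1-z}\tilde X\Bigl(\frac{z}{1-z}\Bigr) \qquad\text{and}\qquad \tilde A(z) = \frac{1}{1+z}A\Bigl(\frac{z}{1+z}\Bigr), \tag{2.16}$$

where $\tilde X(z) = \sum_{n\ge0}\tilde x_nz^n$ and $\tilde A(z) = \sum_{n\ge0}\tilde a_nz^n$. Finally, multiplying (2.15) by $z^{n+b}$ and summing over $n$, we obtain the relation

$$(1+z)^b\tilde X(z) = (1+z)^b\tilde A(z) + 2z^b\tilde X(z/2).$$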
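Relation (2.15) can be tested numerically for the mean of a b-DST, i.e., for the recurrence with $a_{n+b} = n$ and $a_0 = \cdots = a_{b-1} = 0$. The following sketch (illustrative names, exact arithmetic) computes the binomial transforms $\tilde x_n$, $\tilde a_n$ and compares both sides:

```python
from fractions import Fraction
from math import comb
from typing import List


def bucket_mean(b: int, n_max: int) -> List[Fraction]:
    """x_{n+b} = n + 2^(1-n) sum_k C(n,k) x_k with x_0 = ... = x_{b-1} = 0."""
    x = [Fraction(0)] * b
    for n in range(n_max - b + 1):
        x.append(n + Fraction(2, 2**n) * sum(comb(n, k) * x[k] for k in range(n + 1)))
    return x


def binomial_transform(x: List[Fraction]) -> List[Fraction]:
    """x~_n = sum_k C(n,k) (-1)^(n-k) x_k."""
    return [sum((comb(n, k) * (-1) ** (n - k) * x[k] for k in range(n + 1)), Fraction(0))
            for n in range(len(x))]


def relation_2_15_holds(b: int, n_terms: int = 10) -> bool:
    size = n_terms + b
    x = bucket_mean(b, size)
    a = [Fraction(0)] * b + [Fraction(n) for n in range(size - b + 1)]   # a_{n+b} = n
    xt, at = binomial_transform(x), binomial_transform(a)
    for n in range(n_terms):
        lhs = sum(comb(b, j) * xt[n + j] for j in range(b + 1))
        rhs = sum(comb(b, j) * at[n + j] for j in range(b + 1)) + Fraction(2, 2**n) * xt[n]
        if lhs != rhs:
            return False
    return True
```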

Remark 1. In Lemma 8, let $\hat X(t) = \tilde X(t^{-1})$, $\hat A(t) = \tilde A(t^{-1})$ and

$$\varphi(t) = \prod_{j\ge0}(1 + 2^{-j}t). \tag{2.17}$$

Then, by iterating:

$$\begin{aligned}
\hat X(t) &= \hat A(t) + \frac{2}{(1+t)^b}\hat X(2t) = \sum_{j\ge0}\frac{2^j\hat A(2^jt)}{(1+t)^b\cdots(1+2^{j-1}t)^b}\\
&= \varphi\Bigl(\frac{t}{2}\Bigr)^b\sum_{j\ge0}\frac{2^j(1+2^jt)^b\hat A(2^jt)}{\varphi(2^jt)^b}.
\end{aligned} \tag{2.18}$$

Thus, we obtain the harmonic sum $\Phi(t) = \sum_{j\ge0}2^j\hat P(2^jt)/\varphi(2^jt)^b$, where $\hat P(t) = (1+t)^b\hat A(t)$. Since $\varphi(\frac{t}{2})^b = 1 + bt + O(t^2)$ (the Taylor expansion at 0), it suffices to know the asymptotic behavior of $\Phi(t)$, whose Mellin transform is given by

$$\Phi^*(s) = \frac{1}{1 - 2^{1-s}}\cdot\Bigl(\frac{\hat P(t)}{\varphi(t)^b}\Bigr)^*(s). \tag{2.19}$$
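The second equality in (2.18) rests on the product identity $(1+t)(1+2t)\cdots(1+2^{j-1}t) = \varphi(2^jt)/\bigl(\varphi(t/2)(1+2^jt)\bigr)$. A short numerical sketch (the infinite product is truncated; names are illustrative) checks this identity and the Taylor expansion of $\varphi(t/2)^b$:

```python
import math


def phi(t: float, terms: int = 200) -> float:
    """Truncated product phi(t) = prod_{j>=0} (1 + 2^(-j) t) for t >= 0."""
    return math.exp(sum(math.log1p(t / 2.0**j) for j in range(terms)))


def finite_product(t: float, j: int) -> float:
    """(1 + t)(1 + 2t)...(1 + 2^(j-1) t), the denominator after j iterations."""
    value = 1.0
    for m in range(j):
        value *= 1.0 + 2.0**m * t
    return value


def finite_product_via_phi(t: float, j: int) -> float:
    """The same quantity expressed through phi, as used in (2.18)."""
    return phi(2.0**j * t) / (phi(t / 2.0) * (1.0 + 2.0**j * t))
```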

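Now we turn to the mean. From (2.13) and Lemma 8:

$$(1+z)^b\tilde F(z) = z^{b+1} + 2z^b\tilde F(z/2).$$

Using Remark 1 one has

$$\hat F(t) = \varphi\Bigl(\frac{t}{2}\Bigr)^b\,\frac{1}{t}\sum_{j\ge0}\frac{2^j}{2^j\varphi(2^jt)^b} = \varphi\Bigl(\frac{t}{2}\Bigr)^bH(t). \tag{2.20}$$

From the integral relation $\int_0^\infty\log(1+z)z^{s-1}\,dz = \frac{\pi}{s\sin\pi s}$ for $\Re(s) \in \langle -1, 0\rangle$, we have

$$\log\varphi(t) = \sum_{j\ge0}\log(1 + 2^{-j}t) = \frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}\frac{\pi}{(1-2^s)s\sin\pi s}\,t^{-s}\,ds \sim \frac{\log^2 t}{2\log 2} + \frac{\log t}{2}, \tag{2.21}$$

uniformly for $|t| \to \infty$ in the linear cone $L_\theta$ for any fixed $\theta \in (0, \pi)$. Thus,

$$\varphi(t)^{-b} = \begin{cases} 1 - 2bt + O(t^2), & t \to 0,\\ O\bigl(\exp(-(b/(2\log 2))\log^2 t)\bigr), & t \to \infty, \end{cases} \tag{2.22}$$

in the cone. This guarantees the existence of the Mellin transform of $H(t)$, which is

$$H^*(s) = \frac{1}{1 - 2^{1-s}}I^*(s-1) \qquad (\Re(s) > 1), \tag{2.23}$$

where

$$I^*(s) = \int_0^\infty\varphi(t)^{-b}t^{s-1}\,dt \tag{2.24}$$

converges in the strip $\langle 0, \infty\rangle$.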

Remark. $I^*(s)$ is exponentially small as $\Im(s) \to \pm\infty$ for $\Re(s) > 0$ [4]. Moreover, one can prove

$$I^*(s) = \frac{\pi}{\sin\pi s}J(s), \qquad\text{with}\quad J(s) = \frac{1}{2\pi i}\int_{\mathcal{H}}\frac{1}{\varphi(t)^b}(-t)^{s-1}\,dt, \tag{2.25}$$

where $\mathcal{H}$ is a Hankel-type contour starting at $+\infty - 0\cdot i$, turning around 0 clockwise before returning to $+\infty + 0\cdot i$. Flajolet and Richmond [4] also give the representation

$$J(s) = A_0(2^s) + (s-1)A_1(2^s) + \cdots + (s-1)(s-2)\cdots(s-b+1)A_{b-1}(2^s), \tag{2.26}$$

where the $A_k(x)$ are entire functions; thus $J(k) = 0$ for all $k \ge 1$. Furthermore, (2.22) implies that $I^*(s) \sim s^{-1}$ as $s \to 0$ and $I^*(s) \sim -2b(s+1)^{-1}$ as $s \to -1$. Thus we can obtain the singular expansion of $I^*(s)$.

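From the above remark and (2.23), we know that $H^*(s)$ has a double pole at $s = 1$ and simple poles at $s = 1 + \chi_k$ ($k \ne 0$), where $\chi_k = 2k\pi i/L$ ($k \in \mathbb{Z}$) with $L = \log 2$. Applying the inversion formula

$$H(t) = \frac{1}{2\pi i}\int_{3/2-i\infty}^{3/2+i\infty}H^*(s)t^{-s}\,ds,$$

we have the asymptotic expansion of $H(t)$ as $t \to 0$ (the remainder term is due to a simple pole at $s = -1$) [8]

$$H(t) = -\frac{1}{L}t^{-1}\log t + \Bigl(\frac{1}{L}J'(0) + \frac12\Bigr)t^{-1} + \frac{1}{L}\sum_{k\ne0}I^*(\chi_k)t^{-1-\chi_k} + 2b + O(t), \tag{2.27}$$

where

$$J'(0) = \int_0^1\Bigl(\frac{1}{\varphi(t)^b} - 1\Bigr)t^{-1}\,dt + \int_1^\infty\frac{t^{-1}}{\varphi(t)^b}\,dt. \tag{2.28}$$

Remark. Integrating by parts, we first rewrite

$$J'(0) = \int_0^1\Bigl(\frac{1}{\varphi(t)^b} - 1\Bigr)t^{-1}\,dt + \int_1^\infty\frac{t^{-1}}{\varphi(t)^b}\,dt = b\int_0^\infty\varphi(t)^{-b}\frac{\varphi'(t)}{\varphi(t)}\log t\,dt;$$

then

$$J'(0) \sim 2b\int_0^\infty e^{-2bt}\log t\,dt = 2b\,\frac{d}{ds}(2b)^{-s}\Gamma(s)\Big|_{s=1} = -\log b - \gamma - L$$

as $b \to \infty$.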
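The constant $J'(0)$ in (2.28) is easy to evaluate numerically. The sketch below is an illustration under stated assumptions: it uses composite Simpson quadrature, a truncated product for $\varphi$, and the substitution $t = e^u$ in the second integral; all names are illustrative. For b = 1 the result can be compared with $-\alpha\log 2$, where $\alpha = 1 + \frac13 + \frac17 + \cdots$, which is the value needed for the linear term of the b-DST mean to reduce to that of Theorem 10.

```python
import math
from typing import Callable


def log_phi(t: float, terms: int = 200) -> float:
    """log of the truncated product phi(t) = prod_{j>=0} (1 + 2^(-j) t)."""
    return sum(math.log1p(t / 2.0**j) for j in range(terms))


def simpson(f: Callable[[float], float], a: float, b: float, steps: int) -> float:
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def j_prime_zero(b: int, steps: int = 2000, u_max: float = 20.0) -> float:
    """Numerical value of J'(0) as defined in (2.28)."""
    def near_zero(t: float) -> float:
        if t == 0.0:
            return -2.0 * b                       # limit of (phi^-b - 1)/t at t = 0
        return math.expm1(-b * log_phi(t)) / t

    def tail(u: float) -> float:
        # Integrand of the second integral after t = e^u, dt/t = du.
        return math.exp(-b * log_phi(math.exp(u)))

    return simpson(near_zero, 0.0, 1.0, steps) + simpson(tail, 0.0, u_max, steps)
```

Comparing `j_prime_zero(b)` with $-\log b - \gamma - \log 2$ for growing b illustrates the asymptotic statement of the remark.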

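Equations (2.20) and (2.27) give

$$\begin{aligned}
\hat F(t) = {} & -\frac{1}{L}t^{-1}\log t + \Bigl(\frac{1}{L}J'(0) + \frac12\Bigr)t^{-1} + \frac{1}{L}\sum_{k\ne0}I^*(\chi_k)t^{-1-\chi_k}\\
& - \frac{b}{L}\log t + \frac{b}{L}J'(0) + \frac{5b}{2} + \frac{b}{L}\sum_{k\ne0}I^*(\chi_k)t^{-\chi_k} + O(t\log t),
\end{aligned} \tag{2.29}$$

and by the elementary substitution (2.16) we obtain the asymptotics of $F(z)$. Finally, using Theorem 9 we obtain the following theorem for the mean of symmetric b-DSTs.

Theorem 13 (Hubalek). The expected generalized internal path length of a b-digital search tree built from n records satisfies, as $n \to \infty$,

$$\begin{aligned}
\mathbb{E}[L_n] = {} & n\log_2 n + \Bigl(\frac{1}{L}J'(0) + \frac12 + \frac{\gamma}{L} - \frac{1}{L} + \delta_1(\log_2 n)\Bigr)n\\
& + b\log_2 n + \frac{b}{L}J'(0) + \frac{5b}{2} + \frac{b\gamma}{L} - \frac{1}{2L} + \delta_2(\log_2 n) + O\Bigl(\frac{\log n}{n}\Bigr),
\end{aligned}$$

where $L = \log 2$, $\gamma$ denotes Euler's constant, $J'(0)$ is defined in (2.28), $\delta_1(x)$ and $\delta_2(x)$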

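Variance. To compute the variance, we use the formula $\mathbb{V}[L_n] = s_n - f_n^2 + f_n$, where $s_n = F_n''(1)$, as for the classical symmetric DST. Then,

$$s_{n+b} = n2^{2-n}\sum_{k=0}^{n}\binom{n}{k}f_k + n(n-1) + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}f_kf_{n-k} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}s_k$$

with $s_0 = s_1 = \cdots = s_{b-1} = 0$. We again split the above recurrence into three components, $s_n = u_n + v_n + w_n$, where

$$u_{n+b} = n2^{2-n}\sum_{k=0}^{n}\binom{n}{k}f_k + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}u_k, \qquad u_0 = \cdots = u_{b-1} = 0; \tag{2.30a}$$

$$v_{n+b} = n(n-1) + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}v_k, \qquad v_0 = \cdots = v_{b-1} = 0; \tag{2.30b}$$

$$w_{n+b} = 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}f_kf_{n-k} + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}w_k, \qquad w_0 = \cdots = w_{b-1} = 0. \tag{2.30c}$$

Applying Lemma 8 to (2.30a)–(2.30c) yields

$$(1+z)^b\tilde U(z) = 4z^{b+1}\tilde F\Bigl(\frac{z}{2}\Bigr) + 2z^{b+1}\tilde F'\Bigl(\frac{z}{2}\Bigr) + 2z^{b+2}\tilde F'\Bigl(\frac{z}{2}\Bigr) + 2z^b\tilde U\Bigl(\frac{z}{2}\Bigr); \tag{2.31a}$$

$$(1+z)^b\tilde V(z) = 2z^{b+2} + 2z^b\tilde V\Bigl(\frac{z}{2}\Bigr); \tag{2.31b}$$

$$(1+z)^b\tilde W(z) = 2z^b\tilde M(z) + 2z^b\tilde W\Bigl(\frac{z}{2}\Bigr); \tag{2.31c}$$

where

$$\tilde m_n = [z^n]\tilde M(z) = 2^{-n}\sum_{k=0}^{n}\binom{n}{k}\tilde f_k\tilde f_{n-k} \qquad (n \ge 0). \tag{2.31d}$$

Now we again apply Remark 1 to (2.31a) to obtain the expression for $\hat U(t)$ with

$$\hat P(t) = 4t^{-1}\hat F(2t) - 8\hat F'(2t) - 8t\hat F'(2t)$$

as in (2.18). Next, let $\Upsilon(t) = \hat P(t)/\varphi(t)^b$. From the derivative of $\hat F(2t) = \varphi(t)^bH(2t)$, we get

$$\Upsilon(t) = 4t^{-1}H(2t) - 4b\bigl(T(t) - 2\bigr)H(2t) - 8bH(2t) - 8H'(2t) - 4btT(t)H(2t) - 8tH'(2t),$$

where $T(x) = \varphi'(x)/\varphi(x) = \sum_{j\ge0}2^{-j}/(1 + 2^{-j}x)$ and $\Phi_U^*(s) = \Upsilon^*(s)/(1 - 2^{1-s})$. Since

$$T(x) = \begin{cases} 2 + O(x), & x \to 0,\\ O(x^{-1}), & x \to \infty, \end{cases}$$

$T(x)$ is a harmonic sum with Mellin transform

$$T^*(s) = \frac{1}{1 - 2^{s-1}}\,\frac{\pi}{\sin\pi s} \qquad (s \in \langle 0, 1\rangle),$$

and $\mathcal{M}[T(x) - 2; s] = T^*(s)$ for $s \in \langle -1, 0\rangle$. The Mellin transform of $\Upsilon(t)$ is

$$\Upsilon^*(s) = s2^{3-s}H^*(s-1) - 4b\Upsilon_0^*(s) - b2^{3-s}H^*(s) - 4b\Upsilon_1^*(s) + s2^{2-s}H^*(s)$$

for $s \in \langle 2, \infty\rangle$, where $\Upsilon_0^*(s) = \mathcal{M}[(T(t) - 2)H(2t); s]$ and $\Upsilon_1^*(s) = \mathcal{M}[tT(t)H(2t); s]$ exist for $s \in \langle 0, \infty\rangle$. For the asymptotic analysis of $\Phi^*(s)$, we have to take $\Upsilon_0^*(1)$ and $\Upsilon_1^*(1)$ into account. One of the innovations in [8] is the use of the Mellin convolution formula.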

Remark. Mellin's convolution formula is

$$\mathcal{M}[F(t)\cdot G(t); s] = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}F^*(\tau)\cdot G^*(s-\tau)\,d\tau, \tag{2.32}$$

valid for $c$ and $s - c$ in the fundamental strips of $F^*$ and $G^*$, respectively. From (2.32), we obtain for $j = 0, 1$, respectively,

$$\Upsilon_j^*(s) = \frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty}T^*(\tau+j)\cdot2^{-(s-\tau)}H^*(s-\tau)\,d\tau.$$

First we compute $\Upsilon_0^*(1)$ by splitting

$$T^*(\tau)2^{-(1-\tau)}H^*(1-\tau) = \frac{\pi}{\sin\pi\tau}\,\frac{2^{\tau-1}}{(1-2^{\tau-1})(1-2^{\tau})}\,I^*(0-\tau) = -T^*(\tau+1)I^*(0-\tau) - T^*(\tau)I^*(0-\tau).$$

Then the first part is

$$\frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty}T^*(\tau+1)I^*(0-\tau)\,d\tau = \mathcal{M}[tT(t)I(t); s=0] = -\frac{1}{b}\mathcal{M}[I'(t); s=0] = \frac{1}{b},$$

and the second part yields

$$\frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty}T^*(\tau)I^*(0-\tau)\,d\tau = \mathcal{M}[(T(t)-2)I(t); s=0] = \lim_{s\to0}\Bigl\{-\frac{1}{b}\mathcal{M}[I'(t); s] - 2I^*(s)\Bigr\} = \frac{1}{b}J'(-1) - 2J'(0) - 2.$$


Thus $\Upsilon_0^*(1) = -\frac{1}{b} - \frac{1}{b}J'(-1) + 2J'(0) + 2$. It is more difficult to compute $\Upsilon_1^*(1)$, for which it can be proved that

$$\Upsilon_1^*(1) = -\frac{1}{4b}\int_0^\infty\frac{t^{-1}\Lambda(t)}{\varphi(t)^b}\,dt \sim -2 \qquad (b \to \infty)$$

with $\Lambda(t) = 2\sum_{j\ge0}j2^{-j}t/(1 + 2^{-j}t)$. Thus, we can derive the expansion for $U(z)$ as $z \to 1$ similarly as for $F(z)$ and get the asymptotics of $u_n$ as $n \to \infty$.

The asymptotics of $v_n$ are simple. Again applying Remark 1 to (2.31b), we obtain $\hat V(t)$ with $\hat P(t) = 2t^{-2}$ and $\Phi_V^*(s) = 2I^*(s-2)/(1 - 2^{1-s})$. We immediately get the asymptotics of $v_n$ as $n \to \infty$ from the properties of $I^*(s)$.

Because of the appearance of the "binomial convolution" (2.31d), it is non-trivial to apply the same method to (2.31c). But, since the exponential generating function $\tilde m(z) = \sum_{N\ge0}\tilde m_Nz^N/N!$ satisfies $\tilde m(z) = \tilde f(z/2)^2$, it can be proved that

$$\hat M^*(s) = 2^{-s}\cdot\frac{1}{2\pi i}\int_{3/2-i\infty}^{3/2+i\infty}\binom{s}{\tau}\hat F^*(\tau)\hat F^*(s-\tau)\,d\tau, \tag{2.33}$$

where $\binom{s}{\tau} = \Gamma(1+s)/\bigl(\Gamma(1+\tau)\Gamma(1+s-\tau)\bigr)$ is the complex binomial coefficient. Next, from the singular expansions of $\hat F$ and the Taylor series of complex binomial coefficients, we obtain the asymptotics of $\hat M^*(s)$ as $s \to 2$. Similarly, one treats the case $s \to 1$.

From (2.31c) we have $\hat W(t) = \varphi(t/2)^b\Phi_W(t)$, where $\Phi_W(t) = 2\sum_{j\ge0}2^j\hat P(2^jt)$ with $\hat P(t) = \hat M(t)I(t)$. Presupposing some properties of $\hat P$, we have $\Phi^*(s) = 2\hat P^*(s)/(1 - 2^{1-s})$, where

$$\hat P^*(s) = \frac{1}{2\pi i}\int_{1/2-i\infty}^{1/2+i\infty}I^*(\tau)\hat M^*(s-\tau)\,d\tau$$

for $s \in \langle\frac52, 2b + \frac52\rangle$. Now shifting the contour to the left yields the analytic continuation

$$\hat P^*(s) = \hat M^*(s) - 2b\hat M^*(s+1) + \frac{1}{2\pi i}\int_{-3/2-i\infty}^{-3/2+i\infty}I^*(\tau)\hat M^*(s-\tau)\,d\tau$$

in $s \in \langle\frac12, 2b + \frac12\rangle$. Thus we get the Laurent series of $\hat P^*(s)$ as $s \to 2$ and $s \to 1$. After lengthy calculations, we obtain the asymptotics of $w_n$. Overall, the following theorem for the variance of the internal path length of a b-DST holds:

Theorem 14 (Hubalek). The variance of the generalized internal path length of a b-digital search tree built from n records satisfies as n → ∞,

V[Ln] =


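Acknowledgments

First of all, the person I most want to thank is my advisor, Dr. Michael Fuchs. Before work on this thesis began, Michael and I spent a great deal of time studying many papers on digital search trees, gradually working out the results obtained so far for DSTs and the methods used in them. Beyond that, Michael gave me a great deal of help and guidance during the writing of this thesis, so that it can serve as a systematic introduction for readers who encounter DSTs or this field for the first time.

I am also very grateful to the two members of my oral examination committee, Professor 陳秋媛 of National Chiao Tung University and Professor 程華淮 of National Taiwan Ocean University. Both of them gave me many comments on this thesis, which made it more complete and more rigorous.

Of course I have not forgotten my classmates of the class of 96 at the Institute of Applied Mathematics, haha. Although the days in graduate school were hard, we made it through together. Those who are due to graduate will graduate in the end; for those who have not graduated yet, it is only that the boss has not nodded yet. Keep going!

And, indispensably, there is my family, who have given me the most and the richest things in my life, and who gave me their fullest support while I was studying hard in Hsinchu (although I sometimes slacked off). Oh, and the two adorable little dogs at home, 奶雞 and 都胖 (even though they understand nothing and only ever think about eating and going to the park).

Finally, never forget the beautiful things that have appeared in life.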
