Wiener Index for Variants of Digital Search Trees


$$\left(\frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)$$

this proves our claim.

4.2.3 Wiener Index for Variants of Digital Search Trees

In this section, we discuss results similar to those of Section 4.2.2 for other digital trees. The proofs follow along the same lines (or are even easier, since in some cases the Laplace transform is not needed) and will not be given. For the reader's convenience, the (differential-)functional equations for the poissonized means, variances and covariances, which are crucial to the proofs, are listed in Appendix A. Our results can be deduced from them with an approach similar to the one used in Section 4.2.2.

The first member of the digital tree family we discuss is the bucket digital search tree. Note that there are two types of total path length in bucket digital search trees: the sum of the distances of all keys to the root and the sum of the distances of all nodes to the root; the former is called the key-wise path length and the latter the node-wise path length (see [74] for more details).

Accordingly, we also have a key-wise Wiener index and a node-wise Wiener index. Results for both Wiener indices in random bucket digital search trees are presented below.

Another member of the digital tree family is the trie. Note that for tries, the number of leaves is n, whereas the number of internal nodes is random.

Hence, there are again two different types of Wiener indices, namely, the external Wiener index, which only uses external nodes, and the internal Wiener index, which only uses internal nodes. Again, both of these Wiener indices are discussed below.

As a final member of the digital tree family, we consider PATRICIA tries.

For binary PATRICIA tries, the number of internal nodes is not random (it equals n − 1), and hence only the external Wiener index makes sense. However, for m-ary PATRICIA tries with m > 2, the number of internal nodes is no longer deterministic, and hence the internal Wiener index is meaningful. We give the results for the internal Wiener index of m-ary PATRICIA tries at the end of this section.

As in Section 4.2.2, we will denote by T_n the total path length (either key-wise or node-wise or external or internal depending on the context) and by W_n the Wiener index (again either key-wise or node-wise or external or internal). Moreover, for the node-wise Wiener index and the internal Wiener index, we also need the number of nodes (internal in case of the internal Wiener index) which will be denoted by N_n.

Key-wise Wiener Index of Bucket Digital Search Trees.

Here, we have the following distributional recurrences for T_n and W_n: for n≥ 0,

$$T_{n+b}\stackrel{d}{=}T_{B_n}+T^{*}_{n-B_n}+n,$$
$$W_{n+b}\stackrel{d}{=}W_{B_n}+W^{*}_{n-B_n}+(B_n+1)\bigl(T^{*}_{n-B_n}+n-B_n\bigr)+(n-B_n+1)\bigl(T_{B_n}+B_n\bigr),$$

where notation is as in Section 1 and the initial conditions are given by $T_0=\cdots=T_{b-1}=W_0=\cdots=W_{b-1}=0$.
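Taking expectations in these recurrences gives linear recursions for the means. As an illustration only (a minimal sketch, not the method of proof used here), the following Python function computes E(T_n) and E(W_n) exactly with rational arithmetic. It assumes, as for symmetric digital trees, that B_n is Binomial(n, 1/2) and independent of the copies; the name `keywise_bucket_dst_means` is hypothetical.

```python
from fractions import Fraction
from math import comb


def keywise_bucket_dst_means(n_max: int, b: int):
    """Exact E(T_n) and E(W_n), 0 <= n <= n_max, for key-wise bucket DSTs.

    Assumption: B_n ~ Binomial(n, 1/2) (symmetric case). Conditioning on
    B_n = k turns the distributional recurrences into linear recursions.
    """
    t = [Fraction(0)] * (n_max + 1)
    w = [Fraction(0)] * (n_max + 1)
    # The zeros above encode T_0 = ... = T_{b-1} = W_0 = ... = W_{b-1} = 0.
    for size in range(b, n_max + 1):
        n = size - b  # keys that do not fit into the root bucket
        t_size = Fraction(n)
        w_size = Fraction(0)
        for k in range(n + 1):
            p = Fraction(comb(n, k), 2 ** n)  # P(B_n = k)
            t_size += p * (t[k] + t[n - k])
            w_size += p * (w[k] + w[n - k]
                           + (k + 1) * (t[n - k] + n - k)
                           + (n - k + 1) * (t[k] + k))
        t[size], w[size] = t_size, w_size
    return t, w


t, w = keywise_bucket_dst_means(6, b=1)
print(t[3], w[3])  # prints: 5/2 4
```

With b = 1 (ordinary digital search trees) a tree on three keys is either a path or a root with two children, with total path length 3 or 2 respectively and Wiener index 4 in both cases, which matches E(T_3) = 5/2 and E(W_3) = 4.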

From these recurrences, we obtain the following results for mean and variance.

Theorem 4.2.5. We have for the means of the key-wise path length and the key-wise Wiener index of bucket digital search trees,

$$\mathbb{E}(T_n)=n\log_2 n+nP_1(\log_2 n)+O(\log n),$$
$$\mathbb{E}(W_n)=n^2\log_2 n+n^2P_1(\log_2 n)-n^2+O(n\log n),$$

where $P_1(z)$ is a one-periodic function given in the remark below. Moreover, the variances and covariances of the key-wise path length and the key-wise Wiener index of bucket digital search trees are given by

$$\operatorname{Var}(T_n)=nP_2(\log_2 n)+O(1),\qquad \operatorname{Cov}(T_n,W_n)=n^2P_2(\log_2 n)+O(n\log n),$$
$$\operatorname{Var}(W_n)=n^3P_2(\log_2 n)+O(n^2\log n),$$

where $P_2(z)$ is again a one-periodic function given in the remark below.

Remark 8. The results for the mean and variance of the key-wise path length were first obtained by Hubalek in [89]. In [74], we gave the following expressions for the periodic functions

P₁(z) = γ− 1

and $\tilde{f}_{1,0}(z)$ denotes the Poisson generating function of $\mathbb{E}(T_n)$.

Note that the result for the mean of the Wiener index also follows from [22].

Moreover, we have the following bivariate central limit theorem.

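Theorem 4.2.6. We have

$$\left(\frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)\xrightarrow{d}(X,X),$$

where X is a standard normal random variable and $\xrightarrow{d}$ denotes weak convergence.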

Remark 9. The central limit theorem for the key-wise path length was first proved in [90].

Node-wise Wiener Index of Bucket Digital Search Trees.

Here, the distributional recurrences for N_n, T_n and W_n are given by: for n ≥ 0,

$$N_{n+b}\stackrel{d}{=}N_{B_n}+N^{*}_{n-B_n}+1,$$
$$T_{n+b}\stackrel{d}{=}T_{B_n}+T^{*}_{n-B_n}+N_{B_n}+N^{*}_{n-B_n},$$
$$W_{n+b}\stackrel{d}{=}W_{B_n}+W^{*}_{n-B_n}+(N_{B_n}+1)\bigl(T^{*}_{n-B_n}+N^{*}_{n-B_n}\bigr)+(N^{*}_{n-B_n}+1)\bigl(T_{B_n}+N_{B_n}\bigr),$$

where $B_n$ is as in Section 1, the triple $(N_n^{*},T_n^{*},W_n^{*})$ denotes an independent copy of $(N_n,T_n,W_n)$, and both triples are independent of $B_n$. The initial conditions are given by $T_0=\cdots=T_{b-1}=W_0=\cdots=W_{b-1}=N_0=0$ and $N_1=\cdots=N_{b-1}=1$.
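These recurrences can also be simulated directly. The sketch below (a hypothetical helper, assuming B_n ~ Binomial(n, 1/2) as in the symmetric case) draws (N_n, T_n, W_n) recursively, sampling the two subtrees independently as the independent copies in the recurrence require; averaging many draws gives Monte Carlo estimates of the quantities in the next theorem.

```python
import random


def sample_nodewise(n: int, b: int, rng: random.Random):
    """One draw of (N_n, T_n, W_n) for a node-wise bucket DST with capacity b."""
    if n == 0:
        return 0, 0, 0
    if n < b:
        return 1, 0, 0  # N_1 = ... = N_{b-1} = 1, T = W = 0: a single root node
    m = n - b  # keys passed on to the subtrees
    k = sum(rng.getrandbits(1) for _ in range(m))  # assumed: B_m ~ Binomial(m, 1/2)
    # Independent draws play the role of (N, T, W) and the copy (N*, T*, W*).
    nl, tl, wl = sample_nodewise(k, b, rng)
    nr, tr, wr = sample_nodewise(m - k, b, rng)
    N = nl + nr + 1
    T = tl + tr + nl + nr
    W = wl + wr + (nl + 1) * (tr + nr) + (nr + 1) * (tl + nl)
    return N, T, W


rng = random.Random(1)
draws = [sample_nodewise(200, 3, rng) for _ in range(500)]
mean_N = sum(d[0] for d in draws) / len(draws)
print(mean_N)  # Monte Carlo estimate of E(N_200) for b = 3
```

With b = 1 every node stores exactly one key, so every draw has N_n = n, which gives a quick sanity check of the sampler.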

From this, we obtain the following result.

Theorem 4.2.7. We have for the means of the number of nodes, the node-wise path length and the node-wise Wiener index of bucket digital search trees,

$$\mathbb{E}(N_n)=nP_1(\log_2 n)+O(1),$$
$$\mathbb{E}(T_n)=n(\log_2 n)P_1(\log_2 n)+O(n),\qquad \mathbb{E}(W_n)=n^2(\log_2 n)P_1(\log_2 n)^2+O(n^2),$$

where $P_1(z)$ is a one-periodic function given in the remark below. Moreover, the variances and covariances of the number of nodes, the node-wise path length and the node-wise Wiener index of bucket digital search trees are given by

$$\operatorname{Var}(N_n)=nP_2(\log_2 n)+O(1),$$
$$\operatorname{Cov}(N_n,T_n)=n(\log_2 n)P_2(\log_2 n)+O(n),\qquad \operatorname{Var}(T_n)=n(\log_2 n)^2P_2(\log_2 n)+O(n\log n),$$
$$\operatorname{Cov}(N_n,W_n)=2n^2(\log_2 n)P_1(\log_2 n)P_2(\log_2 n)+O(n^2),$$
$$\operatorname{Cov}(T_n,W_n)=2n^2(\log_2 n)^2P_1(\log_2 n)P_2(\log_2 n)+O(n^2\log n),$$
$$\operatorname{Var}(W_n)=4n^3(\log_2 n)^2P_1(\log_2 n)^2P_2(\log_2 n)+O(n^3\log n),$$

where $P_2(z)$ is again a one-periodic function given in the remark below.

Remark 10. The results for the number of nodes were first proved in [90].

Moreover, these results were reproved in [74], where we also proved the results for the node-wise path length and gave the following expressions for $P_1(z)$ and $P_2(z)$

and $\tilde{f}_{1,0}(z)$ denotes the Poisson generating function of $\mathbb{E}(T_n)$.

Theorem 4.2.7 yields the following trivariate central limit theorem.

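Theorem 4.2.8. We have

$$\left(\frac{N_n-\mathbb{E}(N_n)}{\sqrt{\operatorname{Var}(N_n)}},\ \frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)\xrightarrow{d}(X,X,X),$$

where X is a standard normal random variable and $\xrightarrow{d}$ denotes weak convergence.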
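The degenerate limit (X, X, X) can be traced back to the moment asymptotics of Theorem 4.2.7: all three correlation coefficients tend to 1. A short check, using only the leading terms stated there and assuming that $P_1$ and $P_2$ stay bounded away from 0 (an assumption implicit in the stated orders of growth), reads as follows; the arguments $\log_2 n$ of $P_1,P_2$ are suppressed, and the standardized variables $\tilde N_n,\tilde T_n$ are notation introduced only for this computation.

$$\rho(N_n,T_n)=\frac{\operatorname{Cov}(N_n,T_n)}{\sqrt{\operatorname{Var}(N_n)\operatorname{Var}(T_n)}}=\frac{n(\log_2 n)P_2\bigl(1+O(1/\log n)\bigr)}{\sqrt{nP_2\cdot n(\log_2 n)^2P_2}\,\bigl(1+O(1/\log n)\bigr)}=1+O\Bigl(\frac{1}{\log n}\Bigr),$$
$$\rho(T_n,W_n)=\frac{2n^2(\log_2 n)^2P_1P_2\bigl(1+O(1/\log n)\bigr)}{\sqrt{n(\log_2 n)^2P_2\cdot 4n^3(\log_2 n)^2P_1^2P_2}\,\bigl(1+O(1/\log n)\bigr)}=1+O\Bigl(\frac{1}{\log n}\Bigr),$$
$$\mathbb{E}\bigl[(\tilde N_n-\tilde T_n)^2\bigr]=2-2\rho(N_n,T_n)\longrightarrow 0,\qquad \tilde N_n=\frac{N_n-\mathbb{E}(N_n)}{\sqrt{\operatorname{Var}(N_n)}},\quad \tilde T_n=\frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}}.$$

The same computation applies to the pair $(N_n,W_n)$. Hence the standardized components differ by terms tending to 0 in $L^2$, so once one component is asymptotically normal, the others have the same limit.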

Remark 11. The central limit theorem for the number of nodes was first proved in [90]. Also note that we posed the problem of proving a bivariate central limit theorem for the number of nodes and the node-wise path length in Section 5 of [74].

External Wiener Index of Tries.

Here, the distributional recurrences for T_n and W_n are as follows: for n ≥ 2,

$$T_n\stackrel{d}{=}T_{B_n}+T^{*}_{n-B_n}+n,$$
$$W_n\stackrel{d}{=}W_{B_n}+W^{*}_{n-B_n}+B_n\bigl(T^{*}_{n-B_n}+n-B_n\bigr)+(n-B_n)\bigl(T_{B_n}+B_n\bigr),$$

where notation is as in Section 1 and the initial conditions are given by $T_0=T_1=W_0=W_1=0$.
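To see what these recurrences encode, one can build a trie on concrete keys and compute T_n and W_n in two ways: by a recursion mirroring the recurrences (with B_n the number of keys whose next bit is 0), and by brute force, using that a key's leaf lies one level below its longest common prefix with another key and that two leaves meet at the depth of their common prefix. This sketch assumes the keys are distinct bit strings of equal, finite length (a stand-in for infinite random strings); all function names are hypothetical.

```python
import random
from itertools import combinations


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i


def external_by_recurrence(keys, level=0):
    """(n, T, W) of the trie on `keys`, following the recurrences for T_n, W_n."""
    n = len(keys)
    if n <= 1:
        return n, 0, 0  # T_0 = T_1 = W_0 = W_1 = 0
    left = [s for s in keys if s[level] == "0"]  # B_n = len(left)
    right = [s for s in keys if s[level] == "1"]
    bl, tl, wl = external_by_recurrence(left, level + 1)
    br, tr, wr = external_by_recurrence(right, level + 1)
    return n, tl + tr + n, wl + wr + bl * (tr + br) + br * (tl + bl)


def external_by_distances(keys):
    """Brute force from leaf depths and pairwise leaf distances."""
    n = len(keys)
    if n <= 1:
        return n, 0, 0
    # A key's leaf lies one level below its longest common prefix with another key.
    depth = [1 + max(lcp(keys[i], keys[j]) for j in range(n) if j != i)
             for i in range(n)]
    # Two leaves meet at depth lcp, so their distance is D_i + D_j - 2 * lcp.
    W = sum(depth[i] + depth[j] - 2 * lcp(keys[i], keys[j])
            for i, j in combinations(range(n), 2))
    return n, sum(depth), W


rng = random.Random(7)
keys = [format(x, "032b") for x in rng.sample(range(2 ** 32), 50)]
print(external_by_recurrence(keys) == external_by_distances(keys))  # True
```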

From this, we obtain the following theorem.

Theorem 4.2.9. We have for the means of the external path length and the external Wiener index of tries,

$$\mathbb{E}(T_n)=n\log_2 n+nP_1(\log_2 n)+O(\log n),$$
$$\mathbb{E}(W_n)=n^2\log_2 n+n^2P_1(\log_2 n)-n^2+O(n\log n),$$

where $P_1(z)$ is a one-periodic function given in the remark below. Moreover, the variances and covariances of the external path length and the external Wiener index of tries are given by

$$\operatorname{Var}(T_n)=nP_2(\log_2 n)+O(1),\qquad \operatorname{Cov}(T_n,W_n)=n^2P_2(\log_2 n)+O(n\log n),$$
$$\operatorname{Var}(W_n)=n^3P_2(\log_2 n)+O(n^2\log n),$$

where $P_2(z)$ is again a one-periodic function given in the remark below.

Remark 12. The result about the mean of the total path length was first obtained in [127]. A detailed analysis of the variance of the total path length was first undertaken by Kirschenhofer, Prodinger and Szpankowski [119] (see also Jacquet and Régnier [94] for preliminary results). In Hwang, Fuchs and Zacharovas [75], we obtained the following expressions for the periodic functions

$$P_1(z)=\frac{\gamma}{\log 2}+\frac{1}{2}-\frac{1}{\log 2}\sum_{k\neq 0}\Gamma(-\chi_k)e^{2k\pi i z}$$

and

$$P_2(z)=\frac{1}{\log 2}\sum_{k\in\mathbb{Z}}G_2(-1-\chi_k)e^{2k\pi i z},$$

where

$$G_2(\omega)=\Gamma(\omega+1)\left(1-\frac{\omega^2+\omega+4}{2^{\omega+3}}\right)+2\sum_{l\geq 1}\frac{(-1)^l\,\Gamma(\omega+l+1)}{l!\,(2^l-1)}\bigl(l(\omega+l)-1\bigr).$$

Note that the result about the mean of the Wiener index also follows from [22].
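Remark 12 also allows a quick numerical plausibility check. Taking expectations in the recurrences for tries, the events B_n = 0 and B_n = n put E(T_n) on both sides, which is handled by moving those terms to the left. The sketch below computes E(T_n) and E(W_n) in floating point and compares E(T_n)/n − log₂ n with the mean value γ/log 2 + 1/2 of P₁(z). The oscillating part is negligible here because |Γ(−χ_k)| is extremely small for k ≠ 0 (assuming the usual χ_k = 2kπi/log 2). The helper name and the choice n = 500 are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the math module does not provide Euler's constant


def trie_external_means(n_max: int):
    """Floating-point E(T_n), E(W_n) for tries from the external recurrences.

    Assumption: B_n ~ Binomial(n, 1/2). The cases B_n = 0 and B_n = n add
    2^{1-n} E(T_n) (resp. E(W_n)) to the right-hand side; moving them to the
    left gives the division by 1 - 2^{1-n}.
    """
    log_fact = [math.lgamma(k + 1) for k in range(n_max + 1)]
    t = [0.0] * (n_max + 1)
    w = [0.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        st, sw = float(n), 0.0
        for k in range(1, n):
            # P(B_n = k), computed via logarithms to avoid overflow
            p = math.exp(log_fact[n] - log_fact[k] - log_fact[n - k] - n * math.log(2))
            st += p * (t[k] + t[n - k])
            sw += p * (w[k] + w[n - k] + k * (t[n - k] + n - k) + (n - k) * (t[k] + k))
        q = 1.0 - 2.0 ** (1 - n)
        t[n], w[n] = st / q, sw / q
    return t, w


n = 500
t, w = trie_external_means(n)
mean_P1 = EULER_GAMMA / math.log(2) + 0.5  # constant term of P_1 in Remark 12
print(t[n] / n - math.log2(n) - mean_P1)  # small: O(1/n) plus a tiny oscillation
```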

From the previous result, we again obtain the following theorem.

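Theorem 4.2.10. We have

$$\left(\frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)\xrightarrow{d}(X,X),$$

where X is a standard normal random variable and $\xrightarrow{d}$ denotes weak convergence.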

Remark 13. The central limit theorem for the external path length was first proved in [94].

Internal Wiener Index of Tries.

Here, the distributional recurrences for N_n, T_n and W_n are as follows: for n ≥ 2,

$$N_n\stackrel{d}{=}N_{B_n}+N^{*}_{n-B_n}+1,$$
$$T_n\stackrel{d}{=}T_{B_n}+T^{*}_{n-B_n}+N_{B_n}+N^{*}_{n-B_n},$$
$$W_n\stackrel{d}{=}W_{B_n}+W^{*}_{n-B_n}+(N_{B_n}+1)\bigl(T^{*}_{n-B_n}+N^{*}_{n-B_n}\bigr)+(N^{*}_{n-B_n}+1)\bigl(T_{B_n}+N_{B_n}\bigr),$$

where notation is as for the node-wise Wiener index and the initial conditions are given by $N_0=N_1=T_0=T_1=W_0=W_1=0$.
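The internal quantities can likewise be checked on a concrete trie: internal nodes are exactly the prefixes shared by at least two keys, their depths are the prefix lengths, and the lowest common ancestor of two internal nodes is their longest common prefix. The sketch compares this brute force with a recursion mirroring the recurrences above; keys are assumed to be distinct bit strings of equal length, and the function names are hypothetical.

```python
import random
from itertools import combinations


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i


def internal_by_recurrence(keys, level=0):
    """(N, T, W) over internal nodes, following the recurrences for N_n, T_n, W_n."""
    if len(keys) <= 1:
        return 0, 0, 0  # N_0 = N_1 = T_0 = T_1 = W_0 = W_1 = 0
    left = [s for s in keys if s[level] == "0"]
    right = [s for s in keys if s[level] == "1"]
    nl, tl, wl = internal_by_recurrence(left, level + 1)
    nr, tr, wr = internal_by_recurrence(right, level + 1)
    N = nl + nr + 1
    T = tl + tr + nl + nr
    W = wl + wr + (nl + 1) * (tr + nr) + (nr + 1) * (tl + nl)
    return N, T, W


def internal_by_prefixes(keys):
    """Brute force: internal nodes are the prefixes shared by at least two keys."""
    counts = {}
    for s in keys:
        for d in range(len(s) + 1):
            counts[s[:d]] = counts.get(s[:d], 0) + 1
    nodes = [p for p, c in counts.items() if c >= 2]
    # Distance = depth(p) + depth(q) - 2 * depth(common prefix of p and q).
    W = sum(len(p) + len(q) - 2 * common_prefix_len(p, q)
            for p, q in combinations(nodes, 2))
    return len(nodes), sum(len(p) for p in nodes), W


rng = random.Random(3)
keys = [format(x, "024b") for x in rng.sample(range(2 ** 24), 40)]
print(internal_by_recurrence(keys) == internal_by_prefixes(keys))  # True
```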

Then, we have the following result for the mean values, variances and covariances.

Theorem 4.2.11. We have for the means of the number of internal nodes, the internal path length and the internal Wiener index of tries,

$$\mathbb{E}(N_n)=nP_1(\log_2 n)+O(1),$$
$$\mathbb{E}(T_n)=n(\log_2 n)P_1(\log_2 n)+O(n),\qquad \mathbb{E}(W_n)=n^2(\log_2 n)P_1(\log_2 n)^2+O(n^2),$$

where $P_1(z)$ is a one-periodic function given in the remark below. Moreover, the variances and covariances of the number of internal nodes, the internal path length and the internal Wiener index of tries are given by

$$\operatorname{Var}(N_n)=nP_2(\log_2 n)+O(1),$$
$$\operatorname{Cov}(N_n,T_n)=n(\log_2 n)P_2(\log_2 n)+O(n),\qquad \operatorname{Var}(T_n)=n(\log_2 n)^2P_2(\log_2 n)+O(n\log n),$$
$$\operatorname{Cov}(N_n,W_n)=2n^2(\log_2 n)P_1(\log_2 n)P_2(\log_2 n)+O(n^2),$$
$$\operatorname{Cov}(T_n,W_n)=2n^2(\log_2 n)^2P_1(\log_2 n)P_2(\log_2 n)+O(n^2\log n),$$
$$\operatorname{Var}(W_n)=4n^3(\log_2 n)^2P_1(\log_2 n)^2P_2(\log_2 n)+O(n^3\log n),$$

where $P_2(z)$ is again a one-periodic function given in the remark below.

Remark 14. The result for the mean of the number of internal nodes was first proved in [127]. The variance of the number of internal nodes was first derived by Régnier and Jacquet [95] (see also [94], [93]). In [75], we gave the following expressions for the periodic functions

$$P_1(z)=\frac{1}{\log 2}+\frac{1}{\log 2}\sum_{k\neq 0}\chi_k\Gamma(-1-\chi_k)e^{2k\pi i z}$$

and

$$P_2(z)=\frac{1}{\log 2}\sum_{k\in\mathbb{Z}}G_2(-1-\chi_k)e^{2k\pi i z},$$

where

$$G_2(\omega)=(\omega+1)\Gamma(\omega)\left(1-\frac{\omega^2+4\omega+8}{2^{\omega+3}}\right)+2\sum_{l\geq 1}\frac{(-1)^l\,l\,\Gamma(\omega+l+1)}{(l+1)!\,(2^l-1)}\bigl(l(\omega+l+1)-1\bigr).$$

The results for the mean and variance of the internal path length and for its covariance with the number of internal nodes are due to Nguyen-The [162].

As before, we have a central limit theorem which now reads as follows.

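Theorem 4.2.12. We have

$$\left(\frac{N_n-\mathbb{E}(N_n)}{\sqrt{\operatorname{Var}(N_n)}},\ \frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)\xrightarrow{d}(X,X,X),$$

where X is a standard normal random variable and $\xrightarrow{d}$ denotes weak convergence.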

Remark 15. The central limit theorem for the number of internal nodes was first proved in [93] and [94]. The bivariate central limit theorem for the number of internal nodes and the internal path length was incorrectly stated in [162]: the author did not observe that the limiting covariance matrix is singular, which led to an incorrect proof.

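External Wiener Index of Binary PATRICIA tries.

Here, we have the following distributional recurrences for T_n and W_n: for n ≥ 2,

$$T_n\stackrel{d}{=}\begin{cases}T_{B_n}+T^{*}_{n-B_n}+n, & \text{if } B_n\neq 0 \text{ and } B_n\neq n;\\ T_n, & \text{otherwise,}\end{cases}$$
$$W_n\stackrel{d}{=}\begin{cases}W_{B_n}+W^{*}_{n-B_n}+B_n\bigl(T^{*}_{n-B_n}+n-B_n\bigr)+(n-B_n)\bigl(T_{B_n}+B_n\bigr), & \text{if } B_n\neq 0 \text{ and } B_n\neq n;\\ W_n, & \text{otherwise,}\end{cases}$$

where notation is as in Section 1 and $T_0=T_1=W_0=W_1=0$.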
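For PATRICIA tries a split with B_n ∈ {0, n} leaves the law unchanged, so after taking expectations the unknown mean again appears on both sides. The following sketch (a hypothetical helper using exact rational arithmetic, with B_n ~ Binomial(n, 1/2) assumed) solves for E(T_n) and E(W_n).

```python
from fractions import Fraction
from math import comb


def patricia_external_means(n_max: int):
    """Exact E(T_n), E(W_n) for binary PATRICIA tries.

    With s_n = P(B_n not in {0, n}) = 1 - 2^{1-n}, taking expectations gives
    t_n = n + (1/s_n) * sum_{k=1}^{n-1} P(B_n = k) (t_k + t_{n-k}), and
    w_n = (1/s_n) * sum_{k=1}^{n-1} P(B_n = k) (w_k + w_{n-k} + cross terms).
    """
    t = [Fraction(0)] * (n_max + 1)  # T_0 = T_1 = 0
    w = [Fraction(0)] * (n_max + 1)  # W_0 = W_1 = 0
    for n in range(2, n_max + 1):
        s_n = 1 - Fraction(2, 2 ** n)
        st = sw = Fraction(0)
        for k in range(1, n):
            p = Fraction(comb(n, k), 2 ** n)  # assumed: B_n ~ Binomial(n, 1/2)
            st += p * (t[k] + t[n - k])
            sw += p * (w[k] + w[n - k] + k * (t[n - k] + n - k) + (n - k) * (t[k] + k))
        t[n] = n + st / s_n
        w[n] = sw / s_n
    return t, w


t, w = patricia_external_means(4)
print(t[3], w[3], t[4])  # prints: 5 8 60/7
```

For three keys the binary PATRICIA trie always has one key at depth 1 and two at depth 2, so T_3 = 5 and W_3 = 8 deterministically, which the function reproduces.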

Then, we have the following result.

Theorem 4.2.13. We have for the means of the total path length and the Wiener index of PATRICIA tries,

$$\mathbb{E}(T_n)=n\log_2 n+nP_1(\log_2 n)+O(\log n),$$
$$\mathbb{E}(W_n)=n^2\log_2 n+n^2P_1(\log_2 n)-n^2+O(n\log n),$$

where $P_1(z)$ is a one-periodic function given in the remark below. Moreover, the variances and covariances of the total path length and the Wiener index of PATRICIA tries are given by

$$\operatorname{Var}(T_n)=nP_2(\log_2 n)+O(1),\qquad \operatorname{Cov}(T_n,W_n)=n^2P_2(\log_2 n)+O(n\log n),$$
$$\operatorname{Var}(W_n)=n^3P_2(\log_2 n)+O(n^2\log n),$$

where $P_2(z)$ is again a one-periodic function given in the remark below.

Remark 16. The result for the mean of the external path length was first derived in [127]. The result for the variance of the total path length is due to Kirschenhofer, Prodinger and Szpankowski [118]. In [75], we obtained the following expressions for the periodic functions

$$P_1(z)=\frac{\gamma-1}{\log 2}+\frac{1}{\log 2}\sum_{k\neq 0}\Gamma(-\chi_k)e^{2k\pi i z}$$

and

The latter result again implies the following bivariate central limit theorem.

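Theorem 4.2.14. We have

$$\left(\frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)\xrightarrow{d}(X,X),$$

where X is a standard normal random variable and $\xrightarrow{d}$ denotes weak convergence.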

Remark 17. To our knowledge, this result was first obtained by Neininger and Rüschendorf in [160].

Internal Wiener Index of m-ary PATRICIA tries.

First, observe that the number of internal nodes, the internal path length and the internal Wiener index satisfy the following distributional recurrences for n ≥ 2:

N_n =^d

Theorem 4.2.15. Consider m-ary PATRICIA tries built on strings with digits from the alphabet $\mathcal{S}=\{a_1,\ldots,a_m\}$. Suppose that each digit of the random string equals $a_i$ with probability $p_i$, for $1\le i\le m$, and set $h=-\sum_{i=1}^{m}p_i\log p_i$. Then, for the means of the number of internal nodes, the internal path length and the internal Wiener index of m-ary PATRICIA tries, we have, as $n\to\infty$,

$$\mathbb{E}(N_n)\sim nP_1(\log_{1/a}n),$$
$$\mathbb{E}(T_n)\sim h^{-1}n\log n\,P_1(\log_{1/a}n),\qquad \mathbb{E}(W_n)\sim h^{-1}n^2\log n\,P_1(\log_{1/a}n)^2,$$

where $P_1(z)$ is a one-periodic function. Moreover, the variances and covariances of the number of internal nodes, the internal path length and the internal Wiener index of m-ary PATRICIA tries are given by

$$\operatorname{Var}(N_n)\sim nP_2(\log_{1/a}n),$$
$$\operatorname{Cov}(N_n,T_n)\sim h^{-1}n\log n\,P_2(\log_{1/a}n),\qquad \operatorname{Var}(T_n)\sim h^{-2}n\log^2 n\,P_2(\log_{1/a}n),$$
$$\operatorname{Cov}(N_n,W_n)\sim 2h^{-1}n^2\log n\,P_1(\log_{1/a}n)P_2(\log_{1/a}n),\qquad \operatorname{Cov}(T_n,W_n)\sim 2h^{-2}n^2\log^2 n\,P_1(\log_{1/a}n)P_2(\log_{1/a}n),$$
$$\operatorname{Var}(W_n)\sim 4h^{-2}n^3\log^2 n\,P_1(\log_{1/a}n)^2P_2(\log_{1/a}n),$$

where $P_2(z)$ is again a one-periodic function. In particular,

$$\rho(N_n,T_n)\longrightarrow 1,\qquad \rho(N_n,W_n)\longrightarrow 1,\qquad \rho(T_n,W_n)\longrightarrow 1,$$

where $\rho(\cdot,\cdot)$ denotes the correlation coefficient.
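The mean E(N_n) can be computed exactly for small n. The sketch below rests on our own assumption about the m-ary PATRICIA rule: if all keys continue with the same digit, no node is created and the law of N_n is unchanged; otherwise N_n = 1 plus the sum over the m subtrees. Only the binomial marginals B_i ~ Binomial(n, p_i) enter the mean. It also evaluates the entropy h of Theorem 4.2.15, taking the natural logarithm (an assumed base) and 0 < p_i < 1. Both function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log


def entropy(probs) -> float:
    """h = -sum_i p_i log p_i with the natural logarithm (assumed base), all p_i > 0."""
    return -sum(float(p) * log(float(p)) for p in probs)


def mary_patricia_internal_means(n_max: int, probs):
    """Exact E(N_n) for m-ary PATRICIA tries under the assumed rule below.

    Assumed rule: if all n keys share their next digit, no node is created and
    N_n keeps its law; otherwise N_n = 1 + sum_i N^{(i)}_{B_i} with multinomial
    subtree sizes. Only the marginals B_i ~ Binomial(n, p_i) enter the mean.
    """
    nu = [Fraction(0)] * (n_max + 1)  # N_0 = N_1 = 0
    for n in range(2, n_max + 1):
        one_way = sum(p ** n for p in probs)  # P(all n keys pick the same digit)
        s = Fraction(0)
        for p in probs:
            for k in range(n):  # k = n is the one-way event, handled via one_way
                s += comb(n, k) * p ** k * (1 - p) ** (n - k) * nu[k]
        nu[n] = 1 + s / (1 - one_way)
    return nu


binary = mary_patricia_internal_means(8, [Fraction(1, 2)] * 2)
print([int(x) for x in binary[1:]])  # prints: [0, 1, 2, 3, 4, 5, 6, 7]
ternary = mary_patricia_internal_means(40, [Fraction(1, 3)] * 3)
print(float(ternary[40]) / 40, entropy([Fraction(1, 3)] * 3))
```

In the binary symmetric case the recursion returns n − 1, the deterministic number of internal nodes of a binary PATRICIA trie, which is a useful check of the assumed rule before moving to m > 2.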

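Theorem 4.2.16. We have

$$\left(\frac{N_n-\mathbb{E}(N_n)}{\sqrt{\operatorname{Var}(N_n)}},\ \frac{T_n-\mathbb{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}},\ \frac{W_n-\mathbb{E}(W_n)}{\sqrt{\operatorname{Var}(W_n)}}\right)\xrightarrow{d}(X,X,X),$$

where X is a standard normal random variable.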

In the document: Probabilistic Analysis of Additive Parameters on Random Digital Trees (pp. 90-100)