
A lower bound for 1-median selection in metric spaces


Academic year: 2022


Ching-Lueh Chang

Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan

clchang@saturn.yzu.edu.tw

Abstract

Consider the problem of finding a point in an $n$-point metric space with the minimum average distance to all points. We show that this problem has no deterministic nonadaptive $o(n^2)$-query $(4-\Omega(1))$-approximation algorithms.

1 Introduction

Given oracle access to a metric space $(\{1, 2, \ldots, n\}, d)$, the metric 1-median problem asks for a point with the minimum average distance to all points. Indyk [8, 9] shows that metric 1-median has a Monte-Carlo $O(n/\epsilon^2)$-time $(1+\epsilon)$-approximation algorithm with an $\Omega(1)$ probability of success. The more general metric $k$-median problem asks for $x_1, x_2, \ldots, x_k \in \{1, 2, \ldots, n\}$ minimizing $\sum_{x \in \{1, 2, \ldots, n\}} \min_{i=1}^{k} d(x_i, x)$. Randomized as well as evasive algorithms are well-studied for metric $k$-median and the related $k$-means problem [1, 4, 7, 10–12], where $k \ge 1$ is part of the input rather than a constant.

This paper focuses on deterministic sublinear-query algorithms for metric 1-median. Guha et al. [7, Sec. 3.1–3.2] prove that metric $k$-median has a deterministic $O(n^{1+\epsilon})$-time $O(n^{\epsilon})$-space $2^{O(1/\epsilon)}$-approximation algorithm that reads distances in a single pass, where $\epsilon > 0$. Chang [3] presents a deterministic nonadaptive $O(n^{1.5})$-time 4-approximation algorithm for metric 1-median. He also shows that metric 1-median has no deterministic $o(n^2)$-query $(3-\epsilon)$-approximation algorithms for any constant $\epsilon > 0$ [2]. Modifying a proof of Chang [2, Sec. 5], this paper shows that metric 1-median has no deterministic nonadaptive $o(n^2)$-query $(4-\epsilon)$-approximation algorithms for any constant $\epsilon > 0$.

In social network analysis, the importance of an actor in a network may be quantified by several centrality measures, among which the closeness centrality of an actor is defined to be its average distance to other actors [13]. So metric 1-median can be interpreted as the problem of finding the most important point in a metric space. Goldreich and Ron [6] and Eppstein and Wang [5] present randomized algorithms for approximating the closeness centralities of vertices in undirected graphs.

2 Definitions

For $n \in \mathbb{Z}^+$, denote $[n] \equiv \{1, 2, \ldots, n\}$. An $n$-point metric space $([n], d)$ is the set $[n]$ endowed with a function $d\colon [n] \times [n] \to \mathbb{R}$ satisfying

1. $d(x, y) \ge 0$ (non-negativeness),

2. $d(x, y) = 0$ if and only if $x = y$ (identity of indiscernibles),

3. $d(x, y) = d(y, x)$ (symmetry), and

4. $d(x, y) + d(x, z) \ge d(y, z)$ (triangle inequality)

for all $x, y, z \in [n]$. An equivalent definition requires the triangle inequality only for distinct $x, y, z \in [n]$, axioms 1–3 remaining.
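The four axioms can be checked mechanically on a small example. The following is a minimal sketch, assuming the distance function is supplied as a Python callable on the points 1, ..., n; the name `is_metric` is hypothetical and not part of the paper.

```python
from itertools import permutations

def is_metric(d, n):
    """Return True if d satisfies axioms 1-4 on the points 1..n.

    d is assumed to be a callable d(x, y); this helper is illustrative only.
    """
    pts = range(1, n + 1)
    for x in pts:
        for y in pts:
            if d(x, y) < 0:                     # axiom 1: non-negativeness
                return False
            if (d(x, y) == 0) != (x == y):      # axiom 2: identity of indiscernibles
                return False
            if d(x, y) != d(y, x):              # axiom 3: symmetry
                return False
    # axiom 4, checked on distinct triples only (the equivalent definition)
    return all(d(x, y) + d(x, z) >= d(y, z)
               for x, y, z in permutations(pts, 3))
```

For instance, `is_metric(lambda x, y: abs(x - y), 5)` holds, whereas a function that sets one distance to 5 and all others to 1 fails the triangle inequality.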

An algorithm with oracle access to a metric space $([n], d)$ is given $n$ and may query $d$ on any $(x, y) \in [n] \times [n]$ to obtain $d(x, y)$. Without loss of generality, we forbid queries for $d(x, x)$, which trivially return 0, as well as repeated queries, where queries for $d(x, y)$ and $d(y, x)$ are seen as identical. An algorithm is said to be nonadaptive if its $(k+1)$-th query (which is a pair in $[n] \times [n]$) is independent of the answers to the first $k$ queries, where $k \in \mathbb{N}$. Consequently, a deterministic nonadaptive algorithm's set of queries depends only on $n$ and the algorithm itself but not on $d$. For convenience, denote an algorithm ALG with oracle access to $([n], d)$ by $\text{ALG}^d$.

Given oracle access to a finite metric space $([n], d)$, the metric 1-median problem asks for a point in $[n]$ with the minimum average distance to all points. An algorithm for this problem is $\alpha$-approximate if it outputs a point $x \in [n]$ satisfying

\[
\sum_{y \in [n]} d(x, y) \le \alpha \min_{x' \in [n]} \sum_{y \in [n]} d(x', y),
\]

where $\alpha \ge 1$.
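To make the definition concrete, the sketch below computes an exact 1-median by brute force (using all $n^2$ distances) and tests whether a candidate point is $\alpha$-approximate. The function names are hypothetical, and the distance function is again assumed to be a callable on 1, ..., n.

```python
def total_distance(d, n, x):
    """Sum of d(x, y) over all points y in 1..n."""
    return sum(d(x, y) for y in range(1, n + 1))

def one_median(d, n):
    """Brute-force 1-median: a point minimizing the total distance."""
    return min(range(1, n + 1), key=lambda x: total_distance(d, n, x))

def is_alpha_approximate(d, n, x, alpha):
    """Check the defining inequality of an alpha-approximate output x."""
    opt = total_distance(d, n, one_median(d, n))
    return total_distance(d, n, x) <= alpha * opt
```

On the line metric $d(x, y) = |x - y|$ with $n = 5$, the median is point 3 with total distance 6, while point 1 has total distance 10, so point 1 is 2-approximate but not 1.5-approximate.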

The following theorem is due to Chang [3].

Theorem 1 ([3]). Metric 1-median has a deterministic nonadaptive $O(n^{1.5})$-time 4-approximation algorithm.

3 Lower bound

Let $A$ be any deterministic nonadaptive $o(n^2)$-query algorithm for metric 1-median, $\epsilon \in (0, 0.1)$ be a constant and $d\colon [n] \times [n] \to \mathbb{R}$ be a metric to be determined later. Define

\[
Q \equiv \left\{ \text{unordered pair } (x, y) \in [n]^2 \mid A^d \text{ ever queries for } d(x, y) \right\}
\]

to be the set of queries of $A^d$ treated as unordered pairs. Without loss of generality, assume $(x, x) \notin Q$ for all $x \in [n]$. Let $G = ([n], Q)$ be the simple undirected graph with vertex set $[n]$ and edge set $Q$. Denote the degree of $x \in [n]$ in $G$ by $\deg_G(x) = |\{y \in [n] \mid (x, y) \in Q\}|$, and

\[
B \equiv \{x \in [n] \mid \deg_G(x) \ge \epsilon n\}. \tag{1}
\]

In the sequel, we will specify $d$ incrementally in several steps. Note that $Q$ and $B$ are independent of $d$ because of the nonadaptivity of $A$; hence they will remain intact during our specification of $d$.

Below is an easy lemma.

Lemma 2 (Implicit in [2]). $|B| = o(n)$.
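The paper leaves Lemma 2 without proof. One standard way to see it, offered here as a sketch rather than as the argument of [2], is a degree count in $G$: every vertex of $B$ has degree at least $\epsilon n$ by Eq. (1), the degrees sum to twice the number of edges, and $A$ makes $o(n^2)$ queries. Here $\epsilon$ denotes the constant fixed at the start of this section.

```latex
\[
\epsilon n \cdot |B|
\;\le\; \sum_{x \in B} \deg_G(x)
\;\le\; \sum_{x \in [n]} \deg_G(x)
\;=\; 2\,|Q|
\;=\; o(n^2),
\qquad\text{hence}\qquad
|B| \;\le\; \frac{2\,|Q|}{\epsilon n} \;=\; o(n).
\]
```

The last step uses that $\epsilon$ is a constant independent of $n$.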

Henceforth we will assume $n \in \mathbb{Z}^+$ to be sufficiently large so that

\[
n - |B| - 1 - \epsilon n > 0 \tag{2}
\]

by Lemma 2. For all $x \in [n]$,

\[
d(x, x) \equiv 0. \tag{3}
\]

For all $(x, y) \in [n]^2 \setminus \{(x, x) \mid x \in [n]\}$ with $x \in B$, $y \in B$ or $(x, y) \in Q$,

\[
d(x, y) \equiv
\begin{cases}
4, & \text{if } x \in B \text{ or } y \in B; \\
2, & \text{otherwise.}
\end{cases}
\tag{4}
\]

Clearly, this does not assign different values to $d(x, y)$ and $d(y, x)$.

As Eq. (4) specifies $d$ on a superset of $Q$ (which is the set of $A$'s queries) and $A$ is deterministic, the output of $A^d$ has now been fixed even though $d$ is not fully specified yet. Let $p \in [n] \setminus B$ and $p' \in B$ be such that $\{p, p'\}$ contains the output of $A^d$.

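Lemma 3.

\[
\left| ([n] \setminus (B \cup \{p\})) \cap \{x \in [n] \mid (p, x) \notin Q\} \right| \ge n - |B \cup \{p\}| - \epsilon n.
\]

Proof. Eq. (1) and $p \notin B$ imply $\deg_G(p) < \epsilon n$, i.e.,

\[
|\{x \in [n] \mid (p, x) \in Q\}| < \epsilon n.
\]

Hence removing the points $x$ with $(p, x) \in Q$ from the $n - |B \cup \{p\}|$ points of $[n] \setminus (B \cup \{p\})$ leaves at least $n - |B \cup \{p\}| - \epsilon n$ points.

Take

\[
\hat{p} \in ([n] \setminus (B \cup \{p\})) \cap \{x \in [n] \mid (x, p) \notin Q\} \tag{5}
\]

arbitrarily, as can be done by Lemma 3 and Eq. (2). Trivially, $\hat{p} \notin B$.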

We now complete the specification of $d$. For all $(x, y) \in [n]^2 \setminus (Q \cup \{(x, x) \mid x \in [n]\})$ with $x \notin B$ and $y \notin B$,¹

\[
d(x, y) \equiv
\begin{cases}
3, & \text{if } ((x = \hat{p}) \land (y = p)) \text{ or } ((y = \hat{p}) \land (x = p)); \\
1, & \text{if } ((x = \hat{p}) \land (y \ne p)) \text{ or } ((y = \hat{p}) \land (x \ne p)); \\
4, & \text{if } ((x = p) \land (y \ne \hat{p})) \text{ or } ((y = p) \land (x \ne \hat{p})); \\
2, & \text{otherwise.}
\end{cases}
\tag{6}
\]

The four cases in Eq. (6) are mutually exclusive because $p \ne \hat{p}$ by Eq. (5). Clearly, Eq. (6) does not assign different values to $d(x, y)$ and $d(y, x)$.
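The incremental specification of $d$ in Eqs. (1) and (3)–(6) can be traced in code. The sketch below is an illustration under stated assumptions, not the paper's own artifact: the names `adversarial_metric` and `choose_output` are hypothetical, the query set is passed in as a set of `frozenset` pairs, and `choose_output` stands in for the decision rule of the algorithm $A$, mapping the query answers to an output point. When the output lies in $B$, the sketch picks an arbitrary $p \notin B$.

```python
def adversarial_metric(n, Q, eps, choose_output):
    """Build the distance function of Eqs. (3)-(6) for a fixed query set Q.

    Q: set of frozenset({x, y}) pairs, fixed in advance (nonadaptivity).
    choose_output: hypothetical stand-in for A's decision rule; it maps the
    dict of query answers to a point of 1..n.
    The next() calls raise StopIteration when n is too small for Eq. (2).
    """
    pts = range(1, n + 1)
    deg = {x: sum(1 for e in Q if x in e) for x in pts}
    B = {x for x in pts if deg[x] >= eps * n}              # Eq. (1)

    def early(x, y):                                        # Eq. (4)
        return 4 if (x in B or y in B) else 2

    # All queried distances are answered by Eq. (4), which fixes A's output.
    answers = {e: early(*tuple(e)) for e in Q}
    out = choose_output(answers)
    # p is a point outside B such that {p, p'} contains the output.
    p = out if out not in B else next(x for x in pts if x not in B)
    # Eq. (5): p_hat is outside B, differs from p, and (p, p_hat) is unqueried.
    p_hat = next(x for x in pts
                 if x not in B and x != p and frozenset((x, p)) not in Q)

    def d(x, y):
        if x == y:
            return 0                                        # Eq. (3)
        if x in B or y in B or frozenset((x, y)) in Q:
            return early(x, y)                              # Eq. (4)
        if {x, y} == {p, p_hat}:
            return 3                                        # Eq. (6), case 1
        if p_hat in (x, y):
            return 1                                        # Eq. (6), case 2
        if p in (x, y):
            return 4                                        # Eq. (6), case 3
        return 2                                            # Eq. (6), case 4

    return d, B, p, p_hat
```

As a usage example, take $n = 40$, the constant $0.09$, and a query set consisting of the path pairs $(i, i+1)$ together with the pairs $(1, j)$ for $j = 3, \ldots, 6$. Then $B = \{1\}$, and an algorithm that always outputs point 2 leads to $p = 2$ and $\hat{p} = 4$, with total distances 153 from $p$ and 46 from $\hat{p}$.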

The following lemma is straightforward from Eqs. (3)–(4) and (6).

Lemma 4. For all x, y ∈ [n], d(x, y) ∈ {0, 1, 2, 3, 4}.

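Lemma 5.

\[
\sum_{y \in [n]} d(\hat{p}, y) \le (1 + 4\epsilon)\, n + o(n).
\]

Proof. By Lemmas 2 and 4,

\[
\sum_{y \in B} d(\hat{p}, y) = o(n). \tag{7}
\]

Furthermore,

\[
\sum_{y \in [n] \text{ s.t. } (\hat{p}, y) \in Q} d(\hat{p}, y)
\;\overset{\text{Lemma 4}}{\le}\; \sum_{y \in [n] \text{ s.t. } (\hat{p}, y) \in Q} 4
\;=\; 4 \deg_G(\hat{p})
\;<\; 4 \epsilon n, \tag{8}
\]

where the last inequality follows from Eq. (1) and $\hat{p} \notin B$. We have

\[
\sum_{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \text{ s.t. } (\hat{p}, y) \notin Q} d(\hat{p}, y) \le n \tag{9}
\]

because all summands are 1 by Eq. (6) and $\hat{p} \notin B$. By Lemma 4,

\[
\sum_{y \in \{p, \hat{p}\}} d(\hat{p}, y) = O(1). \tag{10}
\]

Summing up Eqs. (7)–(10) completes the proof.

¹ These are precisely the pairs whose $d$-distances are not specified by Eqs. (3)–(4).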

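Lemma 6.

\[
\sum_{y \in [n]} d(p, y) \ge 4\,(n - o(n) - \epsilon n).
\]

Proof. Recall that $p \notin B$. We have

\begin{align*}
\sum_{y \in [n]} d(p, y)
&\ge \sum_{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \text{ s.t. } (p, y) \notin Q} d(p, y) \\
&\overset{\text{Eq. (6)}}{=} \sum_{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \text{ s.t. } (p, y) \notin Q} 4 \\
&\ge 4 \left( \left| \{y \in [n] \setminus (B \cup \{p\}) \mid (p, y) \notin Q\} \right| - 1 \right) \\
&\overset{\text{Lemma 3}}{\ge} 4\,(n - |B| - \epsilon n - 2) \\
&\overset{\text{Lemma 2}}{=} 4\,(n - o(n) - \epsilon n).
\end{align*}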

The next lemma is immediate from Eq. (4) and $p' \in B$.

Lemma 7.

\[
\sum_{y \in [n] \setminus \{p'\}} d(p', y) = 4\,(n - 1).
\]

We proceed to prove that ([n], d) is a metric space through a few lemmas.

Lemma 8. For all $x, y \in [n]$, if $d(x, y) = 1$, then $\hat{p} \in \{x, y\}$.

Proof. Inspect Eq. (6), which is the only equation that may set distances to 1.

Below is a consequence of Lemma 8.

Lemma 9. For all distinct $x, y, z \in [n]$, if $d(x, y) = d(x, z) = 1$, then $x = \hat{p}$.

The following lemma is immediate from Eqs. (4) and (6).

Lemma 10. For all $x, y \in [n]$, if $d(x, y) = 4$, then $\{x, y\} \cap (B \cup \{p\}) \ne \emptyset$.

Below is a consequence of $p \notin B$ and Eqs. (5)–(6).

Lemma 11. $d(\hat{p}, p) = 3$.

Lemma 12. ([n], d) is a metric space.

Proof. We only need to prove the triangle inequality for $d$ because all the other axioms are easy to verify. Consider the following cases for all distinct $x, y, z \in [n]$:

• $d(x, y) = 1$, $d(x, z) = 1$ and $d(y, z) = 4$. By Lemma 9, $x = \hat{p}$. Hence if $y = p$ (resp., $z = p$), then $d(x, y) = 3$ (resp., $d(x, z) = 3$) by Lemma 11, a contradiction. Therefore, $p \notin \{y, z\}$, which together with Lemma 10 forces $\{y, z\} \cap B \ne \emptyset$. But if $y \in B$ (resp., $z \in B$), then $d(x, y) = 4$ (resp., $d(x, z) = 4$) by Eq. (4), a contradiction.

• $d(x, y) = 1$, $d(x, z) = 1$ and $d(y, z) = 3$. By Lemma 9, $x = \hat{p}$. On the other hand, $d(y, z) = 3$ means $(y, z) \in \{(\hat{p}, p), (p, \hat{p})\}$ by Eq. (6) (which is the only equation that may set distances to 3), contradicting $x = \hat{p}$.

• $d(x, y) = 1$, $d(x, z) = 2$ and $d(y, z) = 4$. By Lemma 10, $\{y, z\} \cap (B \cup \{p\}) \ne \emptyset$. But if $y \in B$ (resp., $z \in B$), then $d(x, y) = 4$ (resp., $d(x, z) = 4$) by Eq. (4), a contradiction. Therefore, $p \in \{y, z\}$. Furthermore, $\hat{p} \in \{x, y\}$ by Lemma 8. Consequently, $(p, \hat{p}) \in \{(x, y), (x, z), (y, z)\}$ as an unordered pair (note that $p \ne \hat{p}$ by Eq. (5)), implying $3 \in \{d(x, y), d(x, z), d(y, z)\}$ by Lemma 11, a contradiction.

Since distances between distinct points lie in $\{1, 2, 3, 4\}$ by Lemma 4, these cases cover, up to exchanging $y$ and $z$, every way to have $d(x, y) + d(x, z) < d(y, z)$. We have thus excluded all possibilities of $d(x, y) + d(x, z) < d(y, z)$, where $x, y, z \in [n]$.
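That the case analysis is exhaustive can be confirmed by enumeration. The short sketch below lists every triple of nonzero values from Lemma 4 that would violate the triangle inequality; it is a sanity check added for illustration, not part of the original proof.

```python
from itertools import product

# (a, b, c) stands for (d(x, y), d(x, z), d(y, z)) on distinct points,
# so each value is one of the nonzero distances 1, 2, 3, 4 from Lemma 4.
violations = [(a, b, c)
              for a, b, c in product((1, 2, 3, 4), repeat=3)
              if a + b < c]
print(violations)  # [(1, 1, 3), (1, 1, 4), (1, 2, 4), (2, 1, 4)]
```

The triple (2, 1, 4) is the third case of the proof with the roles of $y$ and $z$ exchanged, so the three cases above cover all four violating patterns.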

Combining Lemmas 5–7, 12 and that $\{p, p'\}$ contains the output of $A^d$ yields our main theorem.

Theorem 13. Metric 1-median has no deterministic nonadaptive $o(n^2)$-query $(4-\epsilon)$-approximation algorithms for any constant $\epsilon > 0$.
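The way Lemmas 5–7 combine can be spelled out as follows. This is a sketch of the arithmetic, writing $\epsilon$ for the constant in $(0, 0.1)$ fixed at the start of this section. The output of $A^d$ is $p$ or $p'$, whose total distances are bounded from below by Lemmas 6 and 7, while the optimum is at most the total distance from $\hat{p}$, bounded from above by Lemma 5.

```latex
\[
\frac{\min\left\{ \sum_{y \in [n]} d(p, y),\; \sum_{y \in [n]} d(p', y) \right\}}
     {\sum_{y \in [n]} d(\hat{p}, y)}
\;\ge\; \frac{4\,(n - o(n) - \epsilon n)}{(1 + 4\epsilon)\, n + o(n)}
\;=\; \frac{4\,(1 - \epsilon)}{1 + 4\epsilon} - o(1)
\;\ge\; 4 - 20\epsilon - o(1).
\]
```

The last inequality holds because $(4 - 20\epsilon)(1 + 4\epsilon) = 4 - 4\epsilon - 80\epsilon^2 \le 4\,(1 - \epsilon)$. To rule out a ratio of $4 - \epsilon_0$ for a given constant $\epsilon_0 > 0$, it therefore suffices to run the construction with a constant smaller than both $0.1$ and $\epsilon_0 / 20$ and to take $n$ sufficiently large.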

Theorem 13 shows that the approximation ratio of 4 in Theorem 1 cannot be improved to any constant $c < 4$.


Acknowledgment

The author is supported in part by the National Science Council of Taiwan under grant 101-2221-E-155-015-MY2.

References

[1] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.

[2] C.-L. Chang. Some results on approximate 1-median selection in metric spaces. Theoretical Computer Science, 426:1–12, 2012.

[3] C.-L. Chang. Deterministic sublinear-time approximations for metric 1-median selection. Information Processing Letters, 113(8):288–292, 2013.

[4] K. Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923–947, 2009.

[5] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.

[6] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.

[7] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.

[8] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.

[9] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.

[10] R. Jaiswal, A. Kumar, and S. Sen. A simple $D^2$-sampling based PTAS for k-means and other clustering problems. In Proceedings of the 18th Annual International Conference on Computing and Combinatorics, pages 13–24, 2012.

[11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.

[12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.

[13] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
