A lower bound for 1-median selection in metric spaces


Ching-Lueh Chang

Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan

clchang@saturn.yzu.edu.tw

Abstract

Consider the problem of finding a point in an n-point metric space with the minimum average distance to all points. We show that this problem has no deterministic nonadaptive o(n²)-query (4 − Ω(1))-approximation algorithms.

1 Introduction

Given oracle access to a metric space ({1, 2, . . . , n}, d), the metric 1-median problem asks for a point with the minimum average distance to all points. Indyk [8, 9] shows that metric 1-median has a Monte-Carlo O(n/ε²)-time (1 + ε)-approximation algorithm with an Ω(1) probability of success. The more general metric k-median problem asks for x₁, x₂, . . ., x_k ∈ {1, 2, . . . , n} minimizing

$$\sum_{x \in \{1, 2, \ldots, n\}} \min_{i=1}^{k} d(x_i, x).$$

Randomized as well as evasive algorithms are well studied for metric k-median and the related k-means problem [1, 4, 7, 10–12], where k ≥ 1 is part of the input rather than a constant.

This paper focuses on deterministic sublinear-query algorithms for metric 1-median. Guha et al. [7, Sec. 3.1–3.2] prove that metric k-median has a deterministic O(n^{1+ε})-time O(n^ε)-space 2^{O(1/ε)}-approximation algorithm that reads distances in a single pass, where ε > 0. Chang [3] presents a deterministic nonadaptive O(n^1.5)-time 4-approximation algorithm for metric 1-median.

He also shows that metric 1-median has no deterministic o(n²)-query (3 − ε)-approximation algorithms for any constant ε > 0 [2]. Modifying a proof of Chang [2, Sec. 5], this paper shows that metric 1-median has no deterministic nonadaptive o(n²)-query (4 − ε)-approximation algorithms for any constant ε > 0.

In social network analysis, the importance of an actor in a network may be quantified by several centrality measures, among which the closeness centrality of an actor is defined to be its average distance to other actors [13]. So metric 1-median can be interpreted as the problem of finding the most important point in a metric space.

Goldreich and Ron [6] and Eppstein and Wang [5] present randomized algorithms for approximating the closeness centralities of vertices in undirected graphs.

2 Definitions

For n ∈ ℤ⁺, denote [n] ≡ {1, 2, . . . , n}. An n-point metric space ([n], d) is the set [n] endowed with a function d : [n] × [n] → ℝ satisfying

1. d(x, y) ≥ 0 (non-negativity),

2. d(x, y) = 0 if and only if x = y (identity of indiscernibles),

3. d(x, y) = d(y, x) (symmetry), and

4. d(x, y) + d(x, z) ≥ d(y, z) (triangle inequality)

for all x, y, z ∈ [n]. An equivalent definition requires the triangle inequality only for distinct x, y, z ∈ [n], with axioms 1–3 unchanged, because for non-distinct x, y, z the triangle inequality already follows from axioms 1–3.
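As an illustration of the definition above, the following Python sketch checks axioms 1–4 for a distance matrix. It is not from the paper: the function name `is_metric` is hypothetical, and points are indexed 0, …, n − 1 rather than 1, …, n. Following the equivalent definition, it tests the triangle inequality only for distinct triples.

```python
from itertools import permutations


def is_metric(d):
    """Check axioms 1-4 for an n x n distance matrix d on points 0..n-1.

    The triangle inequality is tested only for distinct x, y, z, which
    suffices once axioms 1-3 hold.
    """
    n = len(d)
    for x in range(n):
        for y in range(n):
            if d[x][y] < 0:                  # axiom 1: non-negativity
                return False
            if (d[x][y] == 0) != (x == y):   # axiom 2: identity of indiscernibles
                return False
            if d[x][y] != d[y][x]:           # axiom 3: symmetry
                return False
    # axiom 4: triangle inequality over ordered triples of distinct points
    for x, y, z in permutations(range(n), 3):
        if d[x][y] + d[x][z] < d[y][z]:
            return False
    return True


# The path metric on three points is a metric; making one side too long is not.
print(is_metric([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))  # True
print(is_metric([[0, 1, 3], [1, 0, 1], [3, 1, 0]]))  # False
```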

An algorithm with oracle access to a metric space ([n], d) is given n and may query d on any (x, y) ∈ [n] × [n] to obtain d(x, y). Without loss of generality, we forbid queries for d(x, x), whose answer is trivially 0, as well as repeated queries, where queries for d(x, y) and d(y, x) are regarded as identical. An algorithm is said to be nonadaptive if its (k + 1)-th query (which is a pair in [n] × [n]) is independent of the answers to the first k queries, for every k ∈ ℕ. Consequently, the set of queries of a deterministic nonadaptive algorithm depends only on n and the algorithm itself, not on d. For convenience, denote an algorithm ALG with oracle access to ([n], d) by ALG^d.
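The query model can be sketched as follows. This is a hypothetical Python wrapper, not code from the paper: the names `QueryOracle` and `star_queries` are illustrative assumptions, and points are indexed 0, …, n − 1. The wrapper enforces the conventions above and records the queried unordered pairs, which form the set Q used in Section 3. The schedule `star_queries` is one example of a deterministic nonadaptive query sequence, since it is computed from n alone.

```python
class QueryOracle:
    """Hypothetical oracle for a metric ([n], d) on points 0..n-1.

    It enforces the conventions above: no queries for d(x, x), and no
    repeated queries, where d(x, y) and d(y, x) count as the same query.
    The queried unordered pairs are recorded in `self.queries`.
    """

    def __init__(self, n, dist):
        self.n = n
        self._dist = dist        # callable (x, y) -> d(x, y)
        self.queries = set()     # unordered pairs stored as frozensets

    def query(self, x, y):
        if x == y:
            raise ValueError("queries for d(x, x) are forbidden")
        pair = frozenset((x, y))
        if pair in self.queries:
            raise ValueError("repeated query")
        self.queries.add(pair)
        return self._dist(x, y)


def star_queries(n):
    """A hypothetical nonadaptive schedule: it depends only on n, never on
    earlier answers, so it can be fixed before any query is made."""
    return [(0, y) for y in range(1, n)]


oracle = QueryOracle(4, lambda x, y: abs(x - y))
print([oracle.query(x, y) for x, y in star_queries(4)])  # [1, 2, 3]
```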

Given oracle access to a finite metric space ([n], d), the metric 1-median problem asks for a point in [n] with the minimum average distance to all points. An algorithm for this problem is α-approximate if it outputs a point x ∈ [n] satisfying

$$\sum_{y \in [n]} d(x, y) \le \alpha \min_{x' \in [n]} \sum_{y \in [n]} d(x', y),$$

where α ≥ 1.
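For concreteness, a small Python sketch of the quantity in the definition above: the smallest α for which a given output is α-approximate. The helper names `total_distance` and `approximation_ratio` are hypothetical, the metric is assumed to be given as a full n × n matrix on points 0, …, n − 1, and n ≥ 2 is assumed so that the minimum is positive.

```python
def total_distance(d, x):
    """Sum of d(x, y) over all y in [n] (d is an n x n matrix)."""
    return sum(d[x])


def approximation_ratio(d, x):
    """Smallest alpha for which outputting x is alpha-approximate.

    Assumes n >= 2, so the minimum total distance is positive.
    """
    best = min(total_distance(d, v) for v in range(len(d)))
    return total_distance(d, x) / best


path = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # path metric on 3 points
print(approximation_ratio(path, 0))  # 1.5
print(approximation_ratio(path, 1))  # 1.0
```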

The following theorem is due to Chang [3].

Theorem 1 ([3]). Metric 1-median has a deterministic nonadaptive O(n^1.5)-time 4-approximation algorithm.

3 Lower bound

Let A be any deterministic nonadaptive o(n²)-query algorithm for metric 1-median, let ε ∈ (0, 0.1) be a constant, and let d : [n] × [n] → ℝ be a metric to be determined later.

Define

$$Q \equiv \bigl\{\text{unordered pair } (x, y) \in [n]^2 \bigm| A^d \text{ ever queries for } d(x, y)\bigr\}$$

to be the set of queries of A^d treated as unordered pairs. Without loss of generality, assume (x, x) ∉ Q for all x ∈ [n]. Let G = ([n], Q) be the simple undirected graph with vertex set [n] and edge set Q. Denote the degree of x ∈ [n] in G by deg_G(x) = |{y ∈ [n] | (x, y) ∈ Q}|, and let

$$B \equiv \{x \in [n] \mid \deg_G(x) \ge \varepsilon n\}. \tag{1}$$

In the sequel, we will specify d incrementally in several steps. Note that Q and B are independent of d because A is deterministic and nonadaptive; hence they will remain intact during our specification of d.
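The sets Q and B can be computed from the query list alone, as the following Python sketch illustrates. It is an illustration rather than the paper's code: the function name `degrees_and_heavy_set` and the example query list are hypothetical, and points are indexed 0, …, n − 1.

```python
def degrees_and_heavy_set(n, queries, eps):
    """Build G = ([n], Q) from a query list and return (deg, B).

    Queries are deduplicated as unordered pairs, matching the definition
    of Q; B collects the vertices with deg_G(x) >= eps * n, as in Eq. (1).
    """
    Q = {frozenset(q) for q in queries}
    deg = [0] * n
    for edge in Q:
        for v in edge:
            deg[v] += 1
    B = {x for x in range(n) if deg[x] >= eps * n}
    return deg, B


# Hypothetical query list: a star from point 0 to points 1..5, where (5, 0)
# repeats (0, 5) in the other orientation and is therefore counted once.
deg, B = degrees_and_heavy_set(10, [(0, y) for y in range(1, 6)] + [(5, 0)], 0.3)
print(deg[0], deg[5], B)  # 5 1 {0}
```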

Below is an easy lemma.

Lemma 2 (Implicit in [2]). |B| = o(n).
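Lemma 2 is stated without proof. One way to see it is the standard degree-counting argument below (a sketch, not necessarily the argument of [2]), which uses only Eq. (1), the fact that A makes |Q| = o(n²) queries, and the fact that ε is a constant:

```latex
\varepsilon n \, |B|
  \;\le\; \sum_{x \in B} \deg_G(x)
  \;\le\; \sum_{x \in [n]} \deg_G(x)
  \;=\; 2|Q|
  \;=\; o(n^2)
  \quad\Longrightarrow\quad
  |B| \;\le\; \frac{2|Q|}{\varepsilon n} \;=\; o(n).
```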

Henceforth, by Lemma 2, we assume n ∈ ℤ⁺ to be sufficiently large so that

$$n - |B| - 1 - \varepsilon n > 0. \tag{2}$$

For all x ∈ [n],

$$d(x, x) \equiv 0. \tag{3}$$

For all (x, y) ∈ [n]² \ {(x, x) | x ∈ [n]} such that x ∈ B, y ∈ B, or (x, y) ∈ Q,

$$d(x, y) \equiv \begin{cases} 4, & \text{if } x \in B \text{ or } y \in B; \\ 2, & \text{otherwise}. \end{cases} \tag{4}$$

Clearly, this does not assign different values to d(x, y) and d(y, x).

As Eq. (4) specifies d on a superset of Q (which is the set of A's queries) and A is deterministic, the output of A^d has now been fixed even though d is not fully specified yet. Let p ∈ [n] \ B and p′ ∈ B be such that {p, p′} contains the output of A^d.

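Lemma 3.

$$\Bigl|\bigl([n] \setminus (B \cup \{p\})\bigr) \cap \{x \in [n] \mid (p, x) \notin Q\}\Bigr| \ge n - |B \cup \{p\}| - \varepsilon n.$$

Proof. Eq. (1) and p ∉ B imply deg_G(p) < εn, i.e.,

$$|\{x \in [n] \mid (p, x) \in Q\}| < \varepsilon n.$$

Since [n] \ (B ∪ {p}) has n − |B ∪ {p}| elements, fewer than εn of which satisfy (p, x) ∈ Q, the lemma follows.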

Take

$$\hat{p} \in \bigl([n] \setminus (B \cup \{p\})\bigr) \cap \{x \in [n] \mid (x, p) \notin Q\} \tag{5}$$

arbitrarily, as can be done by Lemma 3 and Eq. (2) (note that |B ∪ {p}| = |B| + 1 because p ∉ B). Trivially, p̂ ∉ B.

We now complete the specification of d. For all (x, y) ∈ [n]² \ (Q ∪ {(x, x) | x ∈ [n]}) with x ∉ B and y ∉ B,¹

$$d(x, y) \equiv \begin{cases} 3, & \text{if } (x = \hat{p} \wedge y = p) \text{ or } (y = \hat{p} \wedge x = p); \\ 1, & \text{if } (x = \hat{p} \wedge y \neq p) \text{ or } (y = \hat{p} \wedge x \neq p); \\ 4, & \text{if } (x = p \wedge y \neq \hat{p}) \text{ or } (y = p \wedge x \neq \hat{p}); \\ 2, & \text{otherwise}. \end{cases} \tag{6}$$

The four cases in Eq. (6) are mutually exclusive because p ≠ p̂ by Eq. (5). Clearly, Eq. (6) does not assign different values to d(x, y) and d(y, x).
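To make Eqs. (1) and (3)–(6) concrete, here is a small Python sketch of the adversarial construction. It is an illustration under stated assumptions, not code from the paper: points are indexed 0, …, n − 1; `build_adversary_metric` is a hypothetical name; the query set and the algorithm's output are supplied by the caller; wherever the text chooses "arbitrarily", the sketch takes the smallest eligible point; and n is assumed large enough that p̂ exists (Eq. (2)).

```python
def build_adversary_metric(n, queries, eps, output):
    """Sketch of the distance function d defined by Eqs. (1) and (3)-(6).

    n       -- number of points, indexed 0..n-1
    queries -- the fixed query set of a nonadaptive algorithm (pairs)
    eps     -- the constant epsilon in Eq. (1)
    output  -- the point returned by the algorithm on answers from Eq. (4)
    Returns (d, B, p, p_hat) with d an n x n list of lists.
    """
    Q = {frozenset(q) for q in queries}
    deg = [0] * n
    for edge in Q:
        for v in edge:
            deg[v] += 1
    B = {x for x in range(n) if deg[x] >= eps * n}            # Eq. (1)

    # p is the output if it lies outside B; otherwise an arbitrary point
    # outside B (here: the smallest), and the output plays the role of p'.
    p = output if output not in B else min(x for x in range(n) if x not in B)
    # p_hat: outside B, different from p, never compared with p (Eq. (5)).
    p_hat = min(x for x in range(n)
                if x not in B and x != p and frozenset((x, p)) not in Q)

    d = [[0] * n for _ in range(n)]                            # Eq. (3)
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            if x in B or y in B or frozenset((x, y)) in Q:    # Eq. (4)
                d[x][y] = 4 if (x in B or y in B) else 2
            elif {x, y} == {p, p_hat}:                         # Eq. (6): value 3
                d[x][y] = 3
            elif p_hat in (x, y):                              # Eq. (6): value 1
                d[x][y] = 1
            elif p in (x, y):                                  # Eq. (6): value 4
                d[x][y] = 4
            else:                                              # Eq. (6): value 2
                d[x][y] = 2
    return d, B, p, p_hat


# Hypothetical instance: point 0 is heavily queried, (2, 3) is an ordinary
# query, and the algorithm is assumed to output point 5.
queries = [(0, 1), (0, 2), (0, 3), (0, 4), (2, 3)]
d, B, p, p_hat = build_adversary_metric(12, queries, 0.25, output=5)
print(B, p, p_hat)                          # {0} 5 1
print(sum(d[p_hat]), sum(d[p]), sum(d[0]))  # 16 43 44
```

In this toy instance the output p = 5 has total distance 43 while p̂ = 1 has only 16, in line with Lemmas 5 and 6; the gap approaches the factor 4 only as n grows and ε shrinks.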

The following lemma is straightforward from Eqs. (3)–(4) and (6).

Lemma 4. For all x, y ∈ [n], d(x, y) ∈ {0, 1, 2, 3, 4}.

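Lemma 5.

$$\sum_{y \in [n]} d(\hat{p}, y) \le (1 + 4\varepsilon) n + o(n).$$

Proof. By Lemmas 2 and 4,

$$\sum_{y \in B} d(\hat{p}, y) = o(n). \tag{7}$$

Furthermore,

$$\sum_{\substack{y \in [n] \\ (\hat{p}, y) \in Q}} d(\hat{p}, y) \;\overset{\text{Lemma 4}}{\le}\; \sum_{\substack{y \in [n] \\ (\hat{p}, y) \in Q}} 4 \;=\; 4 \deg_G(\hat{p}) \;<\; 4 \varepsilon n, \tag{8}$$

where the last inequality follows from Eq. (1) and p̂ ∉ B. We have

$$\sum_{\substack{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \\ (\hat{p}, y) \notin Q}} d(\hat{p}, y) \le n \tag{9}$$

because all summands are 1 by Eq. (6) and p̂ ∉ B. By Lemma 4,

$$\sum_{y \in \{p, \hat{p}\}} d(\hat{p}, y) = O(1). \tag{10}$$

Since the index sets in Eqs. (7)–(10) together cover [n] and all distances are nonnegative, summing up Eqs. (7)–(10) completes the proof.

¹ These are precisely the pairs whose d-distances are not specified by Eqs. (3)–(4).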

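Lemma 6.

$$\sum_{y \in [n]} d(p, y) \ge 4\,(n - o(n) - \varepsilon n).$$

Proof. Recall that p ∉ B. We have

$$\begin{aligned}
\sum_{y \in [n]} d(p, y)
&\ge \sum_{\substack{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \\ (p, y) \notin Q}} d(p, y)
\;\overset{\text{Eq. (6)}}{=}\; \sum_{\substack{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \\ (p, y) \notin Q}} 4 \\
&\ge 4\,\bigl(|\{y \in [n] \setminus (B \cup \{p\}) \mid (p, y) \notin Q\}| - 1\bigr) \\
&\overset{\text{Lemma 3}}{\ge} 4\,(n - |B| - \varepsilon n - 2) \\
&\overset{\text{Lemma 2}}{=} 4\,(n - o(n) - \varepsilon n).
\end{aligned}$$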

The next lemma is immediate from Eq. (4) and p′ ∈ B.

Lemma 7.

$$\sum_{y \in [n] \setminus \{p'\}} d(p', y) = 4\,(n - 1).$$

We proceed to prove that ([n], d) is a metric space through a few lemmas.

Lemma 8. For all x, y ∈ [n], if d(x, y) = 1, then p̂ ∈ {x, y}.

Proof. Inspect Eq. (6), which is the only equation that may set distances to 1.

Below is a consequence of Lemma 8.

Lemma 9. For all distinct x, y, z ∈ [n], if d(x, y) = d(x, z) = 1, then x = p̂.

The following lemma is immediate from Eqs. (4) and (6).

Lemma 10. For all x, y ∈ [n], if d(x, y) = 4, then {x, y} ∩ (B ∪ {p}) ≠ ∅.

Below is a consequence of p ∉ B and Eqs. (5)–(6).

Lemma 11. d(p̂, p) = 3.

Lemma 12. ([n], d) is a metric space.

Proof. We only need to prove the triangle inequality for d because all the other axioms are easy to verify. Let x, y, z ∈ [n] be distinct. By Lemma 4, d(x, y), d(x, z), d(y, z) ∈ {1, 2, 3, 4}; so, assuming d(x, y) ≤ d(x, z) without loss of generality, a violation d(x, y) + d(x, z) < d(y, z) can occur only in the following cases:

• d(x, y) = 1, d(x, z) = 1 and d(y, z) = 4. By Lemma 9, x = p̂. Hence if y = p (resp., z = p), then d(x, y) = 3 (resp., d(x, z) = 3) by Lemma 11, a contradiction. Therefore, p ∉ {y, z}, which together with Lemma 10 forces {y, z} ∩ B ≠ ∅. But if y ∈ B (resp., z ∈ B), then d(x, y) = 4 (resp., d(x, z) = 4) by Eq. (4), a contradiction.

• d(x, y) = 1, d(x, z) = 1 and d(y, z) = 3. By Lemma 9, x = p̂. On the other hand, d(y, z) = 3 means (y, z) ∈ {(p̂, p), (p, p̂)} by Eq. (6) (which is the only equation that may set distances to 3), contradicting x = p̂ because x, y, z are distinct.

• d(x, y) = 1, d(x, z) = 2 and d(y, z) = 4. By Lemma 10, {y, z} ∩ (B ∪ {p}) ≠ ∅. But if y ∈ B (resp., z ∈ B), then d(x, y) = 4 (resp., d(x, z) = 4) by Eq. (4), a contradiction. Therefore, p ∈ {y, z}. Furthermore, p̂ ∈ {x, y} by Lemma 8. Consequently, (p, p̂) ∈ {(x, y), (x, z), (y, z)} as unordered pairs (note that p ≠ p̂ by Eq. (5)), implying 3 ∈ {d(x, y), d(x, z), d(y, z)} by Lemma 11, a contradiction.

We have excluded all possibilities of d(x, y) + d(x, z) < d(y, z) for distinct x, y, z ∈ [n], which completes the proof.
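The case analysis above rests on the fact that distances between distinct points lie in {1, 2, 3, 4}. The following Python sketch (an illustration with a hypothetical function name, not part of the paper) enumerates all value patterns (a, b, c) = (d(x, y), d(x, z), d(y, z)) from that set with a ≤ b and a + b < c, confirming that exactly the three patterns treated in the proof can violate the triangle inequality.

```python
from itertools import product


def violating_patterns(values=(1, 2, 3, 4)):
    """Return all (a, b, c) with a <= b and a + b < c, where a = d(x, y),
    b = d(x, z) and c = d(y, z) range over the given distance values.

    The restriction a <= b is without loss of generality, since the roles
    of y and z can be swapped.
    """
    return [(a, b, c) for a, b, c in product(values, repeat=3)
            if a <= b and a + b < c]


print(violating_patterns())  # [(1, 1, 3), (1, 1, 4), (1, 2, 4)]
```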

Combining Lemmas 5–7 and 12 with the fact that {p, p′} contains the output of A^d yields our main theorem.

Theorem 13. Metric 1-median has no deterministic nonadaptive o(n²)-query (4 − ε)-approximation algorithms for any constant ε > 0.
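For completeness, here is one way to spell out how the lemmas combine (a sketch in the notation above; δ is a symbol introduced only for this remark). By Lemma 12, ([n], d) is a metric space. Since p̂ is a candidate output, the optimum is at most the bound of Lemma 5, while each of p and p′ has total distance at least 4(n − o(n) − εn) by Lemmas 6 and 7. Hence, whichever of p and p′ is output, its approximation ratio is at least

```latex
\frac{4\,(n - o(n) - \varepsilon n)}{(1 + 4\varepsilon)\,n + o(n)}
  \;=\; \frac{4\,(1 - \varepsilon - o(1))}{1 + 4\varepsilon + o(1)}
  \;\xrightarrow{\;n \to \infty\;}\; \frac{4\,(1 - \varepsilon)}{1 + 4\varepsilon}.
```

Because 4(1 − ε)/(1 + 4ε) tends to 4 as ε → 0, for any constant δ > 0 one can fix ε ∈ (0, 0.1) with 4(1 − ε)/(1 + 4ε) > 4 − δ; then A fails to be (4 − δ)-approximate on ([n], d) for all sufficiently large n.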

Theorem 13 shows that the approximation ratio of 4 in Theorem 1 cannot be improved to any constant c < 4 by deterministic nonadaptive o(n²)-query algorithms.


Acknowledgment

The author is supported in part by the National Science Council of Taiwan under grant 101-2221-E-155-015-MY2.

References

[1] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.

[2] C.-L. Chang. Some results on approximate 1-median selection in metric spaces. Theoretical Computer Science, 426:1–12, 2012.

[3] C.-L. Chang. Deterministic sublinear-time approximations for metric 1-median selection. Information Processing Letters, 113(8):288–292, 2013.

[4] K. Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923–947, 2009.

[5] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.

[6] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.

[7] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.

[8] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.

[9] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.

[10] R. Jaiswal, A. Kumar, and S. Sen. A simple D²-sampling based PTAS for k-means and other clustering problems. In Proceedings of the 18th Annual International Conference on Computing and Combinatorics, pages 13–24, 2012.

[11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.

[12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.

[13] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.