A lower bound for 1-median selection in metric spaces
Ching-Lueh Chang
Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
clchang@saturn.yzu.edu.tw
Abstract
Consider the problem of finding a point in an $n$-point metric space with the minimum average distance to all points. We show that this problem has no deterministic nonadaptive $o(n^2)$-query $(4-\Omega(1))$-approximation algorithms.
1 Introduction
Given oracle access to a metric space $(\{1, 2, \ldots, n\}, d)$, the metric 1-median problem asks for a point with the minimum average distance to all points. Indyk [8, 9] shows that metric 1-median has a Monte-Carlo $O(n/\epsilon^2)$-time $(1+\epsilon)$-approximation algorithm with an $\Omega(1)$ probability of success. The more general metric $k$-median problem asks for $x_1, x_2, \ldots, x_k \in \{1, 2, \ldots, n\}$ minimizing $\sum_{x \in \{1, 2, \ldots, n\}} \min_{i=1}^{k} d(x_i, x)$. Randomized as well as evasive algorithms are well-studied for metric $k$-median and the related $k$-means problem [1, 4, 7, 10–12], where $k \ge 1$ is part of the input rather than a constant.
This paper focuses on deterministic sublinear-query algorithms for metric 1-median. Guha et al. [7, Sec. 3.1–3.2] prove that metric $k$-median has a deterministic $O(n^{1+\epsilon})$-time $O(n^{\epsilon})$-space $2^{O(1/\epsilon)}$-approximation algorithm that reads distances in a single pass, where $\epsilon > 0$. Chang [3] presents a deterministic nonadaptive $O(n^{1.5})$-time 4-approximation algorithm for metric 1-median. He also shows that metric 1-median has no deterministic $o(n^2)$-query $(3-\epsilon)$-approximation algorithms for any constant $\epsilon > 0$ [2]. Modifying a proof of Chang [2, Sec. 5], this paper shows that metric 1-median has no deterministic nonadaptive $o(n^2)$-query $(4-\epsilon)$-approximation algorithms for any constant $\epsilon > 0$.
In social network analysis, the importance of an actor in a network may be quantified by several centrality measures, among which the closeness centrality of an actor is defined to be its average distance to other actors [13]. So metric 1-median can be interpreted as the problem of finding the most important point in a metric space. Goldreich and Ron [6] and Eppstein and Wang [5] present randomized algorithms for approximating the closeness centralities of vertices in undirected graphs.
2 Definitions
For $n \in \mathbb{Z}^+$, denote $[n] \equiv \{1, 2, \ldots, n\}$. An $n$-point metric space $([n], d)$ is the set $[n]$ endowed with a function $d : [n] \times [n] \to \mathbb{R}$ satisfying
1. $d(x, y) \ge 0$ (non-negativeness),
2. $d(x, y) = 0$ if and only if $x = y$ (identity of indiscernibles),
3. $d(x, y) = d(y, x)$ (symmetry), and
4. $d(x, y) + d(x, z) \ge d(y, z)$ (triangle inequality)
for all $x, y, z \in [n]$. An equivalent definition requires the triangle inequality only for distinct $x, y, z \in [n]$, axioms 1–3 remaining.
An algorithm with oracle access to a metric space $([n], d)$ is given $n$ and may query $d$ on any $(x, y) \in [n] \times [n]$ to obtain $d(x, y)$. Without loss of generality, we forbid queries for $d(x, x)$, which trivially returns 0, as well as repeated queries, where queries for $d(x, y)$ and $d(y, x)$ are seen as identical. An algorithm is said to be nonadaptive if its $(k + 1)$-th query (which is a pair in $[n] \times [n]$) is independent of the answers to the first $k$ queries, where $k \in \mathbb{N}$. Consequently, a deterministic nonadaptive algorithm's set of queries depends only on $n$ and the algorithm itself but not on $d$. For convenience, denote an algorithm ALG with oracle access to $([n], d)$ by $\text{ALG}^d$.
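The oracle model and the nonadaptivity condition can be illustrated with a small sketch. The class `DistanceOracle`, the helper `run`, and the query plan `cycle_plan` are hypothetical names chosen for this illustration and do not come from the paper; the plan is an arbitrary example whose pair list depends only on $n$.

```python
class DistanceOracle:
    """Oracle for a metric on {1, ..., n} that records which pairs were queried."""

    def __init__(self, n, d):
        self.n = n
        self._d = d
        self.asked = set()          # queried pairs, stored as unordered pairs

    def query(self, x, y):
        if x == y:
            raise ValueError("queries for d(x, x) are forbidden; the answer is 0")
        self.asked.add(frozenset((x, y)))   # d(x, y) and d(y, x) count once
        return self._d(x, y)


def cycle_plan(n):
    """Hypothetical nonadaptive plan: the pair list depends only on n."""
    return [(x, x % n + 1) for x in range(1, n + 1)]


def run(plan, oracle):
    """Issue every planned query; answers never influence later queries."""
    return [oracle.query(x, y) for x, y in plan]


n = 6
line = DistanceOracle(n, lambda x, y: abs(x - y))   # points on a line
uniform = DistanceOracle(n, lambda x, y: 1)         # all distinct points at distance 1
run(cycle_plan(n), line)
run(cycle_plan(n), uniform)
print(line.asked == uniform.asked, len(line.asked))
```

Running the same plan against two different metrics produces the same set of queried pairs, which is the property the lower-bound argument relies on.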
Given oracle access to a finite metric space $([n], d)$, the metric 1-median problem asks for a point in $[n]$ with the minimum average distance to all points. An algorithm for this problem is $\alpha$-approximate if it outputs a point $x \in [n]$ satisfying
$$\sum_{y \in [n]} d(x, y) \le \alpha \min_{x' \in [n]} \sum_{y \in [n]} d(x', y),$$
where $\alpha \ge 1$.
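As a minimal sketch of this definition, the following checks whether a given point is an $\alpha$-approximate 1-median by exhaustive evaluation. The function names and the toy metric (five points on a line) are assumptions made for illustration only; the exhaustive check reads all distances and is not one of the sublinear-query algorithms discussed in the paper.

```python
def total_distance(n, d, x):
    """Sum of distances from x to every point of {1, ..., n}."""
    return sum(d(x, y) for y in range(1, n + 1))


def is_alpha_approximate(n, d, x, alpha):
    """Check the defining inequality of an alpha-approximate 1-median for x."""
    best = min(total_distance(n, d, c) for c in range(1, n + 1))
    return total_distance(n, d, x) <= alpha * best


def line(x, y):
    """Toy metric: points 1..n placed on the real line."""
    return abs(x - y)


n = 5
print([total_distance(n, line, x) for x in range(1, n + 1)])
print(is_alpha_approximate(n, line, 1, 1.5), is_alpha_approximate(n, line, 1, 2))
```

In this toy instance the middle point is the exact 1-median, and an endpoint is 2-approximate but not 1.5-approximate.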
The following theorem is due to Chang [3].
Theorem 1 ([3]). Metric 1-median has a deterministic nonadaptive $O(n^{1.5})$-time 4-approximation algorithm.
3 Lower bound
Let $A$ be any deterministic nonadaptive $o(n^2)$-query algorithm for metric 1-median, $\epsilon \in (0, 0.1)$ be a constant and $d : [n] \times [n] \to \mathbb{R}$ be a metric to be determined later.
Define
$$Q \equiv \left\{ \text{unordered pair } (x, y) \in [n]^2 \mid A^d \text{ ever queries for } d(x, y) \right\}$$
to be the set of queries of $A^d$ treated as unordered pairs. Without loss of generality, assume $(x, x) \notin Q$ for all $x \in [n]$. Let $G = ([n], Q)$ be the simple undirected graph with vertex set $[n]$ and edge set $Q$. Denote the degree of $x \in [n]$ in $G$ by $\deg_G(x) = |\{y \in [n] \mid (x, y) \in Q\}|$, and
$$B \equiv \{x \in [n] \mid \deg_G(x) \ge \epsilon n\}. \tag{1}$$
In the sequel, we will specify $d$ incrementally in several steps. Note that $Q$ and $B$ are independent of $d$ because of the nonadaptivity of $A$; hence they will remain intact during our specification of $d$.
Below is an easy lemma.
Lemma 2 (Implicit in [2]). | B | = o(n).
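The paper attributes this lemma to [2] without reproducing a proof. One standard way to see it, offered here as a sketch rather than as the argument of [2], is a degree-counting bound: every vertex in $B$ has degree at least $\epsilon n$, the degrees in $G$ sum to $2|Q|$, and $|Q| = o(n^2)$ because $A$ makes $o(n^2)$ queries.

```latex
\[
  \epsilon n \cdot |B|
  \;\le\; \sum_{x \in B} \deg_G(x)
  \;\le\; \sum_{x \in [n]} \deg_G(x)
  \;=\; 2|Q|
  \;=\; o(n^2)
  \quad\Longrightarrow\quad
  |B| \;\le\; \frac{2|Q|}{\epsilon n} \;=\; o(n),
\]
```

where the last step uses that $\epsilon$ is a constant.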
Henceforth we will assume $n \in \mathbb{Z}^+$ to be sufficiently large so that
$$n - |B| - 1 - \epsilon n > 0 \tag{2}$$
by Lemma 2. For all $x \in [n]$,
$$d(x, x) \equiv 0. \tag{3}$$
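For all $(x, y) \in [n]^2 \setminus \{(x, x) \mid x \in [n]\}$ with $x \in B$, $y \in B$ or $(x, y) \in Q$,
$$d(x, y) \equiv \begin{cases} 4, & \text{if } x \in B \text{ or } y \in B; \\ 2, & \text{otherwise.} \end{cases} \tag{4}$$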
Clearly, this does not assign different values to d(x, y) and d(y, x).
As Eq. (4) specifies $d$ on a superset of $Q$ (which is the set of $A$'s queries) and $A$ is deterministic, the output of $A^d$ has now been fixed even though $d$ is not fully specified yet. Let $p \in [n] \setminus B$ and $p' \in B$ be such that $\{p, p'\}$ contains the output of $A^d$.
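Lemma 3.
$$\left| \left([n] \setminus (B \cup \{p\})\right) \cap \{x \in [n] \mid (p, x) \notin Q\} \right| \ge n - |B \cup \{p\}| - \epsilon n.$$
Proof. Eq. (1) and $p \notin B$ imply $\deg_G(p) < \epsilon n$, i.e.,
$$|\{x \in [n] \mid (p, x) \in Q\}| < \epsilon n.$$
Removing these fewer than $\epsilon n$ points from $[n] \setminus (B \cup \{p\})$ leaves at least $n - |B \cup \{p\}| - \epsilon n$ points.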
Take
$$\hat{p} \in \left([n] \setminus (B \cup \{p\})\right) \cap \{x \in [n] \mid (x, p) \notin Q\} \tag{5}$$
arbitrarily, as can be done by Lemma 3 and Eq. (2). Trivially, $\hat{p} \notin B$.
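We now complete the specification of $d$. For all $(x, y) \in [n]^2 \setminus (Q \cup \{(x, x) \mid x \in [n]\})$ with $x \notin B$ and $y \notin B$,¹
$$d(x, y) \equiv \begin{cases} 3, & \text{if } ((x = \hat{p}) \land (y = p)) \text{ or } ((y = \hat{p}) \land (x = p)); \\ 1, & \text{if } ((x = \hat{p}) \land (y \ne p)) \text{ or } ((y = \hat{p}) \land (x \ne p)); \\ 4, & \text{if } ((x = p) \land (y \ne \hat{p})) \text{ or } ((y = p) \land (x \ne \hat{p})); \\ 2, & \text{otherwise.} \end{cases} \tag{6}$$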
The four cases in Eq. (6) are mutually exclusive because $p \ne \hat{p}$ by Eq. (5). Clearly, Eq. (6) does not assign different values to $d(x, y)$ and $d(y, x)$.
The following lemma is straightforward from Eqs. (3)–(4) and (6).
Lemma 4. For all x, y ∈ [n], d(x, y) ∈ {0, 1, 2, 3, 4}.
Lemma 5.
$$\sum_{y \in [n]} d(\hat{p}, y) \le (1 + 4\epsilon)\, n + o(n).$$
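Proof. By Lemmas 2 and 4,
$$\sum_{y \in B} d(\hat{p}, y) = o(n). \tag{7}$$
Furthermore,
$$\sum_{\substack{y \in [n] \text{ s.t.} \\ (\hat{p}, y) \in Q}} d(\hat{p}, y) \overset{\text{Lemma 4}}{\le} \sum_{\substack{y \in [n] \text{ s.t.} \\ (\hat{p}, y) \in Q}} 4 = 4 \deg_G(\hat{p}) < 4 \epsilon n, \tag{8}$$
where the last inequality follows from Eq. (1) and $\hat{p} \notin B$. We have
$$\sum_{\substack{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \text{ s.t.} \\ (\hat{p}, y) \notin Q}} d(\hat{p}, y) \le n \tag{9}$$
because all summands are 1 by Eq. (6) and $\hat{p} \notin B$. By Lemma 4,
$$\sum_{y \in \{p, \hat{p}\}} d(\hat{p}, y) = O(1). \tag{10}$$
Summing up Eqs. (7)–(10) completes the proof.
¹ These are precisely the pairs whose $d$-distances are not specified by Eqs. (3)–(4).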
Lemma 6.
$$\sum_{y \in [n]} d(p, y) \ge 4\, (n - o(n) - \epsilon n).$$
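Proof. Recall that $p \notin B$. We have
$$\begin{aligned}
\sum_{y \in [n]} d(p, y)
&\ge \sum_{\substack{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \text{ s.t.} \\ (p, y) \notin Q}} d(p, y) \\
&\overset{\text{Eq. (6)}}{=} \sum_{\substack{y \in [n] \setminus (B \cup \{p, \hat{p}\}) \text{ s.t.} \\ (p, y) \notin Q}} 4 \\
&\ge 4 \left( \left| \{y \in [n] \setminus (B \cup \{p\}) \mid (p, y) \notin Q\} \right| - 1 \right) \\
&\overset{\text{Lemma 3}}{\ge} 4\, (n - |B| - \epsilon n - 2) \\
&\overset{\text{Lemma 2}}{=} 4\, (n - o(n) - \epsilon n).
\end{aligned}$$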
The next lemma is immediate from Eq. (4) and $p' \in B$.
Lemma 7.
$$\sum_{y \in [n] \setminus \{p'\}} d(p', y) = 4\, (n - 1).$$
We proceed to prove that ([n], d) is a metric space through a few lemmas.
Lemma 8. For all $x, y \in [n]$, if $d(x, y) = 1$, then $\hat{p} \in \{x, y\}$.
Proof. Inspect Eq. (6), which is the only equation that may set distances to 1.
Below is a consequence of Lemma 8.
Lemma 9. For all distinct $x, y, z \in [n]$, if $d(x, y) = d(x, z) = 1$, then $x = \hat{p}$.
The following lemma is immediate from Eqs. (4) and (6).
Lemma 10. For all $x, y \in [n]$, if $d(x, y) = 4$, then $\{x, y\} \cap (B \cup \{p\}) \ne \emptyset$.
Below is a consequence of $p \notin B$ and Eqs. (5)–(6).
Lemma 11. $d(\hat{p}, p) = 3$.
Lemma 12. ([n], d) is a metric space.
Proof. We only need to prove the triangle inequality for $d$ because all the other axioms are easy to verify. Consider the following cases for all distinct $x, y, z \in [n]$:
• $d(x, y) = 1$, $d(x, z) = 1$ and $d(y, z) = 4$. By Lemma 9, $x = \hat{p}$. Hence if $y = p$ (resp., $z = p$), then $d(x, y) = 3$ (resp., $d(x, z) = 3$) by Lemma 11, a contradiction. Therefore, $p \notin \{y, z\}$, which together with Lemma 10 forces $\{y, z\} \cap B \ne \emptyset$. But if $y \in B$ (resp., $z \in B$), then $d(x, y) = 4$ (resp., $d(x, z) = 4$) by Eq. (4), a contradiction.
• $d(x, y) = 1$, $d(x, z) = 1$ and $d(y, z) = 3$. By Lemma 9, $x = \hat{p}$. On the other hand, $d(y, z) = 3$ means $(y, z) \in \{(\hat{p}, p), (p, \hat{p})\}$ by Eq. (6) (which is the only equation that may set distances to 3), so $\hat{p} \in \{y, z\}$, contradicting $x = \hat{p}$ because $x, y, z$ are distinct.
• $d(x, y) = 1$, $d(x, z) = 2$ and $d(y, z) = 4$. By Lemma 10, $\{y, z\} \cap (B \cup \{p\}) \ne \emptyset$. But if $y \in B$ (resp., $z \in B$), then $d(x, y) = 4$ (resp., $d(x, z) = 4$) by Eq. (4), a contradiction. Therefore, $p \in \{y, z\}$. Furthermore, $\hat{p} \in \{x, y\}$ by Lemma 8. Consequently, $(p, \hat{p}) \in \{(x, y), (x, z), (y, z)\}$ as an unordered pair (note that $p \ne \hat{p}$ by Eq. (5)), implying $3 \in \{d(x, y), d(x, z), d(y, z)\}$ by Lemma 11, a contradiction.
We have excluded all possibilities of d(x, y) + d(x, z) < d(y, z), where x, y, z ∈ [n].
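The construction in Eqs. (1) and (3)–(6) can also be sanity-checked by brute force on a small instance. The sketch below is illustrative only: `build_metric`, `is_metric`, `total`, the toy query set `Q` (a star plus a path) and the choice `p=7` are assumptions introduced here. In the paper $p$ is determined by the output of $A^d$, whereas here it is passed in by hand, and $\hat{p}$ is taken to be the smallest admissible point.

```python
from itertools import combinations


def build_metric(n, Q, eps, p):
    """Return (d, B, p_hat) following Eqs. (1) and (3)-(6).

    Q is a set of frozenset({x, y}) pairs; p is assumed to lie outside B.
    """
    points = range(1, n + 1)
    deg = {x: 0 for x in points}
    for pair in Q:
        for x in pair:
            deg[x] += 1
    B = {x for x in points if deg[x] >= eps * n}          # Eq. (1)
    assert p not in B, "p must be chosen outside B"
    # Eq. (5): a point outside B and {p} whose pair with p is unqueried.
    p_hat = next(x for x in points
                 if x not in B and x != p and frozenset((x, p)) not in Q)

    def d(x, y):
        if x == y:
            return 0                                       # Eq. (3)
        if x in B or y in B:
            return 4                                       # Eq. (4), first case
        if frozenset((x, y)) in Q:
            return 2                                       # Eq. (4), second case
        if {x, y} == {p, p_hat}:
            return 3                                       # Eq. (6), first case
        if p_hat in (x, y):
            return 1                                       # Eq. (6), second case
        if p in (x, y):
            return 4                                       # Eq. (6), third case
        return 2                                           # Eq. (6), otherwise

    return d, B, p_hat


def is_metric(n, d):
    """Brute-force check of symmetry, positivity and the triangle inequality."""
    points = range(1, n + 1)
    for x, y in combinations(points, 2):
        if d(x, y) != d(y, x) or d(x, y) <= 0:
            return False
    return all(d(x, y) + d(x, z) >= d(y, z)
               for x in points for y in points for z in points)


def total(n, d, x):
    """Sum of distances from x to all points."""
    return sum(d(x, y) for y in range(1, n + 1))


n, eps, p = 60, 0.09, 7
Q = {frozenset((1, y)) for y in range(2, 13)}              # star around point 1
Q |= {frozenset((x, x + 1)) for x in range(2, n)}          # path 2-3-...-n
d, B, p_hat = build_metric(n, Q, eps, p)
print(B, p_hat, is_metric(n, d), total(n, d, p_hat), total(n, d, p))
```

On this toy instance only the star center has degree at least $\epsilon n$, the triangle inequality holds for every triple, and the distance sum of $p$ is already about 3.5 times that of $\hat{p}$. The ratio approaches 4 only as $n$ grows and $\epsilon$ shrinks, as Lemmas 5 and 6 indicate.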
Combining Lemmas 5–7 and 12 with the fact that $\{p, p'\}$ contains the output of $A^d$ yields our main theorem.
Theorem 13. Metric 1-median has no deterministic nonadaptive $o(n^2)$-query $(4-\epsilon)$-approximation algorithms for any constant $\epsilon > 0$.
Theorem 13 shows that the approximation ratio of 4 in Theorem 1 cannot be improved to any constant $c < 4$.
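The way Lemmas 5–7 combine into Theorem 13 can be spelled out as follows. This is a sketch of the arithmetic, not text from the paper; $z$ denotes the output of $A^d$ and $\delta$ is a symbol introduced here for the target constant.

```latex
\begin{align*}
\sum_{y \in [n]} d(z, y) &\ge 4\,(1 - \epsilon - o(1))\, n
  && \text{for } z \in \{p, p'\} \text{ (Lemmas 6 and 7)}, \\
\min_{x \in [n]} \sum_{y \in [n]} d(x, y)
  &\le \sum_{y \in [n]} d(\hat{p}, y) \le (1 + 4\epsilon + o(1))\, n
  && \text{(Lemma 5)}, \\
\frac{\sum_{y \in [n]} d(z, y)}{\min_{x \in [n]} \sum_{y \in [n]} d(x, y)}
  &\ge \frac{4\,(1 - \epsilon - o(1))}{1 + 4\epsilon + o(1)}
  \xrightarrow[n \to \infty]{} \frac{4\,(1 - \epsilon)}{1 + 4\epsilon}
  = 4 - \frac{20\epsilon}{1 + 4\epsilon} > 4 - 20\epsilon.
\end{align*}
```

Hence, for all sufficiently large $n$, $A$ is not $(4 - 20\epsilon)$-approximate on $([n], d)$. Given any constant $\delta > 0$, taking $\epsilon = \min\{\delta/20, 0.05\}$ rules out $(4 - \delta)$-approximation.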
Acknowledgment
The author is supported in part by the National Science Council of Taiwan under grant 101-2221-E-155-015-MY2.
References
[1] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.
[2] C.-L. Chang. Some results on approximate 1-median selection in metric spaces. Theoretical Computer Science, 426:1–12, 2012.
[3] C.-L. Chang. Deterministic sublinear-time approximations for metric 1-median selection. Information Processing Letters, 113(8):288–292, 2013.
[4] K. Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923–947, 2009.
[5] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
[6] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
[7] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
[8] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.
[9] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.
[10] R. Jaiswal, A. Kumar, and S. Sen. A simple $D^2$-sampling based PTAS for k-means and other clustering problems. In Proceedings of the 18th Annual International Conference on Computing and Combinatorics, pages 13–24, 2012.
[11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.
[12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.
[13] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.