Metric k-center

From HandWiki
Short description: Combinatorial optimization problem


In graph theory, the metric k-center problem is a combinatorial optimization problem studied in theoretical computer science. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality.

Formal definition

Let [math]\displaystyle{ (X,d) }[/math] be a metric space where [math]\displaystyle{ X }[/math] is a set and [math]\displaystyle{ d }[/math] is a metric
A set [math]\displaystyle{ \mathbf{V}\subseteq\mathcal{X} }[/math], is provided together with a parameter [math]\displaystyle{ k }[/math]. The goal is to find a subset [math]\displaystyle{ \mathcal{C}\subseteq \mathbf{V} }[/math] with [math]\displaystyle{ |\mathcal{C}|=k }[/math] such that the maximum distance of a point in [math]\displaystyle{ \mathbf{V} }[/math] to the closest point in [math]\displaystyle{ \mathcal{C} }[/math] is minimized. The problem can be formally defined as follows:
For a metric space ([math]\displaystyle{ \mathcal{X} }[/math],d),

  • Input: a set [math]\displaystyle{ \mathbf{V}\subseteq\mathcal{X} }[/math], and a parameter [math]\displaystyle{ k }[/math].
  • Output: a set [math]\displaystyle{ \mathcal{C}\subseteq\mathbf{V} }[/math] of [math]\displaystyle{ k }[/math] points.
  • Goal: Minimize the cost [math]\displaystyle{ r^\mathcal{C}(\mathbf{V}) = \underset{v\in V}{\max} }[/math] d(v,[math]\displaystyle{ \mathcal{C} }[/math])

That is, Every point in a cluster is in distance at most [math]\displaystyle{ r^\mathcal{C}(V) }[/math] from its respective center. [1]

The k-Center Clustering problem can also be defined on a complete undirected graph G = (VE) as follows:
Given a complete undirected graph G = (VE) with distances d(vivj) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing:

[math]\displaystyle{ \max_{v \in V} \min_{c \in C} d(v,c) }[/math]

Computational complexity

In a complete undirected graph G = (VE), if we sort the edges in non-decreasing order of the distances: d(e1) ≤ d(e2) ≤ ... ≤ d(em) and let Gi = (V, Ei), where Ei = {e1e2, ..., ei}. The k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k. [2]

Although Dominating Set is NP-complete, the k-center problem remains NP-hard. This is clear, since the optimality of a given feasible solution for the k-center problem can be determined through the Dominating Set reduction only if we know in first place the size of the optimal solution (i.e. the smallest index i such that Gi has a dominating set of size at most k), which is precisely the difficult core of the NP-Hard problems. Although a Turing reduction can get around this issue by trying all values of k.

Approximations

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds [math]\displaystyle{ \mathcal{C} }[/math] using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:

  • Pick an arbitrary point [math]\displaystyle{ \bar{c}_1 }[/math] into [math]\displaystyle{ C_1 }[/math]
  • For every point [math]\displaystyle{ v\in \mathbf{V} }[/math] compute [math]\displaystyle{ d_1[v] }[/math] from [math]\displaystyle{ \bar{c}_1 }[/math]
  • Pick the point [math]\displaystyle{ \bar{c}_2 }[/math] with highest distance from [math]\displaystyle{ \bar{c}_1 }[/math].
  • Add it to the set of centers and denote this expanded set of centers as [math]\displaystyle{ C_2 }[/math]. Continue this till k centers are found

Running time

  • The ith iteration of choosing the ith center takes [math]\displaystyle{ \mathcal{O}(n) }[/math] time.
  • There are k such iterations.
  • Thus, overall the algorithm takes [math]\displaystyle{ \mathcal{O}(nk) }[/math] time.[3]

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points [math]\displaystyle{ \mathbf{V}\subseteq\mathcal{X} }[/math], belonging to a metric space ([math]\displaystyle{ \mathcal{X} }[/math],d), the greedy K-center algorithm computes a set K of k centers, such that K is a 2-approximation to the optimal k-center clustering of V.

i.e. [math]\displaystyle{ r^{\mathbf{K}}(\mathbf{V})\leq 2r^{opt}(\mathbf{V},\textit{k}) }[/math] [1]

This theorem can be proven using two cases as follows,

Case 1: Every cluster of [math]\displaystyle{ \mathcal{C}_{opt} }[/math] contains exactly one point of [math]\displaystyle{ \mathbf{K} }[/math]

  • Consider a point [math]\displaystyle{ v\in \mathbf{V} }[/math]
  • Let [math]\displaystyle{ \bar{c} }[/math] be the center it belongs to in [math]\displaystyle{ \mathcal{C}_{opt} }[/math]
  • Let [math]\displaystyle{ \bar{k} }[/math] be the center of [math]\displaystyle{ \mathbf{K} }[/math] that is in [math]\displaystyle{ \Pi(\mathcal{C}_{opt},\bar{c}) }[/math]
  • [math]\displaystyle{ d(v,\bar{c})=d(v,\mathcal{C}_{opt})\leq r^{opt}(\mathbf{V},k) }[/math]
  • Similarly, [math]\displaystyle{ d(\bar{k},\bar{c})=d(\bar{k},\mathcal{C}_{opt})\leq r^{opt} }[/math]
  • By the triangle inequality: [math]\displaystyle{ d(v,\bar{k})\leq d(v,\bar{c})+d(\bar{c},\bar{k})\leq 2r^{opt} }[/math]


Case 2: There are two centers [math]\displaystyle{ \bar{k} }[/math] and [math]\displaystyle{ \bar{u} }[/math] of [math]\displaystyle{ \mathbf{K} }[/math] that are both in [math]\displaystyle{ \Pi(\mathcal{C}_{opt},\bar{c}) }[/math], for some [math]\displaystyle{ \bar{c}\in \mathcal{C}_{opt} }[/math] (By pigeon hole principle, this is the only other possibility)

  • Assume, without loss of generality, that [math]\displaystyle{ \bar{u} }[/math] was added later to the center set [math]\displaystyle{ \mathbf{K} }[/math] by the greedy algorithm, say in ith iteration.
  • But since the greedy algorithm always chooses the point furthest away from the current set of centers, we have that [math]\displaystyle{ \bar{k}\in\mathcal{C}_{i-1} }[/math]and,

[math]\displaystyle{ \begin{align} r^\mathbf{K}(\mathbf{V})\leq r^{\mathcal{C}_{i-1}}(\mathbf{V})&=d(\bar{u},\mathcal{C}_{i-1})\\ &\leq d(\bar{u},\bar{k})\\ &\leq d(\bar{u},\bar{c})+d(\bar{c},\bar{k})\\ &\leq 2r^{opt} \end{align} }[/math] [1]

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-Center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k and computes a maximal independent set of Gi, looking for the smallest index i that has a maximal independent set with a size of at least k. [4] It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. [5] Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP. [6]

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 − ε for any ε > 0, when using k as the parameter.[7] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP.[8] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme.[9] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.[10]

See also

References

  1. 1.0 1.1 1.2 Geometric Approximation Algorithms. Boston, MA, USA: American Mathematical Society. 2011. ISBN 978-0821849118. 
  2. Vazirani, Vijay V. (2003), Approximation Algorithms, Berlin: Springer, pp. 47–48, ISBN 3-540-65367-8 
  3. Gonzalez (1985), "Clustering to minimize the maximum intercluster distance", Theoretical Computer Science, 38, Elsevier Science B.V., pp. 293–306, doi:10.1016/0304-3975(85)90224-5 
  4. Hochbaum; Shmoys (1986), "A unified approach to approximation algorithms for bottleneck problems", Journal of the ACM, 33, pp. 533–550, doi:10.1145/5925.5933, ISSN 0004-5411 
  5. Hochbaum, Dorit S. (1997), Approximation Algorithms for NP-Hard problems, Boston: PWS Publishing Company, pp. 346–398, ISBN 0-534-94968-1 
  6. Crescenzi, Pierluigi; Kann, Viggo; Halldórsson, Magnús (2000), "Minimum k-center", A Compendium of NP Optimization Problems, http://www.csc.kth.se/~viggo/wwwcompendium/node128.html 
  7. Feldmann, Andreas Emil (2019-03-01). "Fixed-Parameter Approximations for k-Center Problems in Low Highway Dimension Graphs" (in en). Algorithmica 81 (3): 1031–1052. doi:10.1007/s00453-018-0455-0. ISSN 1432-0541. https://eprints.whiterose.ac.uk/200961/1/1605.02530.pdf. 
  8. Feder, Tomás; Greene, Daniel (1988-01-01). "Optimal algorithms for approximate clustering". Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88. New York, NY, USA: Association for Computing Machinery. pp. 434–444. doi:10.1145/62212.62255. ISBN 978-0-89791-264-8. https://doi.org/10.1145/62212.62255. 
  9. Feldmann, Andreas Emil; Marx, Dániel (2020-07-01). "The Parameterized Hardness of the k-Center Problem in Transportation Networks" (in en). Algorithmica 82 (7): 1989–2005. doi:10.1007/s00453-020-00683-w. ISSN 1432-0541. https://drops.dagstuhl.de/opus/volltexte/2018/8845/pdf/LIPIcs-SWAT-2018-19.pdf. 
  10. Feldmann, Andreas Emil; Vu, Tung Anh (2022). "Generalized $$k$$-Center: Distinguishing Doubling and Highway Dimension". in Bekos, Michael A.; Kaufmann, Michael (in en). Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science. 13453. Cham: Springer International Publishing. pp. 215–229. doi:10.1007/978-3-031-15914-5_16. ISBN 978-3-031-15914-5. https://link.springer.com/chapter/10.1007/978-3-031-15914-5_16. 

Further reading