Improved smoothed analysis of the $k$-means method


Manthey, Bodo and Röglin, Heiko (2009) Improved smoothed analysis of the $k$-means method. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, 4-6 Jan 2009, New York, NY, USA (pp. pp. 461-470).

open access
Abstract:The $k$-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii aimed at closing this gap, and they proved a bound of poly($n^k, 1/\sigma$) on the smoothed runningtime of the $k$-means method, where $n$ is the number of data points and $\sigma$ is the standard deviation of the Gaussian perturbation. This bound, though better than the worstcase bound, is still much larger than the running-time observed in practice.
We improve the smoothed analysis of the $k$-means method by showing two upper bounds on the expected running-time of $k$-means. First, we prove that the expected running-time is bounded by a polynomial in $n^{\sqrt{k}}$ and $1/\sigma.$ Second, we prove an upper bound of $k^{kd}$ poly($n, 1/\sigma$), where $d$ is the dimension of the data space. The polynomial is independent of $k$ and $d$, and we obtain a polynomial bound for the expected running-time for $k, d \in O(\sqrt{\log n/ \log\log n}).$
Finally, we show that $k$-means runs in smoothed polynomial time for one-dimensional instances.
Item Type:Conference or Workshop Item
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Research Group:
Link to this item:
Official URL:
Export this item as:BibTeX
HTML Citation
Reference Manager


Repository Staff Only: item control page