Glossary

What is K-means Clustering

K-means Clustering is a popular unsupervised learning algorithm used to partition data points into K distinct clusters. Each cluster is defined by its centroid, which is the mean of the points assigned to that cluster. The algorithm iteratively assigns data points to the nearest centroid and recalculates the centroids until convergence.


The process begins with the random selection of K initial centroids. Each data point is then assigned to the cluster represented by the nearest centroid. After all points are assigned, the centroids are updated by calculating the mean of all points in each cluster. This process repeats until the centroids no longer change significantly or a maximum number of iterations is reached.


K-means is widely used in various fields such as market segmentation, social network analysis, and image processing. However, it has limitations, such as sensitivity to initial centroid placement and difficulty in handling non-spherical clusters. As data volumes increase, K-means may evolve by combining with other algorithms to form more robust clustering solutions.