K-means clustering

  • The name \(K\)-means has two parts: \(K\) and "mean".
  • The goal is to split the data into \(K\) clusters that are homogeneous within each cluster and heterogeneous between clusters.
  • The method repeatedly takes the mean (average) of the points in each cluster formed at every iteration, hence the word "mean" in the name.
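
One common way to make "homogeneous within each cluster" precise is the within-cluster sum of squared distances. The notation here is an assumption added for illustration and does not come from the notes themselves: \(x_i\) is a data point, \(C_k\) is the \(k\)-th cluster, and \(\mu_k\) is its centroid.

\[
\min_{C_1, \dots, C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
\]

A smaller value means the points sit closer to the centroid of their own cluster.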

Method

  1. Select \(K\) random points from the dataset. These are the tentative centroids of the clusters we are going to form.
  2. For every point in the dataset, calculate its distance to each of the \(K\) centroids. Assign the point to the centroid with the smallest distance.
  3. The points assigned to each centroid form a cluster. Find the mean (average) of all the points in each cluster, and set it as the new centroid of that cluster. (You get \(K\) new centroids this way.)
  4. Repeat steps 2 and 3 for every point, this time using the new centroids.
  5. Repeat this process until it converges, that is, until the cluster assignments (and hence the centroids) stop changing.
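
The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation, and several details are assumptions not stated in the steps: distance is taken to be Euclidean, the names `kmeans`, `max_iters` and `seed` are hypothetical, and a cluster that ends up empty simply keeps its previous centroid.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick K data points at random as tentative centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid
        # (Euclidean distance is an assumption of this sketch).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 5: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 4: the next loop iteration repeats steps 2-3 with the new centroids.
    return centroids, labels


# Toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points and the last three end up in different clusters. In general the result can depend on the random starting centroids, so it is common to run the procedure several times with different seeds and keep the best run.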