Hierarchical Clustering

Single Link Clustering

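Single link measures the distance between two clusters as the distance between their closest pair of points (one point from each cluster). For example, what is the distance between the cluster containing points 6 and 8 and point 7? In single link, it is the distance between points 6 and 7.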
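A minimal Python sketch of the single link distance, assuming points are coordinate tuples and Euclidean distance. The coordinates given to points 6, 7 and 8 are hypothetical stand-ins for the figure.

```python
import math

def single_link_distance(cluster_a, cluster_b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical coordinates for points 6 and 8 (one cluster) and point 7.
cluster_6_8 = [(1.0, 1.0), (4.0, 1.0)]
point_7 = [(1.0, 3.0)]

print(single_link_distance(cluster_6_8, point_7))  # 2.0, the 6-7 distance
```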

So if you want just 2 clusters, you remove the top link of the hierarchy, which leaves you with 2 groups.
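One way to reproduce this with SciPy, assuming SciPy and NumPy are available: `linkage` builds the whole single-link hierarchy, and `fcluster` with `criterion="maxclust"` cuts it so that at most 2 flat clusters remain. The toy data is made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Build the full single-link hierarchy (one row per merge).
Z = linkage(X, method="single")

# "Remove the top link": ask for at most 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The first three points share one label and the last three share the other; which group gets label 1 is not something to rely on.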

Distances:

K-Means vs. single link comparison

However, we can get more information from dendrograms.

On example #5, single link found 2 clusters, but the dendrogram gives extra insight: there are actually 3 distinct clusters.
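The dendrogram's extra information comes from the heights at which merges happen. A sketch with SciPy on hypothetical data containing three groups, two of which sit close together: cutting the tree at a low height keeps 3 clusters, while cutting higher gives 2. Calling `scipy.cluster.hierarchy.dendrogram(Z)` would draw the same tree with matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: three tight pairs; the first two pairs are close together.
X = np.array([[0.0, 0.0], [0.0, 0.5],
              [3.0, 0.0], [3.0, 0.5],
              [20.0, 0.0], [20.0, 0.5]])

Z = linkage(X, method="single")

# Column 2 of the linkage matrix is the height of each merge,
# i.e. where each horizontal bar sits in the dendrogram.
heights = Z[:, 2]
print(heights)

# Cut the tree at two different heights and count the resulting clusters.
counts = {cut: len(set(fcluster(Z, t=cut, criterion="distance")))
          for cut in (1.0, 10.0)}
print(counts)  # {1.0: 3, 10.0: 2}
```

The large jump between the merge heights 3.0 and 17.0 is what a dendrogram makes visible at a glance.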

Agglomerative Clustering

You start by treating every point as its own cluster, then repeatedly merge clusters to build larger ones.

There is also the opposite approach (divisive clustering): start with everything in one cluster, then break it down.

Complete Link Clustering

Complete link measures the distance between two clusters as the distance between their two farthest points (one from each cluster).
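A minimal sketch of the complete link distance, assuming coordinate tuples and Euclidean distance; the two clusters are hypothetical. It is the single link function with `min` swapped for `max`.

```python
import math

def complete_link_distance(cluster_a, cluster_b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)

a = [(0.0, 0.0), (0.0, 3.0)]
b = [(4.0, 0.0)]

# Pairwise distances are 4.0 and 5.0; complete link keeps the larger one.
print(complete_link_distance(a, b))  # 5.0
```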

Farthest two points:

At each step, we calculate the distances between all the clusters we currently have, choose the pair of clusters with the smallest distance, and merge them into a new cluster.
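The merge loop described above, as a naive pure-Python sketch using complete link. It recomputes every pairwise cluster distance on each pass, so it is only meant for tiny inputs; `agglomerate` and its parameters are hypothetical names, not a library API.

```python
import math
from itertools import combinations

def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p in a for q in b)

def agglomerate(points, n_clusters, linkage=complete_link):
    """Naive bottom-up clustering: merge the closest pair until n_clusters remain."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Distance between every pair of current clusters; keep the smallest.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        # Merge cluster j into cluster i (i < j, so deleting j keeps i's index valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 7), (10, 0)]
print(agglomerate(points, 3))  # [[(0, 0), (0, 1)], [(5, 5), (5, 7)], [(10, 0)]]
```

Passing a different `linkage` function (for example, one that takes the `min` of the pairwise distances) turns the same loop into single link.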

This produces compact clusters, which is often considered an improvement over single link (single link tends to chain points into long, straggly clusters).

Average Link Clustering

Average link computes the distance between every point in one cluster and every point in the other cluster. The average of all these distances is the distance between the 2 clusters.
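A minimal sketch of the average link distance, assuming coordinate tuples and Euclidean distance; the clusters are hypothetical.

```python
import math
from statistics import mean

def average_link_distance(cluster_a, cluster_b):
    """Mean of the distances between every point in A and every point in B."""
    return mean(math.dist(p, q) for p in cluster_a for q in cluster_b)

a = [(0.0, 0.0), (0.0, 3.0)]
b = [(4.0, 0.0)]

# Pairwise distances are 4.0 and 5.0, so the average is 4.5.
print(average_link_distance(a, b))  # 4.5
```

On the same two clusters, single link would give 4.0 and complete link 5.0, so average link sits between them.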

Ward's Method

This is the default linkage in scikit-learn. At each step, it merges the pair of clusters whose merger causes the smallest increase in total within-cluster variance.
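A minimal scikit-learn sketch, assuming scikit-learn is installed. `AgglomerativeClustering` uses `linkage="ward"` unless told otherwise; it is passed explicitly here only for readability. The data is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# linkage="ward" is the default; written out here for clarity.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```

Label numbering is arbitrary, so only the grouping (first three points together, last three together) is meaningful.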

Orange X: the center of all the points in the two clusters (the average of their coordinates).
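A sketch of the quantity Ward's method compares, assuming Euclidean distance: the sum of squared distances from every point to the combined center (the orange X), minus the same sum computed within each cluster around its own center. The helper names and data are hypothetical. The last lines check the result against the equivalent closed form n_a * n_b / (n_a + n_b) times the squared distance between the two cluster centers.

```python
import numpy as np

def sum_sq_to_center(points):
    """Sum of squared Euclidean distances from each point to the points' mean."""
    center = points.mean(axis=0)
    return float(((points - center) ** 2).sum())

def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b were merged."""
    merged = np.vstack([a, b])  # its mean is the combined center (the orange X)
    return sum_sq_to_center(merged) - sum_sq_to_center(a) - sum_sq_to_center(b)

a = np.array([[0.0, 0.0], [2.0, 0.0]])
b = np.array([[6.0, 0.0], [8.0, 0.0]])

cost = ward_merge_cost(a, b)
print(cost)  # 40.0 - 2.0 - 2.0 = 36.0

# Same value via the closed form n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.
n_a, n_b = len(a), len(b)
gap = a.mean(axis=0) - b.mean(axis=0)
closed_form = n_a * n_b / (n_a + n_b) * float((gap ** 2).sum())
print(closed_form)  # 36.0
```

Ward's method evaluates this cost for every candidate pair and merges the pair with the smallest value.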


Last updated 6 years ago
