Hierarchical Clustering
Single link: distances are based on the closest pair of points between two clusters. What's the distance between cluster {6, 8} and point 7? Under single link it's the distance between points 6 and 7, the closest pair.
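A minimal sketch of that rule (1-D toy points, the helper name is mine): single link takes the minimum pairwise distance between the two clusters.

```python
def single_link_distance(cluster_a, cluster_b):
    # smallest distance between any point of one cluster and any of the other
    return min(abs(a - b) for a in cluster_a for b in cluster_b)

# cluster {6, 8} vs. point 7: the closest pair gives distance 1
print(single_link_distance([6, 8], [7]))  # → 1
```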
So now if you want just 2 clusters, you remove the top link of the dendrogram and you are left with 2 groups.
However, we can get more info out of the process: dendrograms.
On cluster #5, even though single link found 2 clusters, the dendrogram gives extra insight: there are actually 3 different clusters.
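A sketch of both ideas with SciPy (the 1-D data here is made up): cutting the tree at the top link is the same as asking `fcluster` for 2 clusters, and cutting one level lower exposes the finer 3-group structure the dendrogram contains.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# made-up 1-D data with three tight groups: {0,1}, {5,6}, {11,12}
pts = np.array([[0.0], [1.0], [5.0], [6.0], [11.0], [12.0]])
Z = linkage(pts, method='single')             # single-link merge tree

two = fcluster(Z, t=2, criterion='maxclust')    # cut the top link
three = fcluster(Z, t=3, criterion='maxclust')  # cut one level lower
print(two)    # 2 groups: {0,1,5,6} and {11,12}
print(three)  # 3 groups: {0,1}, {5,6}, {11,12}
```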
Agglomerative approach: you assume every point is its own cluster, then build larger clusters by merging.
There is also the opposite (divisive) approach: everything starts as one cluster, then you break it down.
Complete link: looks at the farthest two points between the clusters, and that is the distance between the 2 clusters.
So we calculate the distances between the clusters we currently have, choose the pair with the minimal distance, and merge them into a new cluster.
This produces compact clusters, which is generally considered better than single link.
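The merge loop above can be sketched like this (1-D toy data, helper names are mine): the cluster distance is the farthest cross-pair, and each step merges the pair of clusters with the smallest such distance.

```python
from itertools import combinations

def complete_link_distance(ca, cb):
    # complete link: farthest pair across the two clusters
    return max(abs(a - b) for a in ca for b in cb)

def merge_closest(clusters):
    # pick the pair of clusters with the minimal complete-link distance
    i, j = min(combinations(range(len(clusters)), 2),
               key=lambda ij: complete_link_distance(clusters[ij[0]], clusters[ij[1]]))
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [clusters[i] + clusters[j]]

clusters = [[1], [2], [6], [7], [8]]
while len(clusters) > 2:        # merge until 2 clusters remain
    clusters = merge_closest(clusters)
print(clusters)  # two groups: {1, 2} and {6, 7, 8}
```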
Average link: looks at the distance between every point in one cluster and every point in the other cluster. The average of all those distances is the distance between the 2 clusters.
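As a sketch (1-D points, the helper name is mine): average link is just the mean of all cross-cluster pairwise distances.

```python
def average_link_distance(cluster_a, cluster_b):
    # mean of every cross-cluster pairwise distance
    dists = [abs(a - b) for a in cluster_a for b in cluster_b]
    return sum(dists) / len(dists)

# cluster {6, 8} vs. point 7: (|6-7| + |8-7|) / 2 = 1.0
print(average_link_distance([6, 8], [7]))  # → 1.0
```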
Ward's method (the default in scikit-learn): it merges the pair of clusters whose union increases the within-cluster variance the least.
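A sketch with scikit-learn (the data is made up): `AgglomerativeClustering` uses `linkage='ward'` by default, so asking for 2 clusters runs Ward's variance-minimizing merges.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0], [2.0], [6.0], [7.0], [8.0]])
model = AgglomerativeClustering(n_clusters=2)  # linkage='ward' is the default
labels = model.fit_predict(X)
print(labels)  # two groups: {1, 2} and {6, 7, 8}
```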
Orange X in the figure: the average distance of all the points.