# Hierarchical Clustering

### Single Link Clustering

![](/files/-Lmtyiy1siTGVRQ_ImJ7)

In single link clustering, the distance between two clusters is the distance between their two closest points. What's the distance between cluster 6-8 and point 7? In single link, it's the distance between points 6 and 7.
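As a rough illustration of the closest-point rule, here is a minimal sketch in plain Python. The function name `single_link_distance` and the coordinates are made up for the example; they are not taken from the figure.

```python
from math import dist  # Euclidean distance, Python 3.8+

def single_link_distance(cluster_a, cluster_b):
    """Smallest distance between any point of one cluster and any point of the other."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical coordinates: a two-point cluster and a lone point.
cluster = [(0.0, 0.0), (1.0, 0.0)]
lone_point = [(3.0, 0.0)]

d = single_link_distance(cluster, lone_point)
print(d)  # 2.0, the distance from (1, 0) to (3, 0)
```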

If you want just 2 clusters, you remove the top link of the hierarchy, which leaves you with 2 groups.
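Cutting the hierarchy into 2 groups could be sketched with SciPy as follows. This assumes SciPy is available; the sample points are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: two well-separated groups of three points each.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# Build the full single-link hierarchy (one row per merge).
Z = linkage(points, method="single")

# Ask for at most 2 flat clusters, which amounts to cutting the top link.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Changing `t` cuts the same hierarchy at a different level, so the tree only has to be built once.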

![](/files/-LmtzDl7FcTnRT5feFMO)

Distances:

![](/files/-LmtzVdzSyfcSYzSkr3k)

### KMeans vs Single link comparison

![](/files/-Lmu-TfR0B7_Y6RtXfmo)

However, we can get more information out of hierarchical clustering by looking at dendrograms:

![](/files/-Lmu0386g8g08ethzcJE)

In case #5, even though single link found 2 clusters, the dendrogram gives extra insight: there are actually 3 different clusters.
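One way to read this kind of extra structure without plotting is to look at the merge heights of the hierarchy. The sketch below assumes SciPy is available and uses invented points that form three groups, two of which sit close together.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical data: three groups on a line; the first two are near each other.
points = np.array([
    [0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
    [4.0, 0.0], [4.5, 0.0], [5.0, 0.0],
    [20.0, 0.0], [20.5, 0.0], [21.0, 0.0],
])

Z = linkage(points, method="single")

# Column 2 of the linkage matrix holds the distance at which each merge happened.
merge_heights = Z[:, 2]
print(merge_heights)

# The same hierarchy can be cut into 2 or 3 flat clusters.
two_clusters = fcluster(Z, t=2, criterion="maxclust")
three_clusters = fcluster(Z, t=3, criterion="maxclust")

# no_plot=True returns the dendrogram layout as a dict instead of drawing it.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order
```

A jump in `merge_heights` marks a level where well-separated groups were joined. Here the last two merges are both much higher than the earlier ones, which hints at 3 groups even when only 2 were requested.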

### Agglomerative Clustering

You start by assuming every point is its own cluster, then build larger clusters by merging them step by step.

There is also the opposite approach (divisive clustering): everything starts as one cluster, which you then break down.
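The bottom-up procedure can be sketched naively in plain Python. This is only an illustration under assumptions: the function names `single_link` and `agglomerate` are hypothetical, single link is used as the cluster distance, and the sample points are invented. Library implementations are far more efficient.

```python
from math import dist  # Euclidean distance, Python 3.8+

def single_link(a, b):
    """Distance between the two closest points of clusters a and b."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[tuple(p)] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical data: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0)]
result = agglomerate(points, n_clusters=3)
print(result)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(9, 0)]]
```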

### Complete Link clustering

Complete link looks at the two farthest points of two clusters and uses the distance between them as the distance between the 2 clusters.

![](/files/-Lmu1LJCtMm3Xg78F7Ei)

Farthest two points:

![](/files/-Lmu1bMc8hIJLfV3pSOS)

So we calculate the distances between all the clusters we currently have, pick the pair with the minimal distance, and group those two into a new cluster.

This produces compact clusters, which is generally considered better than the elongated, chained clusters that single link can produce.
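A single merge step under complete link might look like the sketch below. The helper names `complete_link_distance` and `closest_pair` and the coordinates are assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def complete_link_distance(a, b):
    """Distance between the two farthest points of clusters a and b."""
    return max(dist(p, q) for p in a for q in b)

def closest_pair(clusters):
    """Indices (i, j) of the two clusters with the smallest complete-link distance."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: complete_link_distance(clusters[ij[0]], clusters[ij[1]]),
    )

# Hypothetical current state: three clusters on a line.
clusters = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(2.0, 0.0)],
    [(5.0, 0.0), (9.0, 0.0)],
]

i, j = closest_pair(clusters)
merged = clusters[i] + clusters[j]
print((i, j))   # (0, 1): their farthest points are only 2.0 apart
print(merged)   # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
```

Note that the cluster distance is a maximum, while the choice of which pair to merge is still a minimum over those distances.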

### Average-Link clustering

Average link looks at the distance between every point in one cluster and every point in the other cluster. The average of all these distances is the distance between the 2 clusters.
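The averaging over all cross-cluster pairs can be sketched as follows. The function name `average_link_distance` and the points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def average_link_distance(a, b):
    """Mean of the distances between every point of a and every point of b."""
    pairwise = [dist(p, q) for p in a for q in b]
    return sum(pairwise) / len(pairwise)

# Hypothetical clusters on a line.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

avg = average_link_distance(a, b)
print(avg)  # 4.0, the mean of the pairwise distances 4, 6, 2 and 4
```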

### Ward's method

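Ward's method is the default linkage in scikit-learn's `AgglomerativeClustering`. At each step it merges the two clusters whose merge increases the total within-cluster variance the least.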

![](/files/-Lmu2qRkGiY49sh7nvyA)

Orange X: the central point, i.e. the average position of all the points.
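The quantity Ward's method compares between candidate merges can be sketched as the increase in the sum of squared distances to the central point. The helper names `centroid`, `sse` and `ward_cost` and the sample points are assumptions for illustration; this is not scikit-learn's internal code.

```python
def centroid(points):
    """Average position of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_cost(a, b):
    """How much the within-cluster sum of squares grows if a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

# Hypothetical clusters on a line.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(6.0, 0.0), (8.0, 0.0)]

cost = ward_cost(a, b)
print(cost)  # 36.0 = 40 (merged) - 2 (a alone) - 2 (b alone)

# Equivalent closed form: |a||b| / (|a| + |b|) * squared distance between centroids.
ca, cb = centroid(a), centroid(b)
closed_form = len(a) * len(b) / (len(a) + len(b)) * sum((x - y) ** 2 for x, y in zip(ca, cb))
print(closed_form)  # 36.0
```

At each step, the pair of clusters with the smallest such cost would be the one merged.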

