Week 7: Hierarchical Clustering

Stat 431

Time Estimates:
     Videos: 10 min
     Readings: 0 min
     Activities: 40 min
     Check-ins: 1

Hierarchical Clustering

The other type of clustering we will implement this week is called hierarchical (or agglomerative) clustering.

Required Video: Intro to Hierarchical Clustering

Note that there are three common ways of comparing two clusters to determine whether they should be merged:

  1. Complete Linkage - Uses the largest distance between any point in cluster A and any point in cluster B. This is the default behavior in hclust().

  2. Single Linkage - Uses the smallest distance between any point in cluster A and any point in cluster B.

  3. Average Linkage - Uses the average of all pairwise distances between points in cluster A and points in cluster B. (Measuring the distance between cluster centroids is a separate option, called centroid linkage.)
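As a quick illustration, the three linkage methods above can be compared in R by passing `method` to `hclust()`. This is a minimal sketch on made-up toy data (the variable names are illustrative, not from the coursework):

```r
# Toy data: 10 points in 2 dimensions (illustrative only)
set.seed(431)
toy <- matrix(rnorm(20), ncol = 2)
d <- dist(toy)   # pairwise Euclidean distances

hc_complete <- hclust(d, method = "complete")  # the default
hc_single   <- hclust(d, method = "single")
hc_average  <- hclust(d, method = "average")

# Even on the same distances, the merge heights differ by linkage:
range(hc_complete$height)
range(hc_single$height)
range(hc_average$height)
```

Because single linkage merges on the smallest pairwise distance and complete linkage on the largest, single-linkage merge heights never exceed the corresponding complete-linkage heights.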

Check-In 1: Case Study: Federalist Papers

In the k-means coursework, you identified the authorship of the disputed Federalist Papers.

Try this out with hierarchical clustering instead.

  1. Convert your fed data into a matrix. (Do not reduce dimension with PCA.)

  2. Use hclust() on your data.

  3. Create a dendrogram of the results, with the observations (“nodes” or “leaves”) labelled by author.
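The three steps above can be sketched as follows. Here `fed` is a tiny mock data frame standing in for your Federalist word-rate data, so the column names (`author`, `upon`, `whilst`) are assumptions; substitute your own:

```r
# Mock stand-in for the Federalist data (your real columns will differ)
set.seed(431)
fed <- data.frame(author = rep(c("Hamilton", "Madison"), each = 5),
                  upon   = rnorm(10),
                  whilst = rnorm(10))

# 1. Convert the numeric features to a matrix (no PCA)
fed_mat <- as.matrix(fed[, c("upon", "whilst")])

# 2. Run hclust() (default: complete linkage on Euclidean distances)
hc <- hclust(dist(fed_mat))

# 3. Dendrogram with the leaves labelled by author
plot(hc, labels = fed$author,
     main = "Federalist Papers: hierarchical clustering (mock data)")
```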

For extra fun, try out these ways to make prettier or more informative dendrograms:

Optional Reading: Fancy Dendrogram Plotting

Upload your dendrogram to Canvas.

Canvas Link