Week 7: Hierarchical Clustering

Time Estimates:

Videos: 10 min

Readings: 0 min

Activities: 40 min

Check-ins: 1

Hierarchical Clustering

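The other type of clustering we will implement this week is hierarchical clustering, specifically the agglomerative (bottom-up) version, which starts with each observation in its own cluster and repeatedly merges the two closest clusters.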

Required Video: Intro to Hierarchical Clustering

There are several ways to measure the distance between two clusters when deciding which pair to merge next. Three common ones are:

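Complete Linkage - Uses the largest distance between a point in cluster A and a point in cluster B. This is the default behavior in hclust() (method = "complete").
Single Linkage - Uses the smallest distance between a point in cluster A and a point in cluster B.
Average Linkage - Uses the mean of all distances between points in cluster A and points in cluster B. (This is not the same as the distance between the cluster centroids, which is a separate criterion called centroid linkage.)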
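To make the difference between these criteria concrete, here is a minimal sketch in Python (the course itself uses R's hclust(); Python is used here only as a language-neutral illustration). The function names are hypothetical helpers, clusters are assumed to be lists of equal-length numeric tuples, and distances are assumed to be Euclidean, which is also the default in R's dist(). centroid_distance is included only for contrast with average linkage.

```python
import math
from itertools import product


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise(a, b):
    """All distances between a point in cluster a and a point in cluster b."""
    return [euclidean(p, q) for p, q in product(a, b)]


def complete_linkage(a, b):
    # Largest cross-cluster distance (the furthest pair of points).
    return max(pairwise(a, b))


def single_linkage(a, b):
    # Smallest cross-cluster distance (the closest pair of points).
    return min(pairwise(a, b))


def average_linkage(a, b):
    # Mean over all len(a) * len(b) cross-cluster pairs.
    d = pairwise(a, b)
    return sum(d) / len(d)


def centroid_distance(a, b):
    # For contrast only: distance between the two cluster means.
    ca = tuple(sum(coord) / len(a) for coord in zip(*a))
    cb = tuple(sum(coord) / len(b) for coord in zip(*b))
    return euclidean(ca, cb)


# Toy clusters (made-up points, for illustration only).
A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
print(round(complete_linkage(A, B), 3))   # 5.385
print(round(single_linkage(A, B), 3))     # 3.0
print(round(average_linkage(A, B), 3))    # 4.248
print(round(centroid_distance(A, B), 3))  # 4.123
```

With these toy clusters, average linkage (about 4.248) and the centroid distance (about 4.123) give different values, which is why the two criteria should not be confused.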

Check-In 1: Case Study: Federalist Papers

In the k-means coursework, you identified the authorship of the disputed Federalist Papers.

Try this out with hierarchical clustering instead.

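Convert your fed data into a numeric matrix. (Do not reduce dimension with PCA.)
Compute the pairwise distances with dist() and pass the result to hclust(), which expects a dissimilarity object rather than a raw matrix.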
Create a dendrogram of the results, with the observations (the “leaves” of the tree) labelled by author.
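hclust() carries out the merging for you, but it can help to see what the dendrogram encodes. Below is a naive sketch of agglomerative clustering in Python, assuming Euclidean distance, complete linkage (hclust()'s default), and a tiny made-up data set whose labels "a" to "e" are placeholders, not Federalist authors. The function name agglomerate is hypothetical, and the brute-force pair search is not meant to mirror hclust()'s internal implementation; it is only practical for small inputs. Each recorded merge corresponds to one junction in the dendrogram, drawn at the recorded height, and the leaf labels play the role of the author labels in the check-in.

```python
import math
from itertools import combinations, product


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def complete_linkage(a, b):
    # Furthest cross-cluster pair, matching hclust()'s default method.
    return max(euclidean(p, q) for p, q in product(a, b))


def agglomerate(points, labels, linkage=complete_linkage):
    """Naive agglomerative clustering.

    Starts with every observation in its own cluster and repeatedly merges
    the two clusters with the smallest linkage distance. Returns one
    (labels_of_first, labels_of_second, height) tuple per merge; the
    heights are what a dendrogram draws on its vertical axis.
    """
    # Each cluster is tracked as (list of labels, list of points).
    clusters = [([lab], [pt]) for lab, pt in zip(labels, points)]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters that are closest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (la, pa), (lb, pb) = clusters[i], clusters[j]
        merges.append((la, lb, linkage(pa, pb)))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((la + lb, pa + pb))
    return merges


# Made-up 2-D observations with placeholder labels.
points = [(0, 0), (0, 1), (5, 5), (5, 7), (12, 0)]
labels = ["a", "b", "c", "d", "e"]
merges = agglomerate(points, labels)
for left, right, height in merges:
    print(left, right, round(height, 3))
# ['a'] ['b'] 1.0
# ['c'] ['d'] 2.0
# ['a', 'b'] ['c', 'd'] 8.602
# ['e'] ['a', 'b', 'c', 'd'] 12.042
```

Because complete linkage never produces a merge height lower than an earlier one, the junctions stack upward cleanly, which is what makes the dendrogram readable.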

For extra fun, try out these ways to make prettier or more informative dendrograms:

Optional Reading: Fancy Dendrogram Plotting

Upload your dendrogram to Canvas.