The other type of clustering we will implement this week is hierarchical (agglomerative) clustering.
Note that there are several ways of comparing two clusters to decide whether they should be merged. Three common ones are:
Complete Linkage - Uses the furthest distance between a point in cluster A and a point in cluster B. This is the default behavior in hclust().
Single Linkage - Uses the closest distance between a point in cluster A and a point in cluster B.
Average Linkage - Uses the average of all distances between a point in cluster A and a point in cluster B. (This is not the same as the distance between the clusters' centroids; that is centroid linkage, method = "centroid" in hclust().)
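To make the linkage rules concrete, here is a small sketch with two made-up clusters of two points each (arbitrary illustration values, not course data). It computes the single, complete, and average linkage distances by hand, adds the centroid distance for contrast, and compares them with the height of the final merge that hclust() reports for each method.

```r
# Two toy clusters in 2-D (made-up points, for illustration only)
A <- rbind(c(0, 0), c(0, 2))
B <- rbind(c(4, 0), c(4, 2))

# Pairwise Euclidean distances between each point in A and each point in B
cross <- as.matrix(dist(rbind(A, B)))[1:2, 3:4]

single   <- min(cross)   # closest pair
complete <- max(cross)   # furthest pair
average  <- mean(cross)  # mean over all cross-cluster pairs
centroid_dist <- sqrt(sum((colMeans(A) - colMeans(B))^2))  # distance between centroids

print(c(single = single, complete = complete,
        average = average, centroid = centroid_dist))

# hclust() first merges the points within A and within B (distance 2 each),
# so its last merge height is the linkage distance between A and B
d <- dist(rbind(A, B))
heights <- sapply(c(single = "single", complete = "complete", average = "average"),
                  function(meth) max(hclust(d, method = meth)$height))
print(heights)
```

Here the average linkage distance (about 4.24) differs from the centroid distance (4), which is why the two are separate methods in hclust().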
In the k-means coursework, you identified the authorship of the disputed Federalist Papers.
Try this out with hierarchical clustering instead.
Convert your fed data into a matrix. (Do not reduce its dimension with PCA.)
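Use hclust() on your data. Note that hclust() expects a dissimilarity structure rather than the raw matrix, so compute one first with dist().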
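A minimal sketch of the matrix, dist(), hclust() pipeline follows. It uses the built-in mtcars data set purely as a stand-in, since the structure of the fed data is not shown here; with your own data you would first drop any non-numeric columns (such as an author column) before calling as.matrix().

```r
# Stand-in data: the built-in mtcars data set (NOT the Federalist data)
m <- as.matrix(mtcars)        # numeric matrix, one row per observation

d  <- dist(m)                 # Euclidean distances between rows (the default)
hc <- hclust(d)               # complete linkage is the default method

# Other linkages are selected with the `method` argument
hc_single  <- hclust(d, method = "single")
hc_average <- hclust(d, method = "average")

# Each fit records n - 1 merges for n observations
print(dim(hc$merge))
```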
Create a dendrogram of the results, with the observations (“nodes” or “leaves”) labelled by author.
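One way to label the leaves is the `labels` argument of plot() for hclust objects, which takes one label per observation in the original row order. The sketch below again uses mtcars as a stand-in and labels each car by its cylinder count; the `group` vector is hypothetical and plays the role your author column would play.

```r
m  <- as.matrix(mtcars)
hc <- hclust(dist(m))

# Hypothetical grouping vector standing in for the author labels:
# each car's cylinder count, in the same row order as m
group <- paste0(mtcars$cyl, "cyl")

# `labels` replaces the leaf labels; hang = -1 lines the leaves up at the bottom
plot(hc, labels = group, hang = -1,
     main = "Complete-linkage dendrogram", xlab = "", sub = "")
```

Alternatively, assigning `hc$labels <- group` before calling `plot(hc)` gives the same leaf labels.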
For extra fun, try out these ways to make prettier or more informative dendrograms:
Upload your dendrogram to Canvas.