I was using this to try to cluster text files with an average size of 10 KB, to find files that are closely related. Doing this with 10 files took ~1 minute. When I tried it with 100 files, I gave up. I didn't realize clustering consumed so much CPU.

Some ways to explore making this process faster (caveat: I don't know much about clustering):