paperlined.org

document updated on Feb 16, 2015

I want to do cluster analysis on text files. ~~(any time you can calculate some metric between elements in a set, you can perform clustering on that set)~~

use a spatial index; the DBSCAN article notes that this results in an O(n log n) runtime
- more generally, any technique that speeds up the nearest neighbor search
MinHash
use an algorithm that approximates the edit distance
use an algorithm that clusters suboptimally
locality-sensitive hashing
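
To make the MinHash and locality-sensitive hashing items above more concrete, here is a rough sketch in Python. It is not taken from any particular module; every function name (`shingles`, `minhash_signature`, `estimated_jaccard`, `lsh_candidate_pairs`) and every parameter value (3-word shingles, 128 hashes, 32 bands) is an assumption chosen for illustration. The idea is that the fraction of matching MinHash slots estimates the Jaccard similarity of two shingle sets, and banding the signatures yields candidate pairs without comparing every file against every other file.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one item.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, item):
    """64-bit hash of item; each seed acts as a different hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    if not items:
        raise ValueError("cannot build a MinHash signature for an empty set")
    return [min(_seeded_hash(seed, it) for it in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots, which estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=32):
    """Banding LSH: names whose signatures agree on any whole band.

    signatures maps a name to a signature; all signatures must have the
    same length, which should be divisible by bands.
    """
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs
```

Only the candidate pairs would then need an exact (and more expensive) comparison such as edit distance. More bands with fewer rows each catches less-similar pairs at the cost of more false candidates.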

modules that do both metric+cluster

modules that just do string metrics

modules that just do clustering, and work with any metric algorithm

algorithms on Wikipedia

similar fields