Package nltk :: Package cluster :: Module gaac :: Class GAAClusterer

type GAAClusterer

           object --+        
                    |        
         api.ClusterI --+    
                        |    
util.VectorSpaceClusterer --+
                            |
                           GAAClusterer

The Group Average Agglomerative (GAA) clusterer starts with each of the N vectors as a singleton cluster. It then iteratively merges the pair of clusters with the closest centroids, continuing until only one cluster remains. The order of the merges gives rise to a dendrogram: a tree in which earlier merges sit lower than later ones. The membership of any given number of clusters c, 1 <= c <= N, can be found by cutting the dendrogram at depth c.
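The merge loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the class's actual source: the helper names `centroid`, `cosine` and `agglomerate` are hypothetical, closeness is measured by cosine similarity between centroids, and the dendrogram is reduced to a flat list of merged index pairs.

```python
import math

def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def agglomerate(vectors, num_clusters=1):
    """Hypothetical helper: merge the two clusters whose centroids are most
    similar until num_clusters remain. Returns (clusters, merges), where
    merges records the merged index pairs in order."""
    clusters = [[list(v)] for v in vectors]  # start with N singleton clusters
    merges = []
    while len(clusters) > max(num_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merges.append((i, j))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters, merges

vectors = [[3, 3], [1, 2], [4, 2], [4, 0], [2, 3], [3, 1]]
clusters, merges = agglomerate(vectors, num_clusters=2)
```

Stopping the loop at `num_clusters` plays the role of cutting the dendrogram: running all the way to one cluster always takes N - 1 merges, and stopping c - 1 merges early leaves c clusters.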

This clusterer uses only the cosine similarity metric, which permits an efficient speed-up of the clustering process.
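One way cosine similarity can make such a speed-up possible is sketched below. It is offered as an illustration of the underlying identity, not as a description of this class's source, and the function names are hypothetical. For vectors of length 1, the mean pairwise cosine similarity within a group of n vectors follows from the group's vector sum s alone, as (s · s - n) / (n (n - 1)), so a candidate merge can be scored with one dot product instead of one per pair.

```python
import math
from itertools import combinations

def normalise(v):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(sum(a * a for a in v))
    return [a / length for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def average_similarity_brute(vectors):
    """Mean cosine similarity over all distinct pairs of unit vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(dot(u, v) for u, v in pairs) / len(pairs)

def average_similarity_fast(vector_sum, n):
    """Same quantity from the group's vector sum alone.
    Valid only when every member vector has length 1 and n >= 2."""
    return (dot(vector_sum, vector_sum) - n) / (n * (n - 1))

unit = [normalise(v) for v in [[3, 3], [1, 2], [4, 2], [4, 0]]]
total = [sum(col) for col in zip(*unit)]

slow = average_similarity_brute(unit)
fast = average_similarity_fast(total, len(unit))
```

The two results agree up to floating-point rounding. The identity relies on every vector having length 1, which is what the `normalise` option provides.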

Instance Methods
 
__init__(self, num_clusters=1, normalise=True, svd_dimensions=None)
 
cluster(self, vectors, assign_clusters=False, trace=False)
Assigns the vectors to clusters, learning the clustering parameters from the data.
 
cluster_vectorspace(self, vectors, trace=False)
Finds the clusters using the given set of vectors.
 
update_clusters(self, num_clusters)
 
classify_vectorspace(self, vector)
Returns the index of the appropriate cluster for the vector.
 
dendrogram(self)
Returns: Dendrogram, the dendrogram representing the current clustering
 
num_clusters(self)
Returns the number of clusters.
 
_average_similarity(self, v1, l1, v2, l2)
 
__repr__(self)

Inherited from util.VectorSpaceClusterer: classify, likelihood, likelihood_vectorspace, vector

Inherited from util.VectorSpaceClusterer (private): _normalise

Inherited from api.ClusterI: classification_probdist, cluster_name, cluster_names

Method Details

__init__(self, num_clusters=1, normalise=True, svd_dimensions=None)
(Constructor)

Parameters:
  • normalise - whether vectors should be normalised to length 1
  • svd_dimensions - number of dimensions to use when reducing vector dimensionality with SVD
Overrides: util.VectorSpaceClusterer.__init__
(inherited documentation)

cluster(self, vectors, assign_clusters=False, trace=False)

Assigns the vectors to clusters, learning the clustering parameters from the data. Returns a cluster identifier for each vector.

Overrides: api.ClusterI.cluster
(inherited documentation)

cluster_vectorspace(self, vectors, trace=False)

Finds the clusters using the given set of vectors.

Overrides: util.VectorSpaceClusterer.cluster_vectorspace
(inherited documentation)

classify_vectorspace(self, vector)

Returns the index of the appropriate cluster for the vector.

Overrides: util.VectorSpaceClusterer.classify_vectorspace
(inherited documentation)
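
The page does not say how the "appropriate" cluster is chosen. A plausible reading, given that this clusterer uses cosine similarity, is to pick the cluster whose centroid is most similar to the vector. The sketch below assumes that rule; the names `cosine` and `classify` and the example centroids are hypothetical and not taken from the class.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(vector, centroids):
    """Assumed rule: return the index of the centroid most similar to vector."""
    return max(range(len(centroids)), key=lambda i: cosine(vector, centroids[i]))

# Two made-up centroids, one near each axis.
centroids = [[1.0, 0.1], [0.1, 1.0]]
index = classify([0.9, 0.2], centroids)
```

Here `index` is 0, because the query vector points almost along the first axis.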

dendrogram(self)

Returns: Dendrogram
The dendrogram representing the current clustering

num_clusters(self)

Returns the number of clusters.

Overrides: api.ClusterI.num_clusters
(inherited documentation)

__repr__(self)
(Representation operator)

Overrides: object.__repr__
(inherited documentation)