Package nltk :: Module collocations :: Class AbstractCollocationFinder

type AbstractCollocationFinder


object --+
         |
        AbstractCollocationFinder
Known Subclasses:

An abstract base class for collocation finders whose purpose is to collect collocation candidate frequencies, filter and rank them.

Instance Methods
 
__init__(self, word_fd, ngram_fd)
At a minimum, collocation finders require the frequency of each word in a corpus and the joint frequency of word tuples.
 
_apply_filter(self, fn=<lambda>)
Generic filter removes ngrams from the frequency distribution if the function returns True when passed an ngram tuple.
 
_score_ngrams(self, score_fn)
Generates (ngram, score) pairs as determined by the scoring function provided.
 
above_score(self, score_fn, min_score)
Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.
 
apply_freq_filter(self, min_freq)
Removes candidate ngrams whose frequency is less than min_freq.
 
apply_ngram_filter(self, fn)
Removes candidate ngrams (w1, w2, ...) where fn(w1, w2, ...) evaluates to True.
 
apply_word_filter(self, fn)
Removes candidate ngrams (w1, w2, ...) where any of (fn(w1), fn(w2), ...) evaluates to True.
 
nbest(self, score_fn, n)
Returns the top n ngrams when scored by the given function.
 
score_ngrams(self, score_fn)
Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
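The filtering and ranking behavior described in the instance methods can be illustrated with a small stand-alone sketch. This is not NLTK's implementation: collections.Counter is assumed here as a stand-in for nltk.probability.FreqDist, the functions are hypothetical free-function versions of the documented methods, and the scoring function takes a simplified (ngram, freq) signature rather than the contingency counts that real association measures receive.

```python
from collections import Counter

# Toy bigram counts; Counter is an assumed stand-in for FreqDist.
ngram_fd = Counter({
    ("new", "york"): 5,
    ("of", "the"): 9,
    ("the", "city"): 3,
    ("york", "city"): 1,
})

def apply_freq_filter(fd, min_freq):
    """Keep only ngrams whose frequency is at least min_freq."""
    return Counter({ng: f for ng, f in fd.items() if f >= min_freq})

def apply_ngram_filter(fd, fn):
    """Drop ngrams (w1, w2, ...) for which fn(w1, w2, ...) is true."""
    return Counter({ng: f for ng, f in fd.items() if not fn(*ng)})

def apply_word_filter(fd, fn):
    """Drop ngrams in which any single word satisfies fn."""
    return Counter({ng: f for ng, f in fd.items()
                    if not any(fn(w) for w in ng)})

def score_ngrams(fd, score_fn):
    """Return (ngram, score) pairs ordered from highest to lowest score."""
    scored = [(ng, score_fn(ng, f)) for ng, f in fd.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

def nbest(fd, score_fn, n):
    """Return the n highest-scoring ngrams."""
    return [ng for ng, _ in score_ngrams(fd, score_fn)[:n]]

def above_score(fd, score_fn, min_score):
    """Return ngrams whose score strictly exceeds min_score, best first."""
    return [ng for ng, s in score_ngrams(fd, score_fn) if s > min_score]

# Hypothetical scoring function: relative frequency among all toy bigrams.
total = sum(ngram_fd.values())

def raw_freq(ngram, freq):
    return freq / total

# Filter rare bigrams, then bigrams containing a stopword, then rank.
stopwords = {"of", "the"}
candidates = apply_freq_filter(ngram_fd, 2)
candidates = apply_word_filter(candidates, lambda w: w in stopwords)
top = nbest(candidates, raw_freq, 1)
```

In this sketch each filter returns a new distribution, whereas the documented methods remove candidates from the finder itself; the ordering and threshold semantics are meant to match the descriptions above.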
Class Methods
 
from_documents(cls, documents)
Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.
Static Methods
 
_ngram_freqdist(words, n)
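As a rough illustration of how word and ngram frequencies might be gathered from a collection of tokenized documents, consider the sketch below. The function names ngram_counts and counts_from_documents are hypothetical, collections.Counter is assumed as a stand-in for FreqDist, and the choice to count ngrams within each document only (never across a document boundary) is an assumption of this sketch; NLTK's own handling of document boundaries may differ.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count contiguous n-token tuples in a sequence of words."""
    words = list(words)
    return Counter(tuple(words[i:i + n])
                   for i in range(len(words) - n + 1))

def counts_from_documents(documents, n=2):
    """Accumulate word and ngram counts over several documents.

    Assumption: ngrams are counted per document, so no ngram spans
    the end of one document and the start of the next.
    """
    word_fd, ngram_fd = Counter(), Counter()
    for doc in documents:
        tokens = list(doc)
        word_fd.update(tokens)
        ngram_fd.update(ngram_counts(tokens, n))
    return word_fd, ngram_fd

docs = [["a", "b", "a", "b"], ["b", "c"]]
word_fd, ngram_fd = counts_from_documents(docs)
```

The two resulting distributions are the kind of input the constructor expects: one keyed by word, one keyed by word tuple.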
Method Details

__init__(self, word_fd, ngram_fd)
(Constructor)

At a minimum, collocation finders require the frequency of each word in a corpus and the joint frequency of word tuples. This data should be provided as nltk.probability.FreqDist objects or as objects with an identical interface.

Overrides: object.__init__
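
To show why the constructor needs both distributions, the following sketch scores a bigram with pointwise mutual information, which compares the joint frequency of a pair against the individual word frequencies. MiniFinder is a hypothetical stand-in class, not part of NLTK, and collections.Counter is assumed in place of FreqDist because it offers a similar mapping-style interface.

```python
import math
from collections import Counter

class MiniFinder:
    """Hypothetical minimal finder holding the two required distributions."""

    def __init__(self, word_fd, ngram_fd):
        self.word_fd = word_fd      # frequency of each word
        self.ngram_fd = ngram_fd    # joint frequency of each word tuple
        self.N = sum(word_fd.values())

    def pmi(self, w1, w2):
        """Pointwise mutual information of the bigram (w1, w2), in bits."""
        p_joint = self.ngram_fd[(w1, w2)] / self.N
        p_w1 = self.word_fd[w1] / self.N
        p_w2 = self.word_fd[w2] / self.N
        return math.log2(p_joint / (p_w1 * p_w2))

tokens = "new york is a big city and new york is busy".split()
word_fd = Counter(tokens)                     # word frequencies
ngram_fd = Counter(zip(tokens, tokens[1:]))   # bigram frequencies
finder = MiniFinder(word_fd, ngram_fd)
score = finder.pmi("new", "york")
```

Any object that supports the same lookups and iteration as the distributions used here could be passed instead, which is the sense in which an identical interface suffices.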