Module collocations

Tools to identify collocations --- words that often appear consecutively --- within corpora. They may also be used to find other associations between word occurrences. See Manning and Schütze ch. 5 at http://nlp.stanford.edu/fsnlp/promo/colloc.pdf and the Text::NSP Perl package at http://ngram.sourceforge.net

Finding collocations requires first calculating the frequencies of words and of their appearance in the context of other words. Often the collection of words will then require filtering to retain only useful content terms. Each ngram of words may then be scored according to some association measure, in order to determine the relative likelihood of each ngram being a collocation.
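The three steps above (count, filter, score) can be sketched in plain Python. This is a minimal illustration and not the module's implementation: the function name pmi_bigrams, its parameters min_freq and stopwords, and the choice of pointwise mutual information (PMI) as the association measure are all assumptions made for the example.

```python
import math
from collections import Counter


def pmi_bigrams(words, min_freq=2, stopwords=frozenset()):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not part of nltk.collocations.
    """
    # Step 1: count individual words and adjacent word pairs.
    word_counts = Counter(words)
    pair_counts = Counter(zip(words, words[1:]))
    total = len(words)

    scored = []
    for (w1, w2), n_pair in pair_counts.items():
        # Step 2: drop rare pairs and pairs that contain a stopword.
        if n_pair < min_freq or w1 in stopwords or w2 in stopwords:
            continue
        # Step 3: PMI = log2(P(w1, w2) / (P(w1) * P(w2))), with every
        # probability estimated as count / total (a simplification).
        score = math.log2(n_pair * total / (word_counts[w1] * word_counts[w2]))
        scored.append(((w1, w2), score))

    # Highest-scoring candidates first; ties are broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


words = "new york is big and new york is busy and the city is new".split()
for pair, score in pmi_bigrams(words, stopwords={"is"}):
    print(pair, round(score, 3))
```

Filtering before scoring matters because measures such as PMI tend to reward pairs of very rare words, so a minimum frequency keeps one-off pairs out of the ranking.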

The BigramCollocationFinder and TrigramCollocationFinder classes provide this functionality, given a function that scores an ngram from the appropriate frequency counts. A number of standard association measures are provided in bigram_measures and trigram_measures.
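A short usage sketch follows. It assumes a recent NLTK release, in which the association measures are exposed as methods of the BigramAssocMeasures class in nltk.metrics; the version documented here refers to them as bigram_measures, so the import may need adjusting. The sample sentence is made up for the example.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

words = "new york is big and new york is busy and the city is new".split()

# Count word and bigram frequencies from a plain sequence of tokens.
finder = BigramCollocationFinder.from_words(words)

# Discard candidate bigrams that occur fewer than two times.
finder.apply_freq_filter(2)

# Rank the remaining candidates by PMI and keep at most three of them.
top_bigrams = finder.nbest(BigramAssocMeasures.pmi, 3)
print(top_bigrams)
```

Any other scoring function with the same call signature can be passed to nbest in place of BigramAssocMeasures.pmi, which makes it straightforward to compare measures on the same frequency counts.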

Classes
AbstractCollocationFinder
An abstract base class for collocation finders whose purpose is to collect collocation candidate frequencies, filter and rank them.
BigramCollocationFinder
A tool for the finding and ranking of bigram collocations or other association measures.
TrigramCollocationFinder
A tool for the finding and ranking of trigram collocations or other association measures.
Functions
demo(scorer=None, compare_scorer=None)
Finds trigram collocations in the files of the WebText corpus.