Package nltk :: Module text :: Class TextCollection

type TextCollection

object --+    
         |    
      Text --+
             |
            TextCollection

A collection of texts, which can be loaded with a list of texts or with a corpus consisting of one or more texts, and which supports counting, concordancing, collocation discovery, etc. Initialize a TextCollection as follows:

>>> gutenberg = TextCollection(nltk.corpus.gutenberg)
>>> mytexts = TextCollection([text1, text2, text3])

Iterating over a TextCollection produces all the tokens of all the texts in order.
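
The iteration behaviour described above can be illustrated without NLTK itself. The following is a minimal sketch, assuming each text is simply a list of token strings; `iter_tokens`, `text1` and `text2` are hypothetical names used only for illustration and are not part of the nltk API.

```python
from itertools import chain


def iter_tokens(texts):
    """Yield every token of every text, preserving text and token order."""
    # chain.from_iterable walks the first text to its end, then the next.
    return chain.from_iterable(texts)


# Two toy "texts", each represented as a list of tokens (an assumption).
text1 = ["the", "cat", "sat"]
text2 = ["on", "the", "mat"]

tokens = list(iter_tokens([text1, text2]))
```

Here `tokens` holds the tokens of `text1` followed by those of `text2`, which is the order the description above implies for a collection built from `[text1, text2]`.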

Instance Methods
 
__init__(self, source, name=None)
Create a Text object.
 
tf(self, term, text, method=None)
The frequency of the term in text.
 
idf(self, term, method=None)
The number of texts in the corpus divided by the number of texts that the term appears in.
 
tf_idf(self, term, text)

Inherited from Text: __getitem__, __len__, __repr__, collocations, common_contexts, concordance, count, dispersion_plot, findall, generate, index, plot, readability, search, similar, vocab

Inherited from Text (private): _context

Class Variables

Inherited from Text (private): _CONTEXT_RE, _COPY_TOKENS

Method Details

__init__(self, source, name=None)
(Constructor)

Create a Text object.

Parameters:
  • source - The source texts: a list of texts, or a corpus consisting of one or more texts.
Overrides: Text.__init__
(inherited documentation)

idf(self, term, method=None)

The number of texts in the corpus divided by the number of texts that the term appears in. If a term does not appear in the corpus, 0.0 is returned.
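
The descriptions of tf and idf can be sketched in plain Python. This is an illustration of the wording on this page, not the library's implementation: the function names and signatures are hypothetical, texts are assumed to be lists of tokens, and normalizing tf by the text length is an assumption, since the page only says "the frequency of the term in text". Many tf-idf formulations take the logarithm of the ratio; this sketch follows the description literally and returns the plain ratio.

```python
def tf(term, text):
    """Relative frequency of term in text (length normalization is assumed)."""
    if not text:
        return 0.0
    return text.count(term) / len(text)


def idf(term, texts):
    """Number of texts divided by the number of texts that contain term."""
    matches = sum(1 for text in texts if term in text)
    if matches == 0:
        # As documented above: a term absent from the corpus yields 0.0.
        return 0.0
    return len(texts) / matches


def tf_idf(term, text, texts):
    """Product of tf and idf, the usual definition of tf-idf."""
    return tf(term, text) * idf(term, texts)


# A toy corpus of three tokenized texts (hypothetical data).
texts = [["a", "b", "a"], ["b", "c"], ["c", "d"]]

idf_a = idf("a", texts)        # "a" occurs in 1 of 3 texts
idf_missing = idf("z", texts)  # "z" occurs in no text
score = tf_idf("a", texts[0], texts)
```

Returning 0.0 for an unseen term avoids a division by zero and means such a term contributes nothing to a tf-idf score.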