object --+
|
Text
A wrapper around a sequence of simple (string) tokens, which is
intended to support initial exploration of texts (via the interactive
console). Its methods perform a variety of analyses on the text's
contexts (e.g., counting, concordancing, collocation discovery), and
display the results. If you wish to write a program which makes use of
these analyses, then you should bypass the Text class, and
use the appropriate analysis function or class directly instead.
Texts are typically initialized from a given document or
corpus. E.g.:
>>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
_COPY_TOKENS = True
_CONTEXT_RE = re.compile(r'\w
Create a Text object.
Print a concordance for the given word.
See Also: ConcordanceIndex
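
To illustrate what a concordance is, here is a minimal sketch in plain Python. It is not the ConcordanceIndex implementation; the function name `concordance_lines`, the `window` parameter, and the case-insensitive matching are assumptions made for this example.

```python
def concordance_lines(tokens, word, window=3, case_sensitive=False):
    """Return (left context, matched token, right context) for each hit.

    Hypothetical helper: `window` is the number of tokens kept on each side.
    """
    key = word if case_sensitive else word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        candidate = tok if case_sensitive else tok.lower()
        if candidate == key:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "the whale swam and the Whale dived".split()
    for left, tok, right in concordance_lines(sample, "whale", window=2):
        print(left.rjust(12), tok, right)
```

Each hit keeps the original casing of the matched token, so the caller can decide how to align or truncate the lines for display.
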
Print collocations derived from the text, ignoring stopwords.
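
The idea of stopword-filtered collocations can be sketched as follows. This example ranks adjacent word pairs by raw frequency, which is a simplification chosen for brevity and may differ from the scoring the library uses. The stopword set and the name `frequent_bigrams` are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = frozenset({"the", "a", "of", "and", "in", "to"})


def frequent_bigrams(tokens, num=20, stopwords=STOPWORDS):
    """Return up to `num` adjacent word pairs, most frequent first.

    Pairs containing a stopword are ignored. Tokens are lowercased.
    """
    pairs = Counter()
    for first, second in zip(tokens, tokens[1:]):
        a, b = first.lower(), second.lower()
        if a in stopwords or b in stopwords:
            continue
        pairs[(a, b)] += 1
    return [pair for pair, _count in pairs.most_common(num)]


if __name__ == "__main__":
    sample = "the sperm whale and the sperm whale of the sea".split()
    print(frequent_bigrams(sample))
```
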
Print random text, generated using a trigram language model.
See Also: NgramModel
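
A rough sketch of trigram-based generation is shown below. It is an unsmoothed model (no backoff or probability estimation), so it should not be read as the NgramModel implementation. The names `build_trigram_table` and `generate_tokens`, and the `seed` parameter, are assumptions for this example.

```python
import random
from collections import defaultdict


def build_trigram_table(tokens):
    """Map each pair of consecutive tokens to the tokens seen right after it."""
    table = defaultdict(list)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        table[(a, b)].append(c)
    return table


def generate_tokens(tokens, length=20, seed=None):
    """Generate up to `length` tokens, starting from the first two source tokens.

    Generation stops early when the current pair has no recorded follower.
    """
    if len(tokens) < 3:
        return list(tokens[:length])
    rng = random.Random(seed)
    table = build_trigram_table(tokens)
    out = list(tokens[:2])
    while len(out) < length:
        followers = table.get((out[-2], out[-1]))
        if not followers:
            break
        # Repeated followers appear several times in the list, so a uniform
        # choice is weighted by observed frequency.
        out.append(rng.choice(followers))
    return out[:length]


if __name__ == "__main__":
    source = "the cat sat on the cat mat".split()
    print(" ".join(generate_tokens(source, length=10, seed=1)))
```
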
Search for instances of the regular expression pattern in the text.
See Also: TokenSearcher

Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.
See Also: ContextIndex.similar_words()

Find contexts where the specified words appear; list most frequent common contexts first.
See Also: ContextIndex.common_contexts()
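
The two context-based analyses above can be sketched together. This is an illustrative approximation, not the ContextIndex implementation: a context is taken to be the lowercased left and right neighbours, punctuation is not skipped, and the sentinel strings and all function names are assumptions for the example.

```python
from collections import Counter, defaultdict


def context(tokens, i):
    """Return the (left, right) neighbours of tokens[i], lowercased.

    The boundary sentinels are hypothetical placeholders.
    """
    left = tokens[i - 1].lower() if i > 0 else "*START*"
    right = tokens[i + 1].lower() if i < len(tokens) - 1 else "*END*"
    return (left, right)


def build_index(tokens):
    """Map each lowercased word to a Counter of the contexts it occurs in."""
    index = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        index[tok.lower()][context(tokens, i)] += 1
    return index


def similar_words(index, word, num=20):
    """Words sharing contexts with `word`, most shared contexts first."""
    key = word.lower()
    target = index.get(key, Counter())
    scores = Counter()
    for other, contexts in index.items():
        if other == key:
            continue
        shared = sum(min(n, target[c]) for c, n in contexts.items() if c in target)
        if shared:
            scores[other] = shared
    return [w for w, _score in scores.most_common(num)]


def common_contexts(index, words, num=20):
    """Contexts in which every one of `words` occurs, most frequent first."""
    counters = [index.get(w.lower(), Counter()) for w in words]
    if not counters:
        return []
    shared = set(counters[0])
    for counter in counters[1:]:
        shared &= set(counter)
    totals = Counter({c: sum(counter[c] for counter in counters) for c in shared})
    return [c for c, _n in totals.most_common(num)]


if __name__ == "__main__":
    toks = "the cat sat . the dog sat . the cat ran . a dog ran".split()
    idx = build_index(toks)
    print(similar_words(idx, "cat"))
    print(common_contexts(idx, ["cat", "dog"]))
```
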
Produce a plot showing the distribution of the words through the text. Requires pylab to be installed.
See Also: nltk.draw.dispersion_plot()

See documentation for FreqDist.plot()

See Also: nltk.prob.FreqDist
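
The data behind such a plot is simply the list of token offsets at which each word occurs. The sketch below computes those offsets without any plotting dependency; the name `dispersion_offsets` and the `ignore_case` flag are hypothetical and not part of the documented API.

```python
def dispersion_offsets(tokens, words, ignore_case=False):
    """Return {word: [offsets]} giving where each word occurs in `tokens`."""
    def norm(s):
        return s.lower() if ignore_case else s

    wanted = {norm(w): w for w in words}
    offsets = {w: [] for w in words}
    for position, tok in enumerate(tokens):
        original = wanted.get(norm(tok))
        if original is not None:
            offsets[original].append(position)
    return offsets


if __name__ == "__main__":
    toks = "a b a c b".split()
    print(dispersion_offsets(toks, ["a", "b"]))
```

Plotting each offset on the x axis, with one row per word, would give the usual dispersion view.
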
Find instances of the regular expression in the text. The text is a list of tokens, and a regexp pattern to match a single token must be surrounded by angle brackets. E.g.
>>> text5.findall("<.*><.*><bro>")
you rule bro; telling you bro; u twizted bro
>>> text1.findall("<a>(<.*>)<man>")
monied; nervous; dangerous; white; white; white; pious; queer; good; mature; white; Cape; great; wise; wise; butterless; white; fiendish; pale; furious; better; certain; complete; dismasted; younger; brave; brave; brave; brave
>>> text9.findall("<th.*>{3,}")
thread through those; the thought that; that the thing; the thing that; that that thing; through these than through; them that the; through the thick; them that they; thought that the
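
One way to implement angle-bracket token patterns is to join the tokens into a single bracketed string and rewrite the pattern into an ordinary regular expression. The sketch below follows that idea under stated assumptions: the name `findall_tokens` is hypothetical, the pattern may contain at most one capture group, and tokens are assumed not to contain `<` or `>`.

```python
import re


def findall_tokens(tokens, pattern):
    """Match an angle-bracket token pattern against a list of tokens.

    Returns a list of matches, each a list of tokens. If the pattern has a
    capture group, only the captured tokens are returned.
    """
    # Encode the text so every token is delimited: <tok1><tok2>...
    raw = "".join("<" + t + ">" for t in tokens)

    # Rewrite the pattern: brackets become non-capturing groups, and an
    # unescaped "." is kept from running across a token boundary.
    regex = re.sub(r"\s", "", pattern)
    regex = re.sub(r"<", "(?:<(?:", regex)
    regex = re.sub(r">", ")>)", regex)
    regex = re.sub(r"(?<!\\)\.", "[^>]", regex)

    hits = re.findall(regex, raw)
    # Keep only hits that cover whole tokens, then strip the delimiters.
    return [
        h[1:-1].split("><")
        for h in hits
        if h.startswith("<") and h.endswith(">")
    ]


if __name__ == "__main__":
    toks = "you rule bro and telling you bro".split()
    print("; ".join(" ".join(m) for m in findall_tokens(toks, "<.*><.*><bro>")))
```
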
One left & one right token, both case-normalized. Skip over non-sentence-final punctuation. Used by the ContextIndex that is created for similar() and common_contexts().
| Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:53 2011 | http://epydoc.sourceforge.net |