__init__(self, *args, **kwargs)
(Constructor)
Initialize the corpus reader. Categorization arguments (cat_pattern, cat_map, and cat_file) are passed to the CategorizedCorpusReader constructor. The remaining arguments are passed to the PlaintextCorpusReader constructor.
- Parameters:
root - The root directory for this corpus.
fileids - A list or regexp specifying the fileids in this corpus.
word_tokenizer - Tokenizer for breaking sentences or paragraphs into words.
sent_tokenizer - Tokenizer for breaking paragraphs into sentences.
para_block_reader - The block reader used to divide the corpus into paragraph blocks.
- Overrides:
api.CorpusReader.__init__
- (inherited documentation)
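The argument routing described above can be illustrated with a minimal sketch. It uses stand-in classes (CategorizedBase, PlaintextBase, and CategorizedPlaintextSketch are hypothetical names, not part of the library) and only mimics how the categorization keywords are separated from the remaining arguments; it reads no files and is not the library's actual source.

```python
# Hypothetical stand-ins that imitate the argument routing only.
CATEGORY_KEYS = ("cat_pattern", "cat_map", "cat_file")


class CategorizedBase:
    """Stand-in for the categorized reader: consumes the cat_* keywords."""

    def __init__(self, kwargs):
        # Pop the categorization arguments out of the shared dict so they
        # are not forwarded to the plaintext reader afterwards.
        self.cat_args = {k: kwargs.pop(k) for k in CATEGORY_KEYS if k in kwargs}


class PlaintextBase:
    """Stand-in for the plaintext reader: receives everything else."""

    def __init__(self, root, fileids, word_tokenizer=None,
                 sent_tokenizer=None, para_block_reader=None):
        self.root = root
        self.fileids = fileids
        self.word_tokenizer = word_tokenizer
        self.sent_tokenizer = sent_tokenizer
        self.para_block_reader = para_block_reader


class CategorizedPlaintextSketch(CategorizedBase, PlaintextBase):
    def __init__(self, *args, **kwargs):
        # First strip the categorization keywords from kwargs...
        CategorizedBase.__init__(self, kwargs)
        # ...then hand the remaining arguments to the plaintext reader.
        PlaintextBase.__init__(self, *args, **kwargs)


# Example call: the paths and patterns here are made-up placeholders.
reader = CategorizedPlaintextSketch(
    "corpus_root", r".*\.txt", cat_pattern=r"(\w+)/.*"
)
print(reader.cat_args)
print(reader.root, reader.fileids)
```

Passing the keyword dictionary itself to the first constructor, rather than unpacking it, is what lets that constructor remove the categorization keys before the second constructor sees them.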