Package nltk :: Package corpus :: Package reader :: Module plaintext :: Class PortugueseCategorizedPlaintextCorpusReader

type PortugueseCategorizedPlaintextCorpusReader

                  object --+        
                           |        
 api.CategorizedCorpusReader --+    
                               |    
              object --+       |    
                       |       |    
        api.CorpusReader --+   |    
                           |   |    
       PlaintextCorpusReader --+    
                               |    
CategorizedPlaintextCorpusReader --+
                                   |
                                  PortugueseCategorizedPlaintextCorpusReader

Nested Classes

Inherited from PlaintextCorpusReader: CorpusView

Instance Methods
 
__init__(self, *args, **kwargs)
Initialize the corpus reader.

Inherited from CategorizedPlaintextCorpusReader: paras, raw, sents, words

Inherited from CategorizedPlaintextCorpusReader (private): _resolve

Inherited from api.CategorizedCorpusReader: categories, fileids

Inherited from api.CategorizedCorpusReader (private): _add, _init

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open, readme

Inherited from api.CorpusReader (private): _get_root

Deprecated since 0.9.7:

    Inherited from api.CorpusReader: files

Deprecated since 0.9.1:

    Inherited from api.CorpusReader: items

    Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _fileids, _root

Properties

Inherited from api.CorpusReader: root

Method Details

__init__(self, *args, **kwargs)
(Constructor)

Initialize the corpus reader. Categorization arguments (cat_pattern, cat_map, and cat_file) are passed to the CategorizedCorpusReader constructor. The remaining arguments are passed to the PlaintextCorpusReader constructor.
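The argument split described above can be sketched in plain Python. This is only an illustration, not the NLTK implementation: `split_reader_kwargs` and `CATEGORY_KEYS` are hypothetical names, and the sketch assumes the categorization arguments arrive as keyword arguments.

```python
# Hypothetical helper, not part of NLTK: it only illustrates the split
# between categorization arguments and the remaining arguments.
CATEGORY_KEYS = ("cat_pattern", "cat_map", "cat_file")


def split_reader_kwargs(kwargs):
    """Return (category_kwargs, remaining_kwargs) without mutating the input."""
    category_kwargs = {k: v for k, v in kwargs.items() if k in CATEGORY_KEYS}
    remaining_kwargs = {k: v for k, v in kwargs.items() if k not in CATEGORY_KEYS}
    return category_kwargs, remaining_kwargs


# cat_pattern would go to the categorized reader; fileids to the plaintext reader.
cat_kwargs, plain_kwargs = split_reader_kwargs(
    {"cat_pattern": r"(\w+)/.*", "fileids": r".*\.txt"}
)
print(cat_kwargs)
print(plain_kwargs)
```

In this sketch the input dictionary is left untouched, so the same keyword arguments could be inspected again after the split.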

Parameters:
  • root - The root directory for this corpus.
  • fileids - A list or regexp specifying the fileids in this corpus.
  • word_tokenizer - Tokenizer for breaking sentences or paragraphs into words.
  • sent_tokenizer - Tokenizer for breaking paragraphs into sentences.
  • para_block_reader - The block reader used to divide the corpus into paragraph blocks.
Overrides: api.CorpusReader.__init__
(inherited documentation)
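
The fileids parameter accepts either a list or a regexp. A minimal sketch of that distinction follows. It assumes a regexp must match the whole file identifier and that a list is treated as an explicit selection. `select_fileids` is a hypothetical helper for illustration, not an NLTK function.

```python
import re


def select_fileids(candidates, fileids):
    """Hypothetical helper: resolve `fileids` given as a list or a regexp string."""
    if isinstance(fileids, str):
        # Assumption: the regexp has to match the entire file identifier.
        pattern = re.compile(fileids)
        return [name for name in candidates if pattern.fullmatch(name)]
    # Otherwise treat `fileids` as an explicit collection of identifiers.
    wanted = set(fileids)
    return [name for name in candidates if name in wanted]


files = ["news/a.txt", "news/b.txt", "README", "fiction/c.txt"]
by_regexp = select_fileids(files, r".*\.txt")
by_list = select_fileids(files, ["README", "news/a.txt"])
print(by_regexp)
print(by_list)
```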