Package nltk :: Package corpus :: Package reader :: Module tagged :: Class CategorizedTaggedCorpusReader

Class CategorizedTaggedCorpusReader

                 object --+    
                          |    
api.CategorizedCorpusReader --+
                              |
             object --+       |
                      |       |
       api.CorpusReader --+   |
                          |   |
         TaggedCorpusReader --+
                              |
                             CategorizedTaggedCorpusReader

A reader for part-of-speech tagged corpora whose documents are divided into categories based on their file identifiers.
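
As a rough illustration of how documents can be divided into categories from their file identifiers, the sketch below groups file ids by the first group of a regular expression. It is a standard-library stand-in, not NLTK's implementation: the `categorize` helper and the sample file ids are hypothetical, and the assumption that `cat_pattern` is matched against each file id with its first group taken as the category is ours.

```python
import re
from collections import defaultdict

def categorize(fileids, cat_pattern):
    """Group file ids by the first regex group matched in each id (hypothetical helper)."""
    by_category = defaultdict(list)
    for fileid in fileids:
        match = re.match(cat_pattern, fileid)
        if match:  # ids that do not match the pattern get no category
            by_category[match.group(1)].append(fileid)
    return dict(by_category)

# Hypothetical file ids; the directory name plays the role of the category.
fileids = ["news/a01.pos", "news/a02.pos", "fiction/k01.pos"]
categories = categorize(fileids, r"(\w+)/.*")
print(categories)
```

Under this assumption, passing `categories="news"` to any of the accessors below would select the two `news/` files.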

Instance Methods
 
__init__(self, *args, **kwargs)
    Initialize the corpus reader.

_resolve(self, fileids, categories)

raw(self, fileids=None, categories=None)
    Returns: the given file(s) as a single string.
    Return type: str

words(self, fileids=None, categories=None)
    Returns: the given file(s) as a list of words and punctuation symbols.
    Return type: list of str

sents(self, fileids=None, categories=None)
    Returns: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings.
    Return type: list of (list of str)

paras(self, fileids=None, categories=None)
    Returns: the given file(s) as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of word strings.
    Return type: list of (list of (list of str))

tagged_words(self, fileids=None, categories=None, simplify_tags=False)
    Returns: the given file(s) as a list of tagged words and punctuation symbols, encoded as tuples (word,tag).
    Return type: list of (str,str)

tagged_sents(self, fileids=None, categories=None, simplify_tags=False)
    Returns: the given file(s) as a list of sentences, each encoded as a list of (word,tag) tuples.
    Return type: list of (list of (str,str))

tagged_paras(self, fileids=None, categories=None, simplify_tags=False)
    Returns: the given file(s) as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of (word,tag) tuples.
    Return type: list of (list of (list of (str,str)))

Inherited from api.CategorizedCorpusReader: categories, fileids

Inherited from api.CategorizedCorpusReader (private): _add, _init

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open, readme

Inherited from api.CorpusReader (private): _get_root

    Deprecated since 0.9.7

Inherited from api.CorpusReader: files

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _fileids, _root

Properties

Inherited from api.CorpusReader: root

Method Details

__init__(self, *args, **kwargs)
(Constructor)

Initialize the corpus reader. Categorization arguments (cat_pattern, cat_map, and cat_file) are passed to the CategorizedCorpusReader constructor. The remaining arguments are passed to the TaggedCorpusReader constructor.

Parameters:
  • root - The root directory for this corpus.
  • fileids - A list or regexp specifying the fileids in this corpus.
Overrides: api.CorpusReader.__init__
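
The way the constructor divides its arguments can be sketched as follows. This is a minimal standard-library illustration, not the library's code: `split_kwargs` is a hypothetical helper, and only the three categorization argument names (`cat_pattern`, `cat_map`, `cat_file`) come from the description above. The `encoding` key is used purely as an example of a remaining argument.

```python
CATEGORY_KEYS = ("cat_pattern", "cat_map", "cat_file")

def split_kwargs(kwargs):
    """Separate categorization arguments from the remaining keyword arguments."""
    cat_kwargs = {k: kwargs[k] for k in CATEGORY_KEYS if k in kwargs}
    rest = {k: v for k, v in kwargs.items() if k not in CATEGORY_KEYS}
    return cat_kwargs, rest

# The first dict would go to the CategorizedCorpusReader constructor,
# the second to the TaggedCorpusReader constructor.
cat_kwargs, reader_kwargs = split_kwargs(
    {"cat_pattern": r"(\w+)/.*", "encoding": "utf8"}
)
print(cat_kwargs, reader_kwargs)
```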

raw(self, fileids=None, categories=None)

Returns: str
the given file(s) as a single string.
Overrides: TaggedCorpusReader.raw
(inherited documentation)
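
Every accessor on this class accepts both `fileids` and `categories`. The sketch below shows one plausible way such a pair of arguments can be turned into a list of file ids. It is an assumption-laden stand-in rather than the body of `_resolve`: the `resolve` function, the `category_index` mapping, and the rule that the two arguments are mutually exclusive are all stated here as assumptions.

```python
def resolve(fileids, categories, category_index):
    """Return the file ids to read, given either fileids or categories.

    category_index is a hypothetical mapping: category name -> list of file ids.
    """
    if fileids is not None and categories is not None:
        # Assumed behaviour: the two selectors cannot be combined.
        raise ValueError("Specify fileids or categories, not both")
    if categories is None:
        return fileids
    if isinstance(categories, str):
        categories = [categories]
    selected = []
    for category in categories:
        selected.extend(category_index[category])
    return sorted(set(selected))

index = {"news": ["news/a01.pos", "news/a02.pos"], "fiction": ["fiction/k01.pos"]}
print(resolve(None, "news", index))
print(resolve(["fiction/k01.pos"], None, index))
```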

words(self, fileids=None, categories=None)

Returns: list of str
the given file(s) as a list of words and punctuation symbols.
Overrides: TaggedCorpusReader.words
(inherited documentation)

sents(self, fileids=None, categories=None)

Returns: list of (list of str)
the given file(s) as a list of sentences or utterances, each encoded as a list of word strings.
Overrides: TaggedCorpusReader.sents
(inherited documentation)

paras(self, fileids=None, categories=None)

Returns: list of (list of (list of str))
the given file(s) as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of word strings.
Overrides: TaggedCorpusReader.paras
(inherited documentation)

tagged_words(self, fileids=None, categories=None, simplify_tags=False)

Returns: list of (str,str)
the given file(s) as a list of tagged words and punctuation symbols, encoded as tuples (word,tag).
Overrides: TaggedCorpusReader.tagged_words
(inherited documentation)
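
To make the `(word,tag)` encoding concrete, here is a small standard-library sketch that turns whitespace-separated `word/tag` tokens into tuples. It assumes a `/` separator between word and tag; `parse_tagged_words` and the sample text are hypothetical and do not reproduce the reader's own tokenization or the effect of `simplify_tags`.

```python
def parse_tagged_words(text, sep="/"):
    """Split text on whitespace and turn each 'word/tag' token into (word, tag)."""
    pairs = []
    for token in text.split():
        # Split on the last separator so words that contain '/' stay intact.
        word, found, tag = token.rpartition(sep)
        if found:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))  # token carries no tag
    return pairs

sample = "The/AT jury/NN said/VBD ./."
tagged = parse_tagged_words(sample)
print(tagged)
```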

tagged_sents(self, fileids=None, categories=None, simplify_tags=False)

Returns: list of (list of (str,str))
the given file(s) as a list of sentences, each encoded as a list of (word,tag) tuples.
Overrides: TaggedCorpusReader.tagged_sents
(inherited documentation)

tagged_paras(self, fileids=None, categories=None, simplify_tags=False)

Returns: list of (list of (list of (str,str)))
the given file(s) as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of (word,tag) tuples.
Overrides: TaggedCorpusReader.tagged_paras
(inherited documentation)
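
The three levels of nesting in this return type (paragraphs, sentences, tagged words) are easier to see on a tiny example. The sketch below is a standard-library stand-in under stated assumptions: paragraphs are separated by blank lines, each line is one sentence, and tokens have the form `word/tag`. The `parse_tagged_paras` function and the sample text are hypothetical.

```python
def parse_tagged_paras(text, sep="/"):
    """Build a list of paragraphs -> sentences -> (word, tag) tuples."""
    paras = []
    for block in text.split("\n\n"):          # blank line = paragraph break (assumed)
        sents = []
        for line in block.splitlines():       # one sentence per line (assumed)
            # Every token in the sample is tagged, so rsplit yields two parts.
            tokens = [tuple(tok.rsplit(sep, 1)) for tok in line.split()]
            if tokens:
                sents.append(tokens)
        if sents:
            paras.append(sents)
    return paras

sample = (
    "The/AT dog/NN barked/VBD ./.\n"
    "It/PPS ran/VBD ./.\n"
    "\n"
    "Nobody/PN cared/VBD ./."
)
paras = parse_tagged_paras(sample)
print(len(paras), len(paras[0]), paras[1][0][0])
```

Indexing follows the nesting: `paras[i]` is a paragraph, `paras[i][j]` a sentence, and `paras[i][j][k]` a single `(word, tag)` tuple.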