A reader for plaintext corpora whose documents are divided into
categories based on their file identifiers.
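
To illustrate how categories can be derived from file identifiers, here is a minimal sketch in plain Python. It is not the reader's actual implementation. It assumes a hypothetical layout in which each file identifier starts with its category as a directory name (for example `pos/cv001.txt`); the helper `categorize` and the pattern `CAT_PATTERN` are names invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical pattern: the category is the first path component of the file id.
CAT_PATTERN = re.compile(r"(\w+)/.*")


def categorize(fileids, pattern=CAT_PATTERN):
    """Group file identifiers by the category captured from each identifier."""
    by_category = defaultdict(list)
    for fileid in fileids:
        match = pattern.match(fileid)
        if match:  # identifiers that do not match are left uncategorized
            by_category[match.group(1)].append(fileid)
    return dict(by_category)


fileids = ["pos/cv001.txt", "pos/cv002.txt", "neg/cv003.txt"]
categories = categorize(fileids)
print(sorted(categories))
print(categories["pos"])
```

With a mapping like this in hand, selecting documents by category amounts to looking up the matching file identifiers first and then reading those files.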

raw(self, fileids=None, categories=None) -> str
Returns: the given file(s) as a single string.

words(self, fileids=None, categories=None) -> list of str
Returns: the given file(s) as a list of words and punctuation symbols.

sents(self, fileids=None, categories=None) -> list of (list of str)
Returns: the given file(s) as a list of sentences or utterances, each encoded
as a list of word strings.

paras(self, fileids=None, categories=None) -> list of (list of (list of str))
Returns: the given file(s) as a list of paragraphs, each encoded as a list of
sentences, which are in turn encoded as lists of word strings.
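
The four return types nest inside one another: a string, a list of tokens, a list of token lists, and a list of lists of token lists. The sketch below shows those shapes using toy splitters written with the standard `re` module. The functions `toy_words`, `toy_sents`, and `toy_paras` are hypothetical stand-ins for this example only; the reader's real tokenizers may split text differently.

```python
import re


def toy_words(text):
    """Split text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]+", text)


def toy_sents(text):
    """Naively end a sentence at '.', '!' or '?' followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [toy_words(piece) for piece in pieces if piece]


def toy_paras(text):
    """Treat blank lines as paragraph breaks."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [toy_sents(block) for block in blocks if block.strip()]


raw = "Hello world. How are you?\n\nFine, thanks."
words = toy_words(raw)   # list of str
sents = toy_sents(raw)   # list of (list of str)
paras = toy_paras(raw)   # list of (list of (list of str))

print(words)
print(sents)
print(paras)
```

Each level adds one layer of list nesting, so `paras[0][1][0]` is the first word of the second sentence of the first paragraph.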

Inherited from api.CategorizedCorpusReader: categories, fileids
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open, readme
Inherited from api.CorpusReader: files
Inherited from api.CorpusReader: items