Package nltk :: Package corpus :: Package reader :: Module plaintext :: Class EuroparlCorpusReader

type EuroparlCorpusReader

       object --+        
                |        
 api.CorpusReader --+    
                    |    
PlaintextCorpusReader --+
                        |
                       EuroparlCorpusReader

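Reader for Europarl corpora that consist of plaintext documents. Unlike regular plaintext documents, which are divided into paragraphs, Europarl documents are divided into chapters, and chapters are separated by blank lines. Everything is inherited from PlaintextCorpusReader except for the members listed below: the block readers _read_word_block, _read_sent_block, and _read_para_block are overridden, chapters() is added, and paras() is overridden.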
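
The chapter layout described above can be illustrated with a small plain-Python sketch. This is not the reader's implementation: split_chapters is a hypothetical helper, and it assumes one sentence per line and whitespace-separated tokens, neither of which is stated on this page. Only the blank-line chapter separator comes from the description.

```python
def split_chapters(text):
    """Split raw text into chapters -> sentences -> word strings.

    Assumptions (not taken from this page): one sentence per line and
    whitespace-separated tokens. Only the blank-line chapter separator
    comes from the class description.
    """
    chapters = []
    current = []  # sentences of the chapter being built
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: treat it as one sentence, split on whitespace.
            current.append(line.split())
        elif current:
            # Blank line: close the chapter collected so far.
            chapters.append(current)
            current = []
    if current:
        # Flush the last chapter if the text does not end with a blank line.
        chapters.append(current)
    return chapters


# Hypothetical sample text with two chapters separated by a blank line.
sample = (
    "Resumption of the session\n"
    "I declare the session resumed .\n"
    "\n"
    "Agenda\n"
    "The next item is the vote .\n"
)
chapters = split_chapters(sample)
print(len(chapters))
print(chapters[0][0])
```

The result has the same nesting as the documented return type of chapters(): a list of chapters, each a list of sentences, each a list of word strings.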

Nested Classes

Inherited from PlaintextCorpusReader: CorpusView

Instance Methods
 
_read_word_block(self, stream)

_read_sent_block(self, stream)

_read_para_block(self, stream)

chapters(self, fileids=None)
Returns: list of (list of (list of str)): the given file(s) as a list of chapters, each encoded as a list of sentences, which are in turn encoded as lists of word strings.

paras(self, fileids=None)
Returns: list of (list of (list of str)): the given file(s) as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of word strings.

Inherited from PlaintextCorpusReader: __init__, raw, sents, words

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, fileids, open, readme

Inherited from api.CorpusReader (private): _get_root

    Deprecated since 0.9.7

Inherited from api.CorpusReader: files

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _fileids, _root

Properties

Inherited from api.CorpusReader: root

Method Details

_read_word_block(self, stream)

Overrides: PlaintextCorpusReader._read_word_block

_read_sent_block(self, stream)

Overrides: PlaintextCorpusReader._read_sent_block

_read_para_block(self, stream)

Overrides: PlaintextCorpusReader._read_para_block

chapters(self, fileids=None)

Returns: list of (list of (list of str)): the given file(s) as a list of chapters, each encoded as a list of sentences, which are in turn encoded as lists of word strings.
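
As a rough illustration of working with this nested return type, the sketch below walks a hand-written value shaped like the documented result. The sample data and the chapter_stats helper are hypothetical; with a real reader the list would come from chapters() instead.

```python
# Hypothetical sample shaped like the documented return value:
# list of chapters -> list of sentences -> list of word strings.
chapters = [
    [
        ["Resumption", "of", "the", "session"],
        ["I", "declare", "the", "session", "resumed", "."],
    ],
    [
        ["Agenda"],
    ],
]


def chapter_stats(chapters):
    """Return a (sentence_count, word_count) pair for each chapter."""
    return [
        (len(chapter), sum(len(sent) for sent in chapter))
        for chapter in chapters
    ]


stats = chapter_stats(chapters)

# Flatten all three levels into a single list of word strings.
flat_words = [word for chapter in chapters for sent in chapter for word in sent]

for index, (n_sents, n_words) in enumerate(stats):
    print(index, n_sents, n_words)
```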

paras(self, fileids=None)

Returns: list of (list of (list of str)): the given file(s) as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of word strings.
Overrides: PlaintextCorpusReader.paras (inherited documentation)