Package nltk :: Package corpus :: Package reader :: Module ipipan :: Class IPIPANCorpusView

type IPIPANCorpusView

               object --+        
                        |        
util.AbstractLazySequence --+    
                            |    
  util.StreamBackedCorpusView --+
                                |
                               IPIPANCorpusView

Instance Methods

__init__(self, filename, startpos=0, **kwargs)
    Create a new corpus view, based on the file fileid, and read with block_reader.
read_block(self, stream)
    Read a block from the input stream. Returns: list of any.
_read_data(self, stream)
_seek(self, stream)
_append_space(self, sentence)

Inherited from util.StreamBackedCorpusView: __add__, __getitem__, __len__, __mul__, __radd__, __rmul__, close, iterate_from

Inherited from util.StreamBackedCorpusView (private): _open

Inherited from util.AbstractLazySequence: __cmp__, __contains__, __hash__, __iter__, __repr__, count, index

Class Variables
  WORDS_MODE = 0
  SENTS_MODE = 1
  PARAS_MODE = 2

Inherited from util.AbstractLazySequence (private): _MAX_REPR_SIZE
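
The page lists the three mode constants without describing them. Judging only by their names, they appear to select whether the view yields words, sentences, or paragraphs. The sketch below illustrates that reading with a hypothetical helper, shape(), which is not part of NLTK; the nested-list input layout is likewise an assumption made for illustration.

```python
# Constants copied from the class variables above.
WORDS_MODE = 0
SENTS_MODE = 1
PARAS_MODE = 2


def shape(paragraphs, mode):
    """Hypothetical helper: present nested paragraphs at the granularity of mode.

    paragraphs is assumed to be a list of paragraphs, each a list of
    sentences, each a list of word strings.
    """
    if mode == PARAS_MODE:
        return paragraphs
    # Drop the paragraph level.
    sentences = [sent for para in paragraphs for sent in para]
    if mode == SENTS_MODE:
        return sentences
    if mode == WORDS_MODE:
        # Drop the sentence level as well.
        return [word for sent in sentences for word in sent]
    raise ValueError(f"unknown mode: {mode!r}")


paragraphs = [[["Ala", "ma", "kota"], ["Kot", "śpi"]], [["Koniec"]]]
words = shape(paragraphs, WORDS_MODE)
sents = shape(paragraphs, SENTS_MODE)
```

Under this assumption, WORDS_MODE gives a flat list of words, SENTS_MODE a list of word lists, and PARAS_MODE leaves the nesting unchanged.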

Properties

Inherited from util.StreamBackedCorpusView: fileid

Method Details

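__init__(self, filename, startpos=0, **kwargs)
(Constructor)

Create a new corpus view, based on the file fileid, and read with block_reader. See the class documentation for more information.

Note that this description is inherited from util.StreamBackedCorpusView.__init__, which names its parameters fileid and block_reader. In this override the path is passed as filename, and block_reader does not appear in the signature.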

Parameters:
  • fileid - The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.
  • startpos - The file position at which the view will start reading. This can be used to skip over preface sections.
  • encoding - The Unicode encoding that should be used to read the file's contents. If no encoding is specified, then the file's contents will be read as a non-unicode string (i.e., a str).
  • source - If specified, then use a SourcedStringStream to annotate all strings read from the file with information about their start offset, end offset, and docid. The value of source will be used as the docid.
Overrides: util.StreamBackedCorpusView.__init__
(inherited documentation)
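
The effect of startpos and encoding can be sketched with the standard library alone. The helper read_from() below is hypothetical and is not NLTK's implementation; it only shows the general idea of seeking past a preface and optionally decoding. In Python 3 the undecoded result is a bytes object, which plays the role of the non-unicode str mentioned above.

```python
import os
import tempfile


def read_from(path, startpos=0, encoding=None):
    """Hypothetical helper: read a file from byte offset startpos onward.

    With encoding=None the raw bytes are returned; otherwise they are
    decoded with the given encoding.
    """
    with open(path, "rb") as stream:
        stream.seek(startpos)  # skip everything before startpos
        data = stream.read()
    return data if encoding is None else data.decode(encoding)


# A small sample file: a preface line followed by the actual content.
preface = "# nagłówek\n".encode("utf-8")
body = "zażółć gęślą jaźń\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(preface + body)
    path = handle.name

try:
    text = read_from(path, startpos=len(preface), encoding="utf-8")
    raw = read_from(path, startpos=len(preface))
finally:
    os.remove(path)
```

Here startpos is a byte offset, so it is computed from the encoded preface rather than from its character count.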

read_block(self, stream)

Read a block from the input stream.

Parameters:
  • stream - an input stream
Returns: list of any
  a block of tokens from the input stream
Overrides: util.StreamBackedCorpusView.read_block
(inherited documentation)
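
The read_block contract (take a stream, return a list of tokens) can be illustrated with a minimal stand-in. Both functions below are hypothetical: the one-line-per-block rule and the driver loop are assumptions chosen for brevity, not the logic of IPIPANCorpusView or StreamBackedCorpusView.

```python
import io


def read_block(stream):
    """Hypothetical block reader: one line of input becomes one block of tokens."""
    line = stream.readline()
    return line.split()


def iterate_blocks(stream):
    """Yield tokens by calling read_block until the stream stops advancing."""
    while True:
        start = stream.tell()
        block = read_block(stream)
        if stream.tell() == start:
            # Nothing was consumed, so the end of the stream was reached.
            break
        yield from block


stream = io.StringIO("Ala ma kota .\nKot ma Ale .\n")
tokens = list(iterate_blocks(stream))
```

Checking the stream position rather than the block contents lets the loop continue past blocks that happen to be empty, such as blank lines.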