__init__(self,
corpus_file,
encoding,
tagged,
group_by_sent,
group_by_para,
sent_splitter=None)
(Constructor)
Create a new corpus view, based on the file fileid, and
read with block_reader. See the class documentation for
more information.
- Parameters:
fileid - The path to the file that is read by this corpus view.
fileid can either be a string or a PathPointer.
startpos - The file position at which the view will start reading. This can
be used to skip over preface sections.
encoding - The unicode encoding that should be used to read the file's
contents. If no encoding is specified, then the file's contents
will be read as a non-unicode string (i.e., a str).
source - If specified, then use a SourcedStringStream to annotate all strings read
from the file with information about their start offset, end
offset, and docid. The value of ``source`` will be used as the
docid.
- Overrides:
util.StreamBackedCorpusView.__init__
- (inherited documentation)
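The roles of startpos and encoding described above can be illustrated with a minimal standard-library sketch. It is not NLTK's implementation: read_blocks and line_block_reader are hypothetical names introduced only for this example, and the sketch assumes a block reader is a callable that takes an open stream and returns a list of tokens.

```python
import os
import tempfile


def line_block_reader(stream):
    # Hypothetical block reader: one block is the whitespace-separated
    # tokens of a single line.
    return stream.readline().split()


def read_blocks(path, block_reader, startpos=0, encoding=None):
    # Hypothetical stand-in for a corpus view; NOT NLTK's implementation.
    # Reading begins at byte offset `startpos`, so a preface can be skipped.
    # With encoding=None the tokens stay as undecoded byte strings.
    tokens = []
    with open(path, "rb") as stream:
        stream.seek(startpos)
        while True:
            before = stream.tell()
            block = block_reader(stream)
            if stream.tell() == before:
                # The reader consumed nothing, so the end of file was reached.
                break
            if encoding is not None:
                block = [tok.decode(encoding) for tok in block]
            tokens.extend(block)
    return tokens


# Demo file: a preface line followed by UTF-8 text with a blank line.
preface = b"PREFACE: skip me\n"
body = "caf\u00e9 au lait\n\nsecond line\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(preface + body)

tokens = read_blocks(tmp.name, line_block_reader,
                     startpos=len(preface), encoding="utf-8")
raw = read_blocks(tmp.name, line_block_reader, startpos=len(preface))
os.remove(tmp.name)

print(tokens)
print(raw)
```

Here tokens holds decoded text strings and raw holds the same tokens as undecoded bytes, which mirrors the documented behaviour when no encoding is given. Detecting the end of the file by checking whether the stream position advanced lets blank lines pass through without ending the read early.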