__init__(self,
root,
fileids,
sep='/',
word_tokenizer=WhitespaceTokenizer(pattern='\\s+', gaps=True, discard_empty=T...,
sent_tokenizer=RegexpTokenizer(pattern='\n', gaps=True, discard_empty=True, f...,
alignedsent_block_reader=<function read_alignedsent_block at 0x132beb0>,
encoding=None)
(Constructor)
Construct a new Aligned Corpus reader for a set of documents located
at the given root directory. Example usage:
>>> root = '/...path to corpus.../'
>>> reader = AlignedCorpusReader(root, r'.*\.txt')
- Parameters:
root - The root directory for this corpus.
fileids - A list or regexp specifying the fileids in this corpus.
- Overrides:
api.CorpusReader.__init__
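The fileids parameter accepts either an explicit list or a regexp. The sketch below illustrates one way such a value could be resolved against the root directory. It is not the library's implementation: resolve_fileids is a hypothetical helper, and it assumes the regexp must match the whole path relative to root, with '/' as the path separator.

```python
import os
import re
import tempfile


def resolve_fileids(root, fileids):
    """Hypothetical helper: expand `fileids` into a list of relative paths.

    Assumption: a string is treated as a regexp that must match the entire
    path relative to `root`; anything else is treated as an explicit list.
    """
    if not isinstance(fileids, str):
        # Explicit list of fileids: use it as given.
        return list(fileids)

    pattern = re.compile(fileids)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, '/')  # normalise to '/'-separated ids
            if pattern.fullmatch(rel):
                matches.append(rel)
    return sorted(matches)


# Build a small throwaway corpus directory and resolve fileids both ways.
with tempfile.TemporaryDirectory() as root:
    for name in ('a.txt', 'b.txt', 'notes.md'):
        with open(os.path.join(root, name), 'w') as f:
            f.write('hello world\n')

    by_regexp = resolve_fileids(root, r'.*\.txt')  # regexp form
    by_list = resolve_fileids(root, ['a.txt'])     # explicit list form
    everything = resolve_fileids(root, r'.*')      # matches every file
```

Under these assumptions, a pattern such as r'.*\.txt' selects only the text files, while '.*' selects every file under the root, so the regexp form is the place to restrict a corpus by extension.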