|
Construct a new tokenizer that splits strings using the given regular
expression pattern. By default, pattern will
be used to find tokens; but if gaps is set to
True, then pattern will be used to find
separators between tokens instead.
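The two modes can be sketched with the standard re module alone. This is an illustration of the behaviour described above, not the tokenizer's own code: the helper name sketch_tokenize is hypothetical, and the example patterns avoid capturing groups because plain re.findall and re.split treat those specially.

```python
import re

# Default flags as listed in the parameter description for this constructor.
DEFAULT_FLAGS = re.UNICODE | re.MULTILINE | re.DOTALL


def sketch_tokenize(text, pattern, gaps=False, discard_empty=True,
                    flags=DEFAULT_FLAGS):
    """Hypothetical helper mirroring the documented gaps/discard_empty semantics."""
    regexp = re.compile(pattern, flags)
    if gaps:
        # The pattern matches separators; tokens are the pieces between them.
        tokens = regexp.split(text)
        if discard_empty:
            tokens = [tok for tok in tokens if tok]
        return tokens
    # The pattern matches the tokens themselves.
    return regexp.findall(text)


text = "Good muffins cost $3.88 in New York."
word_tokens = sketch_tokenize(text, r"\w+")            # pattern finds tokens
gap_tokens = sketch_tokenize(text, r"\s+", gaps=True)  # pattern finds separators
print(word_tokens)
print(gap_tokens)
```

With the same input, the token-matching pattern drops the punctuation it does not match, while the separator-matching pattern keeps everything between runs of whitespace.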
- Parameters:
pattern - The pattern used to build this tokenizer. This pattern may safely
contain grouping parentheses.
gaps - True if this tokenizer's pattern should be used to find
separators between tokens; False if this tokenizer's pattern
should be used to find the tokens themselves.
discard_empty - True if any empty tokens ('') generated by the
tokenizer should be discarded. Empty tokens can only be
generated if gaps is True.
flags - The regexp flags used to compile this tokenizer's pattern. By
default, the following flags are used: re.UNICODE |
re.MULTILINE | re.DOTALL.
- Overrides:
RegexpTokenizer.__init__
- (inherited documentation)
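The effect of discard_empty and of the default flags can be illustrated with the standard re module. This is a sketch under the assumption that separator mode behaves like re.split and token mode like re.findall; the variable names are illustrative only.

```python
import re

# Default flags named in the flags parameter description.
flags = re.UNICODE | re.MULTILINE | re.DOTALL

# Separator mode: adjacent or trailing separators produce empty strings.
separator = re.compile(r",", flags)
raw = separator.split("a,,b,")
kept = [tok for tok in raw if tok]  # what discarding empty tokens leaves

# re.DOTALL lets "." match a newline, so a token may span lines.
with_dotall = re.compile(r"<.*?>", flags).findall("<a\nb> <c>")
without_dotall = re.compile(r"<.*?>", re.UNICODE | re.MULTILINE).findall("<a\nb> <c>")

print(raw, kept)
print(with_dotall, without_dotall)
```

Empty strings appear only in the split-based case, which is consistent with the note that empty tokens arise only when the pattern is used to find separators.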
|