object --+
         |
        TokenizerI
A processing interface for tokenizing a string, or dividing it into a list of substrings.
Subclasses must define either tokenize() or batch_tokenize() (or both).
Instance Methods
  tokenize(self, s)
  span_tokenize(self, s)
  batch_tokenize(self, strings): list of list of str
  batch_span_tokenize(self, strings): iter of list of tuple of int
tokenize(self, s)
Divide the given string into a list of substrings.
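As an illustration, a subclass could satisfy this contract roughly as sketched below. The `TokenizerI` class here is a local stand-in written for the example rather than the library class, and `WhitespaceSketchTokenizer` is a hypothetical name.

```python
class TokenizerI(object):
    """Local stand-in for the interface (an assumption, not the library class)."""

    def tokenize(self, s):
        raise NotImplementedError()


class WhitespaceSketchTokenizer(TokenizerI):
    """Hypothetical subclass that splits on runs of whitespace."""

    def tokenize(self, s):
        # str.split() with no arguments splits on any whitespace run
        # and drops empty strings, giving a list of substrings.
        return s.split()


tokens = WhitespaceSketchTokenizer().tokenize("Good muffins cost $3.88")
print(tokens)
```

Splitting on whitespace is only one possible strategy; any subclass that returns a list of substrings of `s` fits the description.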
span_tokenize(self, s)
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
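The offset contract can be checked with a short sketch. The helper `whitespace_spans` is hypothetical and uses the standard `re` module to find runs of non-whitespace characters; it is one way to produce such spans, not necessarily how any particular tokenizer does it.

```python
import re


def whitespace_spans(s):
    """Yield (start_i, end_i) offsets for each run of non-whitespace characters."""
    for match in re.finditer(r"\S+", s):
        yield match.span()


s = "Good muffins cost $3.88"
spans = list(whitespace_spans(s))

# Slicing the original string with each span recovers the corresponding token.
tokens = [s[start_i:end_i] for (start_i, end_i) in spans]
print(spans)
print(tokens)
```

Because the spans index into the original string, they remain valid even when the tokens themselves are later normalized or discarded.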
batch_tokenize(self, strings)
Apply self.tokenize() to each element of strings, i.e.:
>>> return [self.tokenize(s) for s in strings]
Returns: list of list of str
batch_span_tokenize(self, strings)
Apply self.span_tokenize() to each element of strings, i.e.:
>>> return [self.span_tokenize(s) for s in strings]
Returns: iter of list of tuple of int
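A minimal sketch of how the two batch variants might behave is given below. The class `SketchTokenizer` is hypothetical, and the method names `batch_tokenize` and `batch_span_tokenize` are assumed for the example. The span variant is written as a generator that yields one list per input string, which is one possible reading of the documented return type `iter of list of tuple of int`.

```python
import re


class SketchTokenizer(object):
    """Hypothetical tokenizer used only to illustrate the batch methods."""

    def tokenize(self, s):
        return s.split()

    def span_tokenize(self, s):
        for match in re.finditer(r"\S+", s):
            yield match.span()

    def batch_tokenize(self, strings):
        # One token list per input string.
        return [self.tokenize(s) for s in strings]

    def batch_span_tokenize(self, strings):
        # Materialize each per-string span iterator into a list and
        # yield lazily, so the overall result is an iterator of lists.
        for s in strings:
            yield list(self.span_tokenize(s))


tok = SketchTokenizer()
batch_tokens = tok.batch_tokenize(["a b", "c"])
batch_spans = list(tok.batch_span_tokenize(["a b", "c"]))
print(batch_tokens)
print(batch_spans)
```

Each inner list lines up with one input string, so results can be zipped back against `strings` without extra bookkeeping.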
Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:53 2011 (http://epydoc.sourceforge.net)