Package nltk :: Package tokenize :: Module api :: Class TokenizerI

type TokenizerI


object --+
         |
        TokenizerI
Known Subclasses:

A processing interface for tokenizing a string, or dividing it into a list of substrings.

Subclasses must define:

Instance Methods

tokenize(self, s)
Divide the given string into a list of substrings.

span_tokenize(self, s)
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

batch_tokenize(self, strings)
Apply self.tokenize() to each element of strings.
Returns: list of list of str

batch_span_tokenize(self, strings)
Apply self.span_tokenize() to each element of strings.
Returns: iter of list of tuple of int

Method Details

tokenize(self, s)


Divide the given string into a list of substrings.

Returns:
list of str
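
As a minimal sketch of what a conforming tokenize() might look like, the following defines a hypothetical WhitespaceSketchTokenizer. It deliberately does not import NLTK; the class name and the whitespace-splitting rule are illustrative assumptions, not part of the documented interface.

```python
class WhitespaceSketchTokenizer(object):
    """Hypothetical stand-in that mirrors the tokenize(self, s) signature."""

    def tokenize(self, s):
        # str.split() with no arguments splits on runs of whitespace
        # and ignores leading and trailing whitespace.
        return s.split()


tokens = WhitespaceSketchTokenizer().tokenize("Good muffins cost  $3.88")
print(tokens)
```

The return value is a plain list of str, matching the documented return type.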

span_tokenize(self, s)


Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

Returns:
iter of tuple of int
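
The offset contract can be illustrated with a small sketch. WhitespaceSpanSketch below is a hypothetical class (not an NLTK one) that treats every run of non-whitespace characters as a token, which is an assumption made only for this example.

```python
import re


class WhitespaceSpanSketch(object):
    """Hypothetical stand-in that mirrors span_tokenize(self, s)."""

    def span_tokenize(self, s):
        # Yield (start_i, end_i) for each run of non-whitespace characters.
        for match in re.finditer(r"\S+", s):
            yield match.span()


text = "Good muffins cost $3.88"
spans = list(WhitespaceSpanSketch().span_tokenize(text))

# Slicing the original string with each span recovers the token.
recovered = [text[start_i:end_i] for (start_i, end_i) in spans]
print(spans)
print(recovered)
```

Because the method is written as a generator, the caller wraps it in list() when all spans are needed at once.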

batch_tokenize(self, strings)


Apply self.tokenize() to each element of strings. I.e., the method body is equivalent to:

    return [self.tokenize(s) for s in strings]

Returns:
list of list of str
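
A short sketch of the batch behaviour, using a hypothetical BatchSketchTokenizer whose tokenize() simply splits on whitespace (an assumption for illustration; real tokenizers will differ):

```python
class BatchSketchTokenizer(object):
    """Hypothetical stand-in; not the NLTK class itself."""

    def tokenize(self, s):
        # Illustrative rule only: split on runs of whitespace.
        return s.split()

    def batch_tokenize(self, strings):
        # One token list per input string, in input order.
        return [self.tokenize(s) for s in strings]


batches = BatchSketchTokenizer().batch_tokenize(["a b", "c d e", ""])
print(batches)
```

Note that an empty input string still contributes an (empty) list, so the result has the same length as strings.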

batch_span_tokenize(self, strings)


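Apply self.span_tokenize() to each element of strings. Given the declared return type, the result is an iterator that yields one list of spans per input string, i.e. the method body is roughly equivalent to:

    for s in strings:
        yield list(self.span_tokenize(s))

Returns:
iter of list of tuple of int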
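
The sketch below assumes that "iter of list of tuple of int" means a lazy iterator producing one list of (start_i, end_i) pairs per input string. BatchSpanSketch and its whitespace rule are hypothetical and only illustrate that reading.

```python
import re


class BatchSpanSketch(object):
    """Hypothetical stand-in; not the NLTK class itself."""

    def span_tokenize(self, s):
        # Illustrative rule only: each run of non-whitespace is a token.
        for match in re.finditer(r"\S+", s):
            yield match.span()

    def batch_span_tokenize(self, strings):
        # A generator: no string is tokenized until the caller iterates.
        for s in strings:
            yield list(self.span_tokenize(s))


result = BatchSpanSketch().batch_span_tokenize(["a bc", "def"])
first = next(result)   # spans for the first string only
rest = list(result)    # spans for the remaining strings
print(first, rest)
```

Under this reading the outer result can be consumed only once; callers that need random access would wrap it in list().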