Package nltk :: Package tokenize :: Module api :: Class StringTokenizer

type StringTokenizer

object --+    
         |    
TokenizerI --+
             |
            StringTokenizer
Known Subclasses:

A tokenizer that divides a string into substrings by splitting on the specified string (defined in subclasses).
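The subclassing pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than NLTK's actual source: the stand-in base class `StringTokenizerSketch`, the attribute name `_string`, and the subclass `CommaTokenizer` are all assumptions made for the example.

```python
class StringTokenizerSketch:
    """Stand-in for the base class: splits on the string held in `_string`."""

    # Assumed attribute name; each subclass supplies its own delimiter.
    _string = None

    def tokenize(self, s):
        # Divide the given string into a list of substrings.
        return s.split(self._string)


class CommaTokenizer(StringTokenizerSketch):
    """Hypothetical subclass that splits on a comma."""

    _string = ","


tokens = CommaTokenizer().tokenize("a,b,,c")
print(tokens)  # ['a', 'b', '', 'c']
```

Because the sketch delegates to `str.split` with an explicit separator, adjacent delimiters produce empty-string tokens instead of being merged.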

Instance Methods
 
tokenize(self, s)
Divide the given string into a list of substrings.
 
span_tokenize(self, s)
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

Inherited from TokenizerI: batch_span_tokenize, batch_tokenize

Method Details

tokenize(self, s)

Divide the given string into a list of substrings.

Returns:
list of str
Overrides: TokenizerI.tokenize
(inherited documentation)

span_tokenize(self, s)

Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

Returns:
iter of tuple of int
Overrides: TokenizerI.span_tokenize
(inherited documentation)
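
To make the offset contract concrete, here is a rough sketch of how such spans could be computed for a fixed separator. The standalone function `span_tokenize_sketch` and its `sep` parameter are hypothetical names chosen for this example; the sketch mirrors `str.split` with an explicit separator and may differ from NLTK's own handling of leading or trailing separators.

```python
def span_tokenize_sketch(s, sep):
    """Yield (start_i, end_i) offsets of the substrings of s separated by sep."""
    if not sep:
        raise ValueError("separator must be a non-empty string")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the last token runs to the end of s.
            yield (start, len(s))
            return
        yield (start, end)
        # Resume the search just past the separator.
        start = end + len(sep)


text = "good muffins cost"
spans = list(span_tokenize_sketch(text, " "))
print(spans)  # [(0, 4), (5, 12), (13, 17)]
```

Slicing the original string with each pair, as in `text[start_i:end_i]`, recovers the corresponding token, which is the property stated in the method description.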