Module nltk.tokenize.util

Functions
  • string_span_tokenize(s, sep) - Identify the tokens in the string, as defined by the token delimiter, and generate (start, end) offsets. Returns: iter of tuple of int
  • regexp_span_tokenize(s, regexp) - Identify the tokens in the string, as defined by the token delimiter regexp, and generate (start, end) offsets. Returns: iter of tuple of int
  • spans_to_relative(spans) - Convert absolute token spans to relative spans. Returns: iter of tuple of int

Function Details

string_span_tokenize(s, sep)

Identify the tokens in the string, as defined by the token delimiter, and generate (start, end) offsets.

Parameters:
  • s (str) - the string to be tokenized
  • sep (str) - the token separator
Returns: iter of tuple of int
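
The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the library's source: span_tokenize_sketch is a hypothetical name, and the handling of empty separators and of empty tokens between adjacent separators is an assumption, since this page does not specify either.

```python
def span_tokenize_sketch(s, sep):
    """Yield (start, end) offsets of the substrings of s delimited by sep.

    Hypothetical illustration only, not the nltk implementation.
    """
    if not sep:
        # Assumption: an empty separator is rejected rather than looping forever.
        raise ValueError("separator must not be empty")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No separator left: the remainder (if any) is the last token.
            if start < len(s):
                yield start, len(s)
            return
        # Assumption: adjacent separators produce an empty span here.
        yield start, end
        start = end + len(sep)


text = "to be or not"
spans = list(span_tokenize_sketch(text, " "))
tokens = [text[start:end] for start, end in spans]
print(spans)
print(tokens)
```

Slicing the original string with each (start, end) pair recovers the token text, which is the main reason to work with offsets rather than with the token strings themselves.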

regexp_span_tokenize(s, regexp)

Identify the tokens in the string, as defined by the token delimiter regexp, and generate (start, end) offsets.

Parameters:
  • s (str) - the string to be tokenized
  • regexp (str) - the token separator regexp
Returns: iter of tuple of int
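
A regexp delimiter can be handled the same way by walking over the delimiter matches and emitting the gaps between them. The sketch below uses the standard re.finditer call; regexp_span_tokenize_sketch is a hypothetical name, and skipping empty gaps and an empty trailing token are assumptions that may not match the library's behaviour.

```python
import re


def regexp_span_tokenize_sketch(s, regexp):
    """Yield (start, end) offsets of the text between matches of regexp.

    Hypothetical illustration only, not the nltk implementation.
    """
    start = 0
    for match in re.finditer(regexp, s):
        sep_start, sep_end = match.span()
        # Assumption: a delimiter at the current position yields no empty token.
        if sep_start != start:
            yield start, sep_start
        start = sep_end
    # Assumption: an empty trailing token is dropped.
    if start < len(s):
        yield start, len(s)


text = "to be,  or not"
spans = list(regexp_span_tokenize_sketch(text, r"[\s,]+"))
tokens = [text[start:end] for start, end in spans]
print(spans)
print(tokens)
```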

spans_to_relative(spans)

Convert absolute token spans to relative spans.

Parameters:
  • spans (iter of tuple of int) - the (start, end) offsets of the tokens
Returns: iter of tuple of int
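
This page does not define "relative". The sketch below assumes one plausible reading: each span becomes a pair (gap, length), where gap is the distance from the end of the previous token (0 for the start of the string) and length is the token's length. Both that reading and the name spans_to_relative_sketch are assumptions, not confirmed by this documentation.

```python
def spans_to_relative_sketch(spans):
    """Convert absolute (start, end) spans to (gap, length) pairs.

    Hypothetical illustration only; the (gap, length) reading is an assumption.
    """
    prev_end = 0
    for start, end in spans:
        # gap: offset from the previous token's end; length: size of this token.
        yield start - prev_end, end - start
        prev_end = end


absolute = [(0, 2), (3, 5), (6, 8), (9, 12)]
relative = list(spans_to_relative_sketch(absolute))
print(relative)
```

Under this reading the absolute spans can be rebuilt by accumulating the pairs in order, so no information is lost by the conversion.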