Package nltk :: Package tokenize :: Module punkt :: Class PunktToken

type PunktToken

object --+
         |
        PunktToken

Stores a token of text with annotations produced during sentence boundary detection.

Instance Methods

__init__(self, tok, **params)

    Derived properties

_get_type(self, tok)
Returns a case-normalized representation of the token.

    String representation

__repr__(self)
A string representation of the token that can reproduce it with eval(); it lists all of the token's non-default annotations.

__str__(self)
A string representation akin to that used by Kiss and Strunk.

Class Variables
  _properties = ['parastart', 'linestart', 'sentbreak', 'abbr', ...
    Regular expressions for properties
  _RE_ELLIPSIS = re.compile(r'\.\.+$')
  _RE_NUMERIC = re.compile(r'^-?[\.,]?\d[\d,\.-]*\.?$')
  _RE_INITIAL = re.compile(r'(?u)[^\W\d]\.$')
  _RE_ALPHA = re.compile(r'(?u)[^\W\d]+$')
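
To get a feel for what the four patterns accept, the sketch below copies them verbatim and reports which ones match a few sample strings. The helper `matching_patterns` is hypothetical and not part of NLTK, and applying each pattern with `match()` (anchored at the first character of the token) is an assumption about how the class uses them.

```python
import re

# Patterns copied verbatim from the class variables above.
PATTERNS = {
    "ellipsis": re.compile(r'\.\.+$'),
    "numeric": re.compile(r'^-?[\.,]?\d[\d,\.-]*\.?$'),
    "initial": re.compile(r'(?u)[^\W\d]\.$'),
    "alpha": re.compile(r'(?u)[^\W\d]+$'),
}


def matching_patterns(text):
    """Return the names of all patterns that match `text` from its first character."""
    # Hypothetical helper for illustration; not an NLTK function.
    return [name for name, pattern in PATTERNS.items() if pattern.match(text)]


for sample in ["...", "3.14", "-1,000.", "J.", "hello", "Mr."]:
    print(repr(sample), matching_patterns(sample))
```

Under these assumptions "Mr." matches none of the patterns: whether a token is an abbreviation is recorded in the `abbr` annotation rather than decided by a regular expression.
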
Properties
  abbr
  ellipsis
  linestart
  parastart
  period_final
  sentbreak
  tok
  type
    Derived properties
  type_no_period
The type with its final period removed if it has one.
  type_no_sentperiod
The type with its final period removed if it is marked as a sentence break.
  first_upper
True if the token's first character is uppercase.
  first_lower
True if the token's first character is lowercase.
  first_case
  is_ellipsis
True if the token text is that of an ellipsis.
  is_number
True if the token text is that of a number.
  is_initial
True if the token text is that of an initial.
  is_alpha
True if the token text is all alphabetic.
  is_non_punct
True if the token is either a number or alphabetic.
Method Details

__init__(self, tok, **params)
(Constructor)

Overrides: object.__init__
(inherited documentation)

__repr__(self)
(Representation operator)

A string representation of the token that can reproduce it with eval(); it lists all of the token's non-default annotations.

Overrides: object.__repr__
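
The eval() round trip can be illustrated with a small stand-in class. `MiniToken` is hypothetical and is not NLTK's implementation, and treating `None` as the default value of every annotation is an assumption. The sketch only shows the idea: the representation names the annotations that were set, so that evaluating it rebuilds an equivalent object.

```python
class MiniToken:
    """Hypothetical stand-in for PunktToken, for illustration only."""

    # Annotation names taken from the _properties class variable.
    _properties = ['parastart', 'linestart', 'sentbreak', 'abbr', 'ellipsis']

    def __init__(self, tok, **params):
        self.tok = tok
        # Assumption: every annotation defaults to None until it is set.
        for prop in self._properties:
            setattr(self, prop, None)
        for name, value in params.items():
            setattr(self, name, value)

    def __repr__(self):
        # List only the annotations that differ from the assumed default.
        args = [repr(self.tok)]
        args += [
            f"{prop}={getattr(self, prop)!r}"
            for prop in self._properties
            if getattr(self, prop) is not None
        ]
        return f"{type(self).__name__}({', '.join(args)})"


token = MiniToken("Mr.", abbr=True)
text = repr(token)
clone = eval(text)  # acceptable here only because we produced the string ourselves

print(text)
print(clone.tok, clone.abbr)
```
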

__str__(self)
(Informal representation operator)

A string representation akin to that used by Kiss and Strunk.

Overrides: object.__str__
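
In Kiss and Strunk's notation, annotations are written as tags appended to the token text. A minimal sketch, assuming the tags `<A>` (abbreviation), `<E>` (ellipsis) and `<S>` (sentence break) appended in that order; `annotate` is a hypothetical helper, not an NLTK function:

```python
def annotate(tok, abbr=False, ellipsis=False, sentbreak=False):
    """Append one tag per truthy annotation to the token text."""
    # Assumed tag names and ordering; see the lead-in above.
    result = tok
    if abbr:
        result += "<A>"
    if ellipsis:
        result += "<E>"
    if sentbreak:
        result += "<S>"
    return result


print(annotate("Mr.", abbr=True))
print(annotate("home.", sentbreak=True))
```
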

Class Variable Details

_properties

Value:
['parastart', 'linestart', 'sentbreak', 'abbr', 'ellipsis']

Property Details

type_no_period

The type with its final period removed if it has one.

type_no_sentperiod

The type with its final period removed if it is marked as a sentence break.
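
The difference between the two properties can be sketched with plain functions operating on a token type (the case-normalized form of the token, hence the lowercase examples). Both functions are hypothetical illustrations rather than NLTK code, and leaving a bare "." untouched is an assumption made so that the result is never empty.

```python
def type_no_period(type_):
    """Drop a final period, if present, from a token type."""
    # Assumption: a type consisting only of "." is returned unchanged.
    if len(type_) > 1 and type_.endswith("."):
        return type_[:-1]
    return type_


def type_no_sentperiod(type_, sentbreak):
    """Drop the final period only when the token is marked as a sentence break."""
    return type_no_period(type_) if sentbreak else type_


print(type_no_period("mr."))
print(type_no_sentperiod("mr.", sentbreak=False))
print(type_no_sentperiod("home.", sentbreak=True))
```

The second function keeps the period of a token such as "mr." when it is not a sentence break, since in that case the period belongs to the token itself.
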

first_upper

True if the token's first character is uppercase.

first_lower

True if the token's first character is lowercase.

first_case
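
The three case properties can be sketched as follows. This assumes that first_case summarizes the other two as a string label; the labels 'lower', 'upper' and 'none' are assumptions, as is the requirement that the token is non-empty. The functions are hypothetical and take the token text directly.

```python
def first_upper(tok):
    """True if the first character of the token is uppercase."""
    return tok[0].isupper()


def first_lower(tok):
    """True if the first character of the token is lowercase."""
    return tok[0].islower()


def first_case(tok):
    """Label the case of the first character (assumed labels)."""
    if first_lower(tok):
        return "lower"
    if first_upper(tok):
        return "upper"
    # Digits, punctuation and other caseless characters.
    return "none"


for sample in ["Hello", "hello", "3.14"]:
    print(sample, first_case(sample))
```
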

is_ellipsis

True if the token text is that of an ellipsis.

is_number

True if the token text is that of a number.

is_initial

True if the token text is that of an initial.

is_alpha

True if the token text is all alphabetic.

is_non_punct

True if the token is either a number or alphabetic.