Package tag

Classes and interfaces for tagging each token of a sentence with supplementary information, such as its part of speech. This task, which is known as tagging, is defined by the TaggerI interface.

Submodules
  • nltk.tag.api: Interface for tagging each token in a sentence with supplementary information, such as its part of speech.
  • nltk.tag.brill: Brill's transformational rule-based tagger.
  • nltk.tag.crf: An interface to Mallet's Linear Chain Conditional Random Field (LC-CRF) implementation.
  • nltk.tag.hmm: Hidden Markov Models (HMMs) largely used to assign the correct label sequence to sequential data or assess the probability of a given label and data sequence.
  • nltk.tag.hunpos: A module for interfacing with the HunPos open-source POS-tagger.
  • nltk.tag.sequential: Classes for tagging sentences sequentially, left to right.
  • nltk.tag.simplify
  • nltk.tag.stanford: A module for interfacing with the Stanford POS-tagger.
  • nltk.tag.tnt: Implementation of 'TnT - A Statistical Part of Speech Tagger' by Thorsten Brants.
  • nltk.tag.util

Classes
NgramTagger
A tagger that chooses a token's tag based on its word string and on the preceding n words' tags.
TrigramTagger
A tagger that chooses a token's tag based on its word string and on the preceding two words' tags.
UnigramTagger
A tagger that chooses a token's tag based on its word string.
RegexpTagger
A tagger that assigns tags to words based on regular expressions over word strings.
AffixTagger
A tagger that chooses a token's tag based on a leading or trailing substring of its word string.
BigramTagger
A tagger that chooses a token's tag based on its word string and on the preceding word's tag.
DefaultTagger
A tagger that assigns the same tag to every token.
BrillTagger
Brill's transformational rule-based tagger.
FastBrillTaggerTrainer
A faster trainer for brill taggers.
BrillTaggerTrainer
A trainer for brill taggers.
TaggerI
A processing interface for assigning a tag to each token in a list.
HiddenMarkovModelTagger
Hidden Markov model class, a generative model for labelling sequence data.
HiddenMarkovModelTrainer
Algorithms for learning HMM parameters from training data.
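
The word-based taggers listed above can be pictured with a small pure-Python sketch. It is not NLTK's implementation: train_unigram and tag_with_backoff are hypothetical names, and the training sentences are invented. It only illustrates the idea of choosing a token's tag from its word string and assigning one default tag to unseen words.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_with_backoff(tokens, model, default_tag="NN"):
    """Look each token up in the unigram table; unseen words get default_tag."""
    return [(tok, model.get(tok, default_tag)) for tok in tokens]


# Invented toy training data (an assumption, not an NLTK corpus).
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_unigram(train)
tagged = tag_with_backoff(["the", "bird", "sleeps"], model)
print(tagged)
```

Here "bird" never appears in the training data, so it receives the default tag, much as a DefaultTagger would assign it.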
Functions
 
untag(tagged_sentence)
Given a tagged sentence, return an untagged version of that sentence.
 
pos_tag(tokens)
Use NLTK's currently recommended part of speech tagger to tag the given list of tokens.
 
batch_pos_tag(sentences)
Use NLTK's currently recommended part of speech tagger to tag the given list of sentences, each consisting of a list of tokens.
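
The difference in input shape between the two tagging functions can be sketched as follows. This is a stand-in under stated assumptions, not the NLTK code: toy_pos_tag and toy_batch_pos_tag are hypothetical names, the capitalisation rule is invented, and it is assumed that tagging yields (token, tag) pairs and that batch tagging handles each sentence independently.

```python
def toy_pos_tag(tokens):
    """Hypothetical stand-in tagger: capitalised tokens get NNP, the rest NN."""
    return [(tok, "NNP" if tok[:1].isupper() else "NN") for tok in tokens]


def toy_batch_pos_tag(sentences):
    """Tag every sentence on its own and return one tagged list per sentence."""
    return [toy_pos_tag(sentence) for sentence in sentences]


# One sentence is a list of tokens; a batch is a list of such lists.
sentences = [["John", "saw", "Mary"], ["dogs", "bark"]]
batch = toy_batch_pos_tag(sentences)
print(batch)
```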
Variables
  _POS_TAGGER = 'taggers/maxent_treebank_pos_tagger/english.pickle'
  ALLOW_THREADS = 1
  BUFSIZE = 10000
  CLIP = 0
  ERR_CALL = 3
  ERR_DEFAULT = 0
  ERR_DEFAULT2 = 2084
  ERR_IGNORE = 0
  ERR_LOG = 5
  ERR_PRINT = 4
  ERR_RAISE = 2
  ERR_WARN = 1
  FLOATING_POINT_SUPPORT = 1
  FPE_DIVIDEBYZERO = 1
  FPE_INVALID = 8
  FPE_OVERFLOW = 2
  FPE_UNDERFLOW = 4
  False_ = False
  Inf = inf
  Infinity = inf
  MAXDIMS = 32
  NAN = nan
  NINF = -inf
  NZERO = -0.0
  NaN = nan
  PINF = inf
  PIPE = -1
  PZERO = 0.0
  RAISE = 2
  SHIFT_DIVIDEBYZERO = 0
  SHIFT_INVALID = 9
  SHIFT_OVERFLOW = 3
  SHIFT_UNDERFLOW = 6
  ScalarType = (<type 'int'>, <type 'float'>, <type 'complex'>, ...
  True_ = True
  UFUNC_BUFSIZE_DEFAULT = 10000
  UFUNC_PYVALS_NAME = 'UFUNC_PYVALS'
  WRAP = 1
  absolute = <ufunc 'absolute'>
  add = <ufunc 'add'>
  arccos = <ufunc 'arccos'>
  arccosh = <ufunc 'arccosh'>
  arcsin = <ufunc 'arcsin'>
  arcsinh = <ufunc 'arcsinh'>
  arctan = <ufunc 'arctan'>
  arctan2 = <ufunc 'arctan2'>
  arctanh = <ufunc 'arctanh'>
  bitwise_and = <ufunc 'bitwise_and'>
  bitwise_not = <ufunc 'invert'>
  bitwise_or = <ufunc 'bitwise_or'>
  bitwise_xor = <ufunc 'bitwise_xor'>
  c_ = <numpy.lib.index_tricks.CClass object at 0x12097b0>
  cast = {<type 'numpy.int64'>: <function <lambda> at 0x109acb0>...
  ceil = <ufunc 'ceil'>
  conj = <ufunc 'conjugate'>
  conjugate = <ufunc 'conjugate'>
  cos = <ufunc 'cos'>
  cosh = <ufunc 'cosh'>
  degrees = <ufunc 'degrees'>
  divide = <ufunc 'divide'>
  e = 2.71828182846
  equal = <ufunc 'equal'>
  exp = <ufunc 'exp'>
  expm1 = <ufunc 'expm1'>
  fabs = <ufunc 'fabs'>
  floor = <ufunc 'floor'>
  floor_divide = <ufunc 'floor_divide'>
  fmod = <ufunc 'fmod'>
  frexp = <ufunc 'frexp'>
  greater = <ufunc 'greater'>
  greater_equal = <ufunc 'greater_equal'>
  hypot = <ufunc 'hypot'>
  index_exp = <numpy.lib.index_tricks.IndexExpression object at ...
  inf = inf
  infty = inf
  invert = <ufunc 'invert'>
  isfinite = <ufunc 'isfinite'>
  isinf = <ufunc 'isinf'>
  isnan = <ufunc 'isnan'>
  ldexp = <ufunc 'ldexp'>
  left_shift = <ufunc 'left_shift'>
  less = <ufunc 'less'>
  less_equal = <ufunc 'less_equal'>
  little_endian = True
  log = <ufunc 'log'>
  log10 = <ufunc 'log10'>
  log1p = <ufunc 'log1p'>
  logical_and = <ufunc 'logical_and'>
  logical_not = <ufunc 'logical_not'>
  logical_or = <ufunc 'logical_or'>
  logical_xor = <ufunc 'logical_xor'>
  maximum = <ufunc 'maximum'>
  mgrid = <numpy.lib.index_tricks.nd_grid object at 0x11faab0>
  minimum = <ufunc 'minimum'>
  mod = <ufunc 'remainder'>
  modf = <ufunc 'modf'>
  multiply = <ufunc 'multiply'>
  nan = nan
  nbytes = {<type 'numpy.int64'>: 8, <type 'numpy.int16'>: 2, <t...
  negative = <ufunc 'negative'>
  newaxis = None
  not_equal = <ufunc 'not_equal'>
  ogrid = <numpy.lib.index_tricks.nd_grid object at 0x11faa90>
  ones_like = <ufunc 'ones_like'>
  pi = 3.14159265359
  power = <ufunc 'power'>
  r_ = <numpy.lib.index_tricks.RClass object at 0x11fa790>
  radians = <ufunc 'radians'>
  reciprocal = <ufunc 'reciprocal'>
  remainder = <ufunc 'remainder'>
  right_shift = <ufunc 'right_shift'>
  rint = <ufunc 'rint'>
  s_ = <numpy.lib.index_tricks.IndexExpression object at 0x1209870>
  sctypeDict = {0: <type 'numpy.bool_'>, 1: <type 'numpy.int8'>,...
  sctypeNA = {'?': 'Bool', 'B': 'UInt8', 'Bool': <type 'numpy.bo...
  sctypes = {'complex': [<type 'numpy.complex64'>, <type 'numpy....
  sign = <ufunc 'sign'>
  signbit = <ufunc 'signbit'>
  sin = <ufunc 'sin'>
  sinh = <ufunc 'sinh'>
  sqrt = <ufunc 'sqrt'>
  square = <ufunc 'square'>
  subtract = <ufunc 'subtract'>
  tan = <ufunc 'tan'>
  tanh = <ufunc 'tanh'>
  true_divide = <ufunc 'true_divide'>
  typeDict = {0: <type 'numpy.bool_'>, 1: <type 'numpy.int8'>, 2...
  typeNA = {'?': 'Bool', 'B': 'UInt8', 'Bool': <type 'numpy.bool...
  typecodes = {'All': '?bhilqpBHILQPfdgFDGSUVO', 'AllFloat': 'fd...
Function Details

untag(tagged_sentence)

Given a tagged sentence, return an untagged version of that sentence. I.e., return a list containing the first element of each tuple in tagged_sentence.

>>> untag([('John', 'NNP'), ('saw', 'VBD'), ('Mary', 'NNP')])
['John', 'saw', 'Mary']
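
The behaviour described above amounts to a one-line list comprehension. The following is a minimal equivalent written from that description, not a copy of the library source:

```python
def untag(tagged_sentence):
    """Return the words of a tagged sentence, dropping the tags."""
    return [word for word, _tag in tagged_sentence]


words = untag([("John", "NNP"), ("saw", "VBD"), ("Mary", "NNP")])
print(words)
```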

Variables Details

ScalarType

Value:
(<type 'int'>,
 <type 'float'>,
 <type 'complex'>,
 <type 'long'>,
 <type 'bool'>,
 <type 'str'>,
 <type 'unicode'>,
 <type 'buffer'>,
...

cast

Value:
{<type 'numpy.int64'>: <function <lambda> at 0x109acb0>, <type 'numpy.\
int16'>: <function <lambda> at 0x109acf0>, <type 'numpy.object_'>: <fu\
nction <lambda> at 0x109ad30>, <type 'numpy.float64'>: <function <lamb\
da> at 0x109ad70>, <type 'numpy.uint16'>: <function <lambda> at 0x109a\
db0>, <type 'numpy.uint8'>: <function <lambda> at 0x109adf0>, <type 'n\
umpy.string_'>: <function <lambda> at 0x10a4030>, <type 'numpy.float12\
8'>: <function <lambda> at 0x109ae70>, <type 'numpy.uint32'>: <functio\
n <lambda> at 0x109aef0>, <type 'numpy.void'>: <function <lambda> at 0\
...

index_exp

Value:
<numpy.lib.index_tricks.IndexExpression object at 0x1209830>

nbytes

Value:
{<type 'numpy.int64'>: 8, <type 'numpy.int16'>: 2, <type 'numpy.object\
_'>: 4, <type 'numpy.float64'>: 8, <type 'numpy.uint16'>: 2, <type 'nu\
mpy.uint8'>: 1, <type 'numpy.int32'>: 4, <type 'numpy.float128'>: 16, \
<type 'numpy.bool_'>: 1, <type 'numpy.uint32'>: 4, <type 'numpy.unicod\
e_'>: 0, <type 'numpy.int8'>: 1, <type 'numpy.complex64'>: 8, <type 'n\
umpy.string_'>: 0, <type 'numpy.uint32'>: 4, <type 'numpy.void'>: 0, <\
type 'numpy.int32'>: 4, <type 'numpy.complex128'>: 16, <type 'numpy.ui\
nt64'>: 8, <type 'numpy.complex256'>: 32, <type 'numpy.float32'>: 4}

sctypeDict

Value:
{0: <type 'numpy.bool_'>,
 1: <type 'numpy.int8'>,
 2: <type 'numpy.uint8'>,
 3: <type 'numpy.int16'>,
 4: <type 'numpy.uint16'>,
 5: <type 'numpy.int32'>,
 6: <type 'numpy.uint32'>,
 7: <type 'numpy.int32'>,
...

sctypeNA

Value:
{'?': 'Bool',
 'B': 'UInt8',
 'Bool': <type 'numpy.bool_'>,
 'Complex128': <type 'numpy.complex256'>,
 'Complex32': <type 'numpy.complex64'>,
 'Complex64': <type 'numpy.complex128'>,
 'D': 'Complex64',
 'F': 'Complex32',
...

sctypes

Value:
{'complex': [<type 'numpy.complex64'>,
             <type 'numpy.complex128'>,
             <type 'numpy.complex256'>],
 'float': [<type 'numpy.float32'>,
           <type 'numpy.float64'>,
           <type 'numpy.float128'>],
 'int': [<type 'numpy.int8'>,
         <type 'numpy.int16'>,
...

typeDict

Value:
{0: <type 'numpy.bool_'>,
 1: <type 'numpy.int8'>,
 2: <type 'numpy.uint8'>,
 3: <type 'numpy.int16'>,
 4: <type 'numpy.uint16'>,
 5: <type 'numpy.int32'>,
 6: <type 'numpy.uint32'>,
 7: <type 'numpy.int32'>,
...

typeNA

Value:
{'?': 'Bool',
 'B': 'UInt8',
 'Bool': <type 'numpy.bool_'>,
 'Complex128': <type 'numpy.complex256'>,
 'Complex32': <type 'numpy.complex64'>,
 'Complex64': <type 'numpy.complex128'>,
 'D': 'Complex64',
 'F': 'Complex32',
...

typecodes

Value:
{'All': '?bhilqpBHILQPfdgFDGSUVO',
 'AllFloat': 'fdgFDG',
 'AllInteger': 'bBhHiIlLqQpP',
 'Character': 'c',
 'Complex': 'FDG',
 'Float': 'fdg',
 'Integer': 'bhilqp',
 'UnsignedInteger': 'BHILQP'}