Package nltk :: Package tag :: Module api :: Class TaggerI

type TaggerI

object --+
         |
        TaggerI
Known Subclasses:

A processing interface for assigning a tag to each token in a list. Tags are case-sensitive strings that identify some property of each token, such as its part of speech or its sense.

Some taggers require specific types for their tokens. This is generally indicated by the use of a sub-interface to TaggerI. For example, featureset taggers, which are subclassed from FeaturesetTaggerI, require that each token be a featureset.

Subclasses must define:

Instance Methods

tag(self, tokens)
Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens.
Returns: list of (token, tag)

batch_tag(self, sentences)
Apply self.tag() to each element of sentences.

evaluate(self, gold)
Score the accuracy of the tagger against the gold standard.
Returns: float

_check_params(self, train, model)

Method Details

tag(self, tokens)

Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).

Returns: list of (token, tag)
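
As a minimal sketch of the return format described above, the class below assigns one fixed tag to every token. `ConstantTagger` is a hypothetical stand-in written for illustration; it does not import NLTK and is not one of the library's classes.

```python
class ConstantTagger:
    """Hypothetical stand-in for a TaggerI subclass; not part of NLTK."""

    def __init__(self, tag):
        self._tag = tag

    def tag(self, tokens):
        # Return one (token, tag) tuple per input token, preserving order.
        return [(token, self._tag) for token in tokens]


tagged = ConstantTagger("NN").tag(["the", "cat", "sat"])
print(tagged)
```

Each element of the result is a two-element tuple, so the token and its tag can be unpacked directly in a loop.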

batch_tag(self, sentences)

Apply self.tag() to each element of sentences and return the results as a list, one tagged sentence per input sentence. I.e.:

return [self.tag(sent) for sent in sentences]
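
The following sketch shows the same list comprehension inside a complete class, so the shape of the input and output is visible. `SuffixTagger` and its tagging rule are assumptions made up for this example; they are not NLTK code.

```python
class SuffixTagger:
    """Hypothetical tagger used only for illustration; not an NLTK class."""

    def tag(self, tokens):
        # Toy rule (an assumption): words ending in "ing" get "VBG", others "NN".
        return [(tok, "VBG" if tok.endswith("ing") else "NN") for tok in tokens]

    def batch_tag(self, sentences):
        # Tag each sentence independently and keep the sentence boundaries.
        return [self.tag(sent) for sent in sentences]


sentences = [["birds", "singing"], ["rain"]]
batch = SuffixTagger().batch_tag(sentences)
print(batch)
```

The result is a list of lists: one list of (token, tag) tuples for each input sentence, in the same order.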

evaluate(self, gold)

Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.

Parameters:
  • gold (list of list of (token, tag)) - The list of tagged sentences to score the tagger on.
Returns: float
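
The three steps described above (strip the tags, retag, score) can be sketched as a standalone function. This is an illustration under stated assumptions, not NLTK's implementation: `evaluate_sketch` and `ConstantTagger` are hypothetical names, accuracy is taken to be the fraction of tokens whose predicted tag equals the gold tag, and returning 0.0 for empty input is a choice made only for this example.

```python
class ConstantTagger:
    """Hypothetical tagger that assigns the same tag to every token."""

    def __init__(self, tag):
        self._tag = tag

    def tag(self, tokens):
        return [(token, self._tag) for token in tokens]


def evaluate_sketch(tagger, gold):
    """Token-level accuracy of `tagger` on `gold`, a list of tagged sentences."""
    # Step 1: strip the tags from the gold standard.
    untagged = [[token for token, _ in sent] for sent in gold]
    # Step 2: retag each sentence with the tagger under test.
    predicted = [tagger.tag(sent) for sent in untagged]
    # Step 3: compare predicted and gold (token, tag) pairs position by position.
    gold_pairs = [pair for sent in gold for pair in sent]
    pred_pairs = [pair for sent in predicted for pair in sent]
    if not gold_pairs:
        return 0.0  # assumption: empty input scores zero
    correct = sum(1 for g, p in zip(gold_pairs, pred_pairs) if g == p)
    return correct / len(gold_pairs)


gold = [[("the", "DT"), ("cat", "NN")], [("dogs", "NN"), ("run", "VB")]]
score = evaluate_sketch(ConstantTagger("NN"), gold)
print(score)
```

With this toy gold standard, two of the four tokens carry the tag "NN", so a tagger that always answers "NN" scores 0.5.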