Package nltk :: Package tag :: Module sequential :: Class ClassifierBasedTagger

type ClassifierBasedTagger


         object --+        
                  |        
        api.TaggerI --+    
                      |    
SequentialBackoffTagger --+
                          |
         object --+       |
                  |       |
        api.TaggerI --+   |
                      |   |
  api.FeaturesetTaggerI --+
                          |
                         ClassifierBasedTagger
Known Subclasses:

A sequential tagger that uses a classifier to choose the tag for each token in a sentence. The featureset input for the classifier is generated by a feature detector function:

   feature_detector(tokens, index, history) -> featureset

Here, tokens is the list of unlabeled tokens in the sentence, index is the position of the token for which feature detection should be performed, and history is a list of the tags assigned to all tokens before index.
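As an illustration, here is a minimal feature detector with this signature. The particular features (lowercased word, three-letter suffix, previous word, previous tag) are assumptions chosen for the example, not features NLTK supplies by default. The featureset is represented as a plain dict mapping feature names to values, and `simple_feature_detector` is a hypothetical name.

```python
def simple_feature_detector(tokens, index, history):
    """Hypothetical feature detector: builds a featureset for tokens[index].

    tokens  -- list of unlabeled tokens in the sentence
    index   -- position of the token to describe
    history -- tags already assigned to tokens[0:index]
    """
    word = tokens[index]
    if index == 0:
        # No preceding token: use a sentinel for both word and tag.
        prev_word, prev_tag = "<START>", "<START>"
    else:
        prev_word, prev_tag = tokens[index - 1].lower(), history[index - 1]
    return {
        "word": word.lower(),
        "suffix3": word[-3:].lower(),
        "prev_word": prev_word,
        "prev_tag": prev_tag,
    }


tokens = ["The", "dog", "barked"]
print(simple_feature_detector(tokens, 2, ["DT", "NN"]))
# {'word': 'barked', 'suffix3': 'ked', 'prev_word': 'dog', 'prev_tag': 'NN'}
```

Because history only covers tokens before index, a detector like this can only look backwards at tags, although it may inspect any of the words in tokens.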

Instance Methods
 
__init__(self, feature_detector=None, train=None, classifier_builder=<function train at 0x11c5df0>, classifier=None, backoff=None, cutoff_prob=None, verbose=False)
Construct a new classifier-based sequential tagger.

__repr__(self)

_train(self, tagged_corpus, classifier_builder, verbose)
Build a new classifier, based on the given training data (tagged_corpus).

choose_tag(self, tokens, index, history) -> str
Decide which tag should be used for the specified token, and return that tag.

classifier(self)
Return the classifier that this tagger uses to choose a tag for each word in a sentence.

feature_detector(self, tokens, index, history)
Return the featureset that this tagger's feature detector generates for the specified token; this featureset is the input to its classifier.

Inherited from SequentialBackoffTagger: tag, tag_one

Inherited from SequentialBackoffTagger (private): _get_backoff

Inherited from api.TaggerI: batch_tag, evaluate

Inherited from api.TaggerI (private): _check_params

Instance Variables
  _classifier
The classifier used to choose a tag for each token.
  _cutoff_prob
Cutoff probability for tagging -- if the probability of the most likely tag is less than this, then use backoff.

Inherited from SequentialBackoffTagger (private): _taggers

Properties

Inherited from SequentialBackoffTagger: backoff

Method Details

__init__(self, feature_detector=None, train=None, classifier_builder=<function train at 0x11c5df0>, classifier=None, backoff=None, cutoff_prob=None, verbose=False)
(Constructor)

Construct a new classifier-based sequential tagger.

Parameters:
  • feature_detector - A function used to generate the featureset input for the classifier:
       feature_detector(tokens, index, history) -> featureset
    
  • train - A tagged corpus consisting of a list of tagged sentences, where each sentence is a list of (word, tag) tuples.
  • backoff - A backoff tagger, used if this tagger encounters an unknown context or is otherwise unable to determine a tag for a given token.
  • classifier_builder - A function used to train a new classifier based on the data in train. It should take one argument, a list of labeled featuresets (i.e., (featureset, label) tuples).
  • classifier - The classifier that should be used by the tagger. This is only useful if you want to construct the classifier manually; normally, you would use train instead.
  • cutoff_prob - If specified, then this tagger will fall back on its backoff tagger if the probability of the most likely tag is less than cutoff_prob.
Overrides: SequentialBackoffTagger.__init__
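The relationship between train, feature_detector and classifier_builder can be pictured as follows: each tagged sentence is walked left to right, the feature detector is applied with the correct tags seen so far as history, and the resulting (featureset, label) pairs are handed to classifier_builder. This is a simplified sketch assuming dict featuresets. `make_labeled_featuresets`, `majority_builder` and `word_and_prev_tag` are hypothetical helpers, not NLTK functions, and the toy builder returns a plain function rather than an NLTK classifier object.

```python
from collections import Counter


def make_labeled_featuresets(tagged_corpus, feature_detector):
    """Hypothetical helper: turn tagged sentences into (featureset, tag) pairs."""
    labeled = []
    for sentence in tagged_corpus:
        words = [word for word, _ in sentence]
        history = []  # correct tags of the tokens before the current index
        for index, (_, tag) in enumerate(sentence):
            labeled.append((feature_detector(words, index, history), tag))
            history.append(tag)
    return labeled


def majority_builder(labeled_featuresets):
    """Toy classifier_builder: predicts the most frequent tag seen for each word.

    Stands in for a real learner; returns a function featureset -> tag or None.
    """
    counts = {}
    for featureset, tag in labeled_featuresets:
        counts.setdefault(featureset["word"], Counter())[tag] += 1
    best = {word: c.most_common(1)[0][0] for word, c in counts.items()}

    def classify(featureset):
        return best.get(featureset["word"])  # None for unseen words

    return classify


def word_and_prev_tag(tokens, index, history):
    # Minimal feature detector used only for this example.
    return {"word": tokens[index].lower(),
            "prev_tag": history[index - 1] if index > 0 else "<START>"}


tagged_corpus = [[("The", "DT"), ("dog", "NN"), ("barked", "VBD")],
                 [("A", "DT"), ("dog", "NN"), ("slept", "VBD")]]
data = make_labeled_featuresets(tagged_corpus, word_and_prev_tag)
print(data[1])  # ({'word': 'dog', 'prev_tag': 'DT'}, 'NN')
classify = majority_builder(data)
print(classify({"word": "dog", "prev_tag": "DT"}))  # NN
```

Using the correct tags as history during training mirrors the situation at tagging time, where history holds the tags already chosen for earlier tokens.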

__repr__(self)
(Representation operator)

Overrides: object.__repr__
(inherited documentation)

choose_tag(self, tokens, index, history)

Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.

Parameters:
  • tokens - The list of words that are being tagged.
  • index - The index of the word whose tag should be returned.
  • history - A list of the tags for all words before index.
Returns: str
Overrides: SequentialBackoffTagger.choose_tag
(inherited documentation)
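The decision rule described here, combined with the cutoff_prob and backoff behaviour documented for the constructor, can be sketched as below. The sketch assumes a hypothetical `prob_classify(featureset)` callable that returns a plain dict of tag probabilities (NLTK classifiers expose probabilities through their own distribution objects instead), and `tag_sentence` is a simplified stand-in for the inherited tag method, not its actual code. The probabilities in `TABLE` are made up for the example.

```python
def choose_tag(tokens, index, history, feature_detector, prob_classify, cutoff_prob=None):
    """Sketch of classifier-based tag choice with an optional probability cutoff.

    prob_classify is a hypothetical callable: featureset -> {tag: probability}.
    Returns None when no tag is available or the best tag's probability is
    below cutoff_prob; it never consults a backoff tagger itself.
    """
    featureset = feature_detector(tokens, index, history)
    probs = prob_classify(featureset)
    if not probs:
        return None
    best = max(probs, key=probs.get)
    if cutoff_prob is not None and probs[best] < cutoff_prob:
        return None  # not confident enough; let the caller try the backoff
    return best


def tag_sentence(tokens, choosers):
    """Simplified sequential loop: try each chooser in order until one returns a tag."""
    history = []
    for index in range(len(tokens)):
        tag = None
        for chooser in choosers:
            tag = chooser(tokens, index, history)
            if tag is not None:
                break
        history.append(tag)
    return list(zip(tokens, history))


TABLE = {"the": {"DT": 0.99, "NN": 0.01}, "run": {"VB": 0.55, "NN": 0.45}}


def word_only(tokens, index, history):
    return {"word": tokens[index].lower()}


def lookup(featureset):
    return TABLE.get(featureset["word"], {})


def classifier_chooser(tokens, index, history):
    return choose_tag(tokens, index, history, word_only, lookup, cutoff_prob=0.6)


def default_nn(tokens, index, history):  # stands in for a backoff tagger
    return "NN"


print(tag_sentence(["The", "run"], [classifier_chooser, default_nn]))
# [('The', 'DT'), ('run', 'NN')]
```

Here "run" falls back to the backoff because its best tag (VB, 0.55) is below the 0.6 cutoff, while "The" is tagged by the classifier directly.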

classifier(self)

Return the classifier that this tagger uses to choose a tag for each word in a sentence. The input for this classifier is generated using this tagger's feature detector.

See Also: feature_detector()

feature_detector(self, tokens, index, history)

Return the featureset that this tagger's feature detector generates for the token at position index, given the tags in history; this featureset is the input to its classifier. The feature detector is a function with the signature:

 feature_detector(tokens, index, history) -> featureset

See Also: classifier()