Package nltk :: Package classify :: Module maxent :: Class TadmMaxentClassifier

type TadmMaxentClassifier

     object --+        
              |        
api.ClassifierI --+    
                  |    
   MaxentClassifier --+
                      |
                     TadmMaxentClassifier

Instance Methods

Inherited from MaxentClassifier: __init__, __repr__, classify, explain, labels, prob_classify, set_weights, show_most_informative_features, weights

Inherited from api.ClassifierI: batch_classify, batch_prob_classify

    Deprecated

Inherited from api.ClassifierI: batch_probdist, probdist

Class Methods

train(cls, train_toks, **kwargs)
    Train a new maxent classifier based on the given corpus of training samples. Returns a MaxentClassifier.

Class Variables

Inherited from MaxentClassifier: ALGORITHMS

Inherited from MaxentClassifier (private): _SCIPY_ALGS

Method Details

train(cls, train_toks, **kwargs)
Class Method

Train a new maxent classifier based on the given corpus of training samples. This classifier will have its weights chosen to maximize entropy while remaining empirically consistent with the training corpus.
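
As a rough illustration of what "weights" mean here, the sketch below computes label probabilities for a log-linear (maximum entropy) model. It is a simplified stand-in, not the library's implementation: the function name maxent_prob, the (feature name, feature value, label) weight keys, and the use of the natural exponential are all assumptions made for this example.

```python
import math


def maxent_prob(featureset, labels, weights):
    """Return {label: P(label | featureset)} for a log-linear model.

    `weights` is a hypothetical dict mapping (name, value, label) to a float;
    joint features without an entry contribute a weight of 0.
    """
    # Score each label by summing the weights of its active joint features.
    scores = {}
    for label in labels:
        scores[label] = sum(
            weights.get((name, value, label), 0.0)
            for name, value in featureset.items()
        )

    # Subtract the maximum score before exponentiating, for numerical stability.
    max_score = max(scores.values())
    exp_scores = {label: math.exp(s - max_score) for label, s in scores.items()}

    # Normalize so the probabilities sum to 1.
    z = sum(exp_scores.values())
    return {label: v / z for label, v in exp_scores.items()}


# Usage: one active feature with a positive weight favors 'female'.
example_weights = {('last_letter', 'a', 'female'): 1.0}
probs = maxent_prob({'last_letter': 'a'}, ['female', 'male'], example_weights)
```

Training then amounts to choosing the weights so that the model's expected feature counts match those observed in the training corpus.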

Parameters:
  • train_toks - Training data, represented as a list of pairs, the first member of which is a featureset, and the second of which is a classification label.
  • algorithm - A case-insensitive string, specifying which algorithm should be used to train the classifier. The following algorithms are currently available.
    • Iterative Scaling Methods
      • 'GIS': Generalized Iterative Scaling
      • 'IIS': Improved Iterative Scaling
    • Optimization Methods (require scipy)
      • 'CG': Conjugate gradient
      • 'BFGS': Broyden-Fletcher-Goldfarb-Shanno algorithm
      • 'Powell': Powell algorithm
      • 'LBFGSB': A limited-memory variant of the BFGS algorithm
      • 'Nelder-Mead': The Nelder-Mead algorithm
    • External Libraries
      • 'megam': LM-BFGS algorithm, with training performed by megam (requires that megam be installed).

    The default algorithm is 'CG' if scipy is installed, and 'IIS' otherwise.

  • trace - The level of diagnostic tracing output to produce. Higher values produce more verbose output.
  • encoding - A feature encoding, used to convert featuresets into feature vectors. If none is specified, then a BinaryMaxentFeatureEncoding will be built based on the features that are attested in the training corpus.
  • labels - The set of possible labels. If none is given, then the set of all labels attested in the training data will be used instead.
  • sparse - If true, then use sparse matrices instead of dense matrices. Currently, this is only supported by the scipy (optimization method) algorithms. For other algorithms, its value is ignored.
  • gaussian_prior_sigma - The sigma value for a gaussian prior on model weights. Currently, this is supported by the scipy (optimization method) algorithms and megam. For other algorithms, its value is ignored.
  • cutoffs - Arguments specifying various conditions under which the training should be halted. (Some of the cutoff conditions are not supported by some algorithms.)
    • max_iter=v: Terminate after v iterations.
    • min_ll=v: Terminate after the negative average log-likelihood drops under v.
    • min_lldelta=v: Terminate if a single iteration improves log likelihood by less than v.
    • tolerance=v: Terminate a scipy optimization method when improvement drops below a tolerance level v. The exact meaning of this tolerance depends on the scipy algorithm used. See scipy documentation for more info. Default values: 1e-3 for CG, 1e-5 for LBFGSB, and 1e-4 for other algorithms. (scipy only)
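
The max_iter, min_ll, and min_lldelta cutoffs listed above can be read as a stopping test applied after each training iteration. The following is a minimal sketch under that reading; should_stop and its arguments are hypothetical names for this example, and the library's own checks may differ in detail.

```python
def should_stop(iteration, ll, prev_ll, cutoffs):
    """Return True if any cutoff condition supplied in `cutoffs` is met.

    `ll` and `prev_ll` are average log-likelihoods (values <= 0);
    `prev_ll` is None on the first iteration.
    """
    # max_iter=v: stop once v iterations have been completed.
    if 'max_iter' in cutoffs and iteration >= cutoffs['max_iter']:
        return True

    # min_ll=v: stop once the negative average log-likelihood drops under v.
    if 'min_ll' in cutoffs and -ll < cutoffs['min_ll']:
        return True

    # min_lldelta=v: stop if this iteration improved log-likelihood by less than v.
    if 'min_lldelta' in cutoffs and prev_ll is not None:
        if ll - prev_ll < cutoffs['min_lldelta']:
            return True

    return False


# Usage: cutoffs are passed as keyword-style options, e.g. max_iter=100.
example_cutoffs = {'max_iter': 100, 'min_lldelta': 1e-4}
stop_now = should_stop(5, -0.50000, -0.50001, example_cutoffs)
```

Cutoffs that are not supplied are simply skipped, which matches the note that some conditions are not supported by some algorithms.
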
Returns: MaxentClassifier
The new maxent classifier
Overrides: MaxentClassifier.train
(inherited documentation)
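
To make the train_toks, labels, and encoding parameters more concrete, the sketch below builds a small training corpus, derives the attested label set, and constructs a binary joint-feature mapping from the attested features. The helper names and the (name, value, label) layout are assumptions for illustration; the source only states that the default encoding is built from features attested in the training corpus.

```python
# Training data: a list of (featureset, label) pairs.
train_toks = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'a'}, 'female'),
]


def attested_labels(toks):
    """Return the sorted list of labels that occur in the training data."""
    return sorted({label for _, label in toks})


def attested_joint_features(toks):
    """Map each attested (name, value, label) triple to a vector index."""
    mapping = {}
    for featureset, label in toks:
        for name, value in featureset.items():
            key = (name, value, label)
            if key not in mapping:
                mapping[key] = len(mapping)
    return mapping


def encode(featureset, label, mapping):
    """Return a sparse binary vector as a list of (index, 1) pairs.

    Joint features that never occurred in training are dropped.
    """
    return [
        (mapping[(name, value, label)], 1)
        for name, value in featureset.items()
        if (name, value, label) in mapping
    ]


labels = attested_labels(train_toks)
mapping = attested_joint_features(train_toks)
vector = encode({'last_letter': 'a'}, 'female', mapping)
```

Each index in the resulting vector would correspond to one model weight, so the number of attested joint features determines the size of the weight vector in this sketch.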