Package nltk :: Package classify :: Module maxent

Module maxent


A classifier model based on the maximum entropy modeling framework. This framework considers all of the probability distributions that are empirically consistent with the training data, and chooses the distribution with the highest entropy. A probability distribution is empirically consistent with a set of training data if its estimated frequency with which a class and a feature vector value co-occur is equal to the actual frequency in the data.
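
To make the "highest entropy" criterion concrete, the following is a minimal sketch (not part of this module) computing the Shannon entropy of candidate label distributions; among all distributions satisfying the empirical constraints, maxent training selects the one with the largest entropy:

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Among all distributions over three labels, the uniform one has the
# highest entropy; a maxent model is the highest-entropy distribution
# that still matches the empirical feature frequencies.
print(entropy([1/3, 1/3, 1/3]))  # ~1.585 bits
print(entropy([0.8, 0.1, 0.1]))  # ~0.922 bits
```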

Terminology: 'feature'

The term feature is usually used to refer to some property of an unlabeled token. For example, when performing word sense disambiguation, we might define a 'prevword' feature whose value is the word preceding the target word. However, in the context of maxent modeling, the term feature is typically used to refer to a property of a labeled token. In order to prevent confusion, we will introduce two distinct terms to disambiguate these two different concepts:

  • An input-feature is a property of an unlabeled token.
  • A joint-feature is a property of a labeled token.

In the rest of the nltk.classify module, the term features is used to refer to what we will call input-features in this module.

In literature that describes and discusses maximum entropy models, input-features are typically called contexts, and joint-features are simply referred to as features.

Converting Input-Features to Joint-Features

In maximum entropy models, joint-features are required to have numeric values. Typically, each input-feature input_feat is mapped to a set of joint-features of the form:

   joint_feat(token, label) = { 1 if input_feat(token) == feat_val
                              {      and label == some_label
                              {
                              { 0 otherwise

for all values of feat_val and some_label. This mapping is performed by classes that implement the MaxentFeatureEncodingI interface.
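
A minimal sketch of this mapping (the function and the mapping dict below are hypothetical stand-ins for the bookkeeping that MaxentFeatureEncodingI implementations perform; they are not part of this module's API):

```python
def encode(featureset, label, mapping):
    """Map a featureset/label pair to a sparse binary joint-feature
    vector, in the spirit of the scheme above.  `mapping` is a dict
    from (feat_name, feat_val, label) triples to joint-feature ids."""
    encoding = []
    for feat_name, feat_val in featureset.items():
        if (feat_name, feat_val, label) in mapping:
            # joint_feat(token, label) == 1 for this (value, label) pair;
            # all unmentioned joint-features are implicitly 0.
            encoding.append((mapping[(feat_name, feat_val, label)], 1))
    return encoding

mapping = {('prevword', 'the', 'NOUN'): 0, ('prevword', 'the', 'VERB'): 1}
print(encode({'prevword': 'the'}, 'NOUN', mapping))  # [(0, 1)]
```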

Classes
    Classifier Model
MaxentClassifier
A maximum entropy classifier (also known as a conditional exponential classifier).
ConditionalExponentialClassifier
Alias for MaxentClassifier.
    Feature Encodings
MaxentFeatureEncodingI
A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label.
FunctionBackedMaxentFeatureEncoding
A feature encoding that calls a user-supplied function to map a given featureset/label pair to a sparse joint-feature vector.
BinaryMaxentFeatureEncoding
A feature encoding that generates vectors containing binary joint-features of the form described above.
GISEncoding
A binary feature encoding which adds one new joint-feature to the joint-features defined by BinaryMaxentFeatureEncoding: a correction feature, whose value is chosen to ensure that the sparse vector always sums to a constant non-negative number.
TadmEventMaxentFeatureEncoding
TypedMaxentFeatureEncoding
A feature encoding that generates vectors containing integer, float, and binary joint-features of the form described above.
    Classifier Trainer: tadm
TadmMaxentClassifier
Functions
    Classifier Trainer: Generalized Iterative Scaling
 
train_maxent_classifier_with_gis(train_toks, trace=3, encoding=None, labels=None, **cutoffs)
Train a new ConditionalExponentialClassifier, using the given training samples, using the Generalized Iterative Scaling algorithm.
 
calculate_empirical_fcount(train_toks, encoding)
 
calculate_estimated_fcount(classifier, train_toks, encoding)
    Classifier Trainer: Improved Iterative Scaling
 
train_maxent_classifier_with_iis(train_toks, trace=3, encoding=None, labels=None, **cutoffs)
Train a new ConditionalExponentialClassifier, using the given training samples, using the Improved Iterative Scaling algorithm.
calculate_nfmap(train_toks, encoding)
Construct a map (a dictionary from int to int) that can be used to compress nf (which is typically sparse).
 
calculate_deltas(train_toks, classifier, unattested, ffreq_empirical, nfmap, nfarray, nftranspose, encoding)
Calculate the update values for the classifier weights for this iteration of IIS.
    Classifier Trainer: scipy algorithms (GC, LBFGSB, etc.)
 
train_maxent_classifier_with_scipy(train_toks, trace=3, encoding=None, labels=None, algorithm='CG', sparse=True, gaussian_prior_sigma=0, **cutoffs)
Train a new ConditionalExponentialClassifier, using the given training samples, using the specified scipy optimization algorithm.
    Classifier Trainer: megam
 
train_maxent_classifier_with_megam(train_toks, trace=3, encoding=None, labels=None, gaussian_prior_sigma=0, **kwargs)
Train a new ConditionalExponentialClassifier, using the given training samples, using the external megam library.
    Demo
 
demo()
Function Details

train_maxent_classifier_with_gis(train_toks, trace=3, encoding=None, labels=None, **cutoffs)


Train a new ConditionalExponentialClassifier, using the given training samples, using the Generalized Iterative Scaling algorithm. This ConditionalExponentialClassifier will encode the model that maximizes entropy from all the models that are empirically consistent with train_toks.

See Also: train_maxent_classifier() for parameter descriptions.

train_maxent_classifier_with_iis(train_toks, trace=3, encoding=None, labels=None, **cutoffs)


Train a new ConditionalExponentialClassifier, using the given training samples, using the Improved Iterative Scaling algorithm. This ConditionalExponentialClassifier will encode the model that maximizes entropy from all the models that are empirically consistent with train_toks.

See Also: train_maxent_classifier() for parameter descriptions.

calculate_nfmap(train_toks, encoding)


Construct a map that can be used to compress nf (which is typically sparse).

nf(feature_vector) is the sum of the feature values for feature_vector.

This represents the number of features that are active for a given labeled text. This method finds all values of nf(t) that are attested for at least one token in the given list of training tokens; and constructs a dictionary mapping these attested values to a continuous range 0...N. For example, if the only values of nf() that were attested were 3, 5, and 7, then _nfmap might return the dictionary {3:0, 5:1, 7:2}.
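
As a sketch only, the construction described above might be re-implemented as follows (nfmap_sketch and enc are hypothetical names; the real calculate_nfmap reads nf off this module's encodings, and the attested values are sorted here purely for a deterministic result):

```python
def nfmap_sketch(train_toks, encode):
    """Collect every attested value of nf -- the sum of feature values
    for an encoded (featureset, label) pair -- and map it onto the
    dense range 0...N-1."""
    nfset = set()
    for fs, label in train_toks:
        nfset.add(sum(val for (fid, val) in encode(fs, label)))
    return {nf: i for i, nf in enumerate(sorted(nfset))}

# Three tokens whose encoded feature values sum to 3, 5, and 7:
toks = [({'w': 'a'}, 'X'), ({'w': 'b'}, 'X'), ({'w': 'c'}, 'X')]
def enc(fs, label):
    return {'a': [(0, 3)], 'b': [(0, 2), (1, 3)], 'c': [(0, 7)]}[fs['w']]

print(nfmap_sketch(toks, enc))  # {3: 0, 5: 1, 7: 2}
```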

Returns: dictionary from int to int
A map that can be used to compress nf to a dense vector.

calculate_deltas(train_toks, classifier, unattested, ffreq_empirical, nfmap, nfarray, nftranspose, encoding)


Calculate the update values for the classifier weights for this iteration of IIS. These update weights are the value of delta that solves the equation:

 ffreq_empirical[i]
        =
 SUM[fs,l] (classifier.prob_classify(fs).prob(l) *
            feature_vector(fs,l)[i] *
            exp(delta[i] * nf(feature_vector(fs,l))))

Where:

  • (fs,l) is a (featureset, label) tuple from train_toks
  • feature_vector(fs,l) = encoding.encode(fs,l)
  • nf(vector) = sum([val for (id,val) in vector])

This method uses Newton's method to solve this equation for delta[i]. In particular, it starts with a guess of delta[i]=1; and iteratively updates delta with:

   delta[i] -= (ffreq_empirical[i] - sum1[i])/(-sum2[i])

until convergence, where sum1 and sum2 are defined as:

   sum1[i](delta) = SUM[fs,l] f[i](fs,l,delta)

   sum2[i](delta) = SUM[fs,l] (f[i](fs,l,delta) *
                               nf(feature_vector(fs,l)))

   f[i](fs,l,delta) = (classifier.prob_classify(fs).prob(l) *
                       feature_vector(fs,l)[i] *
                       exp(delta[i] * nf(feature_vector(fs,l))))

Note that sum1 and sum2 depend on delta; so they need to be re-computed each iteration.
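
The Newton iteration above, restricted to a single feature i, can be sketched as follows (solve_delta and the (p, fv, n) triples are a hypothetical flattening of what calculate_deltas reads off the classifier and encoding, without the dense-matrix optimization described below):

```python
import math

def solve_delta(samples, ffreq_emp, newton_iters=50, tol=1e-8):
    """Solve  ffreq_emp = SUM (p * fv * exp(delta * n))  for delta by
    Newton's method.  `samples` is a list of (p, fv, n) triples:
    p  = classifier.prob_classify(fs).prob(l)
    fv = feature_vector(fs,l)[i]
    n  = nf(feature_vector(fs,l))
    """
    delta = 1.0
    for _ in range(newton_iters):
        sum1 = sum(p * fv * math.exp(delta * n) for p, fv, n in samples)
        sum2 = sum(p * fv * n * math.exp(delta * n) for p, fv, n in samples)
        # The update from the text: delta -= (ffreq_emp - sum1) / (-sum2)
        step = (ffreq_emp - sum1) / (-sum2)
        delta -= step
        if abs(step) < tol:
            break
    return delta
```

Since sum1 and sum2 are recomputed from the current delta on every pass, each iteration of the loop mirrors one IIS Newton step.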

The variables nfmap, nfarray, and nftranspose are used to generate a dense encoding for nf(ltext). This allows _deltas to calculate sum1 and sum2 using matrices, which yields a significant performance improvement.

Parameters:
  • train_toks (list of tuples of (dict, str)) - The set of training tokens.
  • classifier (ClassifierI) - The current classifier.
  • ffreq_empirical (sequence of float) - An array containing the empirical frequency for each feature. The ith element of this array is the empirical frequency for feature i.
  • unattested (sequence of int) - An array that is 1 for features that are not attested in the training data; and 0 for features that are attested. In other words, unattested[i]==0 iff ffreq_empirical[i]==0.
  • nfmap (dictionary from int to int) - A map that can be used to compress nf to a dense vector.
  • nfarray (array of float) - An array that can be used to uncompress nf from a dense vector.
  • nftranspose (array of float) - The transpose of nfarray.

train_maxent_classifier_with_scipy(train_toks, trace=3, encoding=None, labels=None, algorithm='CG', sparse=True, gaussian_prior_sigma=0, **cutoffs)


Train a new ConditionalExponentialClassifier, using the given training samples, using the specified scipy optimization algorithm. This ConditionalExponentialClassifier will encode the model that maximizes entropy from all the models that are empirically consistent with train_toks.

See Also: train_maxent_classifier() for parameter descriptions.

Requires: The scipy package must be installed.

train_maxent_classifier_with_megam(train_toks, trace=3, encoding=None, labels=None, gaussian_prior_sigma=0, **kwargs)


Train a new ConditionalExponentialClassifier, using the given training samples, using the external megam library. This ConditionalExponentialClassifier will encode the model that maximizes entropy from all the models that are empirically consistent with train_toks.

See Also: train_maxent_classifier() for parameter descriptions; nltk.classify.megam.