


A classifier model based on the maximum entropy modeling framework. This framework considers all of the probability distributions that are empirically consistent with the training data, and chooses the distribution with the highest entropy. A probability distribution is empirically consistent with a set of training data if its estimated frequency with which a class and a feature vector value co-occur is equal to the actual frequency in the data.
The term feature is usually used to refer to some property of an unlabeled token. For example, when performing word sense disambiguation, we might define a 'prevword' feature whose value is the word preceding the target word. However, in the context of maxent modeling, the term feature is typically used to refer to a property of a labeled token. In order to prevent confusion, we will introduce two distinct terms to disambiguate these two different concepts: an 'input-feature' is a property of an unlabeled token, while a 'joint-feature' is a property of a labeled token.

In the rest of the nltk.classify module, the term features is used to refer to what we will call input-features in this module.

In literature that describes and discusses maximum entropy models, input-features are typically called contexts, and joint-features are simply referred to as features.
In maximum entropy models, joint-features are required to have numeric values. Typically, each input-feature input_feat is mapped to a set of joint-features of the form:

    joint_feat(token, label) = { 1 if input_feat(token) == feat_val
                               {      and label == some_label
                               {
                               { 0 otherwise

for all values of feat_val and some_label. This mapping is performed by classes that implement the MaxentFeatureEncodingI interface.
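As a rough sketch of this mapping (not the actual NLTK implementation; the helper names here are hypothetical), each joint-feature is an indicator function over one (feat_val, label) pair:

```python
def make_joint_feature(input_feat, feat_val, some_label):
    """Build one indicator joint-feature for a (feat_val, some_label) pair.

    Hypothetical helper for illustration; real encodings are produced by
    classes implementing MaxentFeatureEncodingI.
    """
    def joint_feat(token, label):
        # 1 if the input-feature takes the given value AND the label matches
        if input_feat(token) == feat_val and label == some_label:
            return 1
        return 0
    return joint_feat

# Example: a 'prevword' input-feature, as in the word sense
# disambiguation example above (token is a dict of input-features).
def prevword(token):
    return token.get('prevword')

jf = make_joint_feature(prevword, 'bank', 'FINANCE')
print(jf({'prevword': 'bank'}, 'FINANCE'))  # 1
print(jf({'prevword': 'bank'}, 'RIVER'))    # 0
```

One such joint-feature is generated for every attested (feat_val, label) combination, which is why the joint-feature vectors are typically sparse.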


Classifier Model  

MaxentClassifier A maximum entropy classifier (also known as a conditional exponential classifier). 

ConditionalExponentialClassifier Alias for MaxentClassifier. 

Feature Encodings  
MaxentFeatureEncodingI A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. 

FunctionBackedMaxentFeatureEncoding A feature encoding that calls a user-supplied function to map a given featureset/label pair to a sparse joint-feature vector. 

BinaryMaxentFeatureEncoding A feature encoding that generates vectors containing binary joint-features of the indicator form shown above. 

GISEncoding A binary feature encoding which adds one new joint-feature to the joint-features defined by BinaryMaxentFeatureEncoding: a correction feature, whose value is chosen to ensure that the sparse vector always sums to a constant non-negative number. 

TadmEventMaxentFeatureEncoding  
TypedMaxentFeatureEncoding A feature encoding that generates vectors containing integer, float and binary joint-features. 
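The correction feature added by GISEncoding can be sketched as follows (a simplified illustration, not the NLTK implementation: here C is passed in as a constant, whereas the real encoding derives it from the training data):

```python
def encode_with_correction(joint_feature_values, C):
    """Append a correction feature so the vector always sums to C.

    joint_feature_values: binary joint-feature values for one (fs, label).
    C: a constant at least as large as the number of joint-features that
       can be simultaneously active (an assumption of this sketch).
    """
    total = sum(joint_feature_values)
    assert total <= C, "C must bound the number of active joint-features"
    # The correction feature's value is C minus the sum of the others,
    # so every encoded vector sums to exactly C.
    return joint_feature_values + [C - total]

vec = encode_with_correction([1, 0, 1], C=5)
print(vec)       # [1, 0, 1, 3]
print(sum(vec))  # 5
```

Holding the vector sum constant is what makes the Generalized Iterative Scaling update rule applicable.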

Classifier Trainer: tadm  
TadmMaxentClassifier 


Classifier Trainer: Generalized Iterative Scaling  







Classifier Trainer: Improved Iterative Scaling  







Classifier Trainer: scipy algorithms (GC, LBFGSB, etc.)  


Classifier Trainer: megam  


Demo  


Train a new maxent classifier. See Also: train_maxent_classifier() for parameter descriptions. 
Train a new maxent classifier. See Also: train_maxent_classifier() for parameter descriptions. 
Construct a map that can be used to compress nf. nf(feature_vector) is the sum of the feature values for feature_vector; this represents the number of features that are active for a given labeled text. This method finds all values of nf(t) that are attested for at least one token in the given list of training tokens, and constructs a dictionary mapping these attested values to a continuous range 0...N. For example, if the only values of nf() that were attested were 3, 5, and 7, then the dictionary {3: 0, 5: 1, 7: 2} would be returned.
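The nf-value compression described above can be sketched as (a hypothetical helper name; the real method computes each nf value from the feature encoding):

```python
def calculate_nfmap_sketch(nf_values):
    """Map each attested nf value to an index in a continuous range 0...N.

    nf_values: the nf(feature_vector) value observed for each training
    token (an assumed pre-computed input for this sketch).
    """
    # Sort the distinct attested values, then number them 0, 1, ..., N-1.
    return {nf: i for i, nf in enumerate(sorted(set(nf_values)))}

print(calculate_nfmap_sketch([3, 7, 5, 3, 7]))  # {3: 0, 5: 1, 7: 2}
```

This compression lets the IIS trainer build dense arrays indexed by nf value instead of arrays spanning the full (sparse) range of possible sums.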

Calculate the update values for the classifier weights for this iteration of IIS. These update weights are the value of delta that solves the equation:

    ffreq_empirical[i] = SUM[fs,l] (classifier.prob_classify(fs).prob(l) *
                                    feature_vector(fs,l)[i] *
                                    exp(delta[i] * nf(feature_vector(fs,l))))

where (fs,l) ranges over the (featureset, label) pairs in the training tokens, and nf(feature_vector) is the sum of the values in feature_vector. This method uses Newton's method to solve this equation for delta[i]. In particular, it starts with a guess of delta[i] = 1 and iteratively updates delta with:

    delta[i] -= (ffreq_empirical[i] - sum1[i]) / (-sum2[i])

until convergence, where sum1 and sum2 are defined as:

    sum1[i](delta) = SUM[fs,l] f[i](fs,l,delta)
    sum2[i](delta) = SUM[fs,l] (f[i](fs,l,delta) * nf(feature_vector(fs,l)))
    f[i](fs,l,delta) = (classifier.prob_classify(fs).prob(l) *
                        feature_vector(fs,l)[i] *
                        exp(delta[i] * nf(feature_vector(fs,l))))

Note that sum1 and sum2 depend on delta, so they must be re-computed at each iteration.
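The Newton iteration just described can be sketched in pure Python as follows (the input layout is an assumption of this sketch, not the actual NLTK signature: prob[j] stands for classifier.prob_classify(fs).prob(l) for the j-th (fs,l) pair, feat_vecs[j][i] for feature_vector(fs,l)[i], and nf_vals[j] for nf(feature_vector(fs,l))):

```python
import math

def solve_deltas(ffreq_empirical, prob, feat_vecs, nf_vals,
                 max_iter=100, tol=1e-8):
    """Newton's-method sketch of the IIS delta update described above."""
    n = len(ffreq_empirical)
    deltas = [1.0] * n  # initial guess: delta[i] = 1
    for _ in range(max_iter):
        sum1 = [0.0] * n
        sum2 = [0.0] * n
        for j, p in enumerate(prob):
            for i in range(n):
                # f[i](fs, l, delta) for the j-th (fs, l) pair
                f = p * feat_vecs[j][i] * math.exp(deltas[i] * nf_vals[j])
                sum1[i] += f
                sum2[i] += f * nf_vals[j]
        # delta[i] -= (ffreq_empirical[i] - sum1[i]) / (-sum2[i])
        updates = [(ffreq_empirical[i] - sum1[i]) / -sum2[i]
                   for i in range(n)]
        deltas = [d - u for d, u in zip(deltas, updates)]
        if max(abs(u) for u in updates) < tol:
            break
    return deltas
```

For a single feature with prob = 0.5, feature value 1, and empirical frequency 1.0, the equation reduces to 1.0 = 0.5 * exp(delta), so the iteration converges to delta = ln 2.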

Train a new maxent classifier. See Also: train_maxent_classifier() for parameter descriptions. Requires: the
Train a new maxent classifier.



Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:41 2011  http://epydoc.sourceforge.net 