
Module megam


A set of functions used to interface with the external megam maxent optimization package. Before megam can be used, you should tell NLTK where it can find the megam binary, using the config_megam() function. Typical usage:

>>> import nltk
>>> nltk.config_megam('.../path/to/megam')
>>> classifier = nltk.MaxentClassifier.train(corpus, 'megam')

Functions
    Configuration
 
config_megam(bin=None)
Configure NLTK's interface to the megam maxent optimization package.
    Megam Interface Functions
 
write_megam_file(train_toks, encoding, stream, bernoulli=True, explicit=True)
Generate an input file for megam based on the given corpus of classified tokens.
 
parse_megam_weights(s, features_count, explicit=True)
Given the stdout output generated by megam when training a model, return a numpy array containing the corresponding weight vector.
 
_write_megam_features(vector, stream, bernoulli)
 
call_megam(args)
Call the megam binary with the given arguments.

Variables
    Configuration
  _megam_bin = None

Function Details

config_megam(bin=None)


Configure NLTK's interface to the megam maxent optimization package.

Parameters:
  • bin (string) - The full path to the megam binary. If not specified, NLTK searches the system for a megam binary and raises a LookupError exception if none is found.
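
The lookup behaviour described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not NLTK's actual implementation: the helper name find_megam_binary, the candidate name "megam", and the choice to search only PATH are all assumptions made for the example.

```python
import os
import shutil


def find_megam_binary(bin_path=None, candidates=("megam",)):
    """Hypothetical helper: return a path to an executable megam binary.

    Mirrors the documented contract only loosely: an explicit path is
    used if given, otherwise the system PATH is searched, and
    LookupError is raised when nothing is found.
    """
    if bin_path is not None:
        # An explicit path must point at an executable file.
        if os.path.isfile(bin_path) and os.access(bin_path, os.X_OK):
            return bin_path
        raise LookupError("megam binary not found at %r" % bin_path)

    # No path given: try each candidate name on the system PATH.
    for name in candidates:
        found = shutil.which(name)
        if found is not None:
            return found
    raise LookupError("no megam binary found on PATH")


# Callers can handle the failure case explicitly.
try:
    print(find_megam_binary())
except LookupError as err:
    print("megam unavailable:", err)
```

Catching LookupError at the call site lets a program fall back to another training algorithm when megam is not installed.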

write_megam_file(train_toks, encoding, stream, bernoulli=True, explicit=True)


Generate an input file for megam based on the given corpus of classified tokens.

Parameters:
  • train_toks (list of tuples of (dict, str)) - Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
  • encoding (MaxentFeatureEncodingI) - A feature encoding, used to convert featuresets into feature vectors.
  • stream (stream) - The stream to which the megam input file should be written.
  • bernoulli - If true, use the 'bernoulli' format: all joint features have binary values and are listed if and only if they are true. Otherwise, list feature values explicitly. If bernoulli=False, you must call megam with the -fvals option.
  • explicit - If true, use the 'explicit' format: for each token, list the features that would fire for each of the possible labels. If explicit=True, you must call megam with the -explicit option.
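
To make the two flags concrete, here is a small self-contained sketch of how they could shape each output line. It is an illustration under stated assumptions, not NLTK's code: write_sketch and toy_encode are hypothetical names, the encoding is replaced by a plain function returning (feature id, value) pairs, and the line layout (label index first, " #" before each label's feature list in explicit mode) is assumed from megam's input conventions.

```python
import io


def _write_features(vector, stream, bernoulli):
    """Write one feature vector given as (feature_id, value) pairs."""
    for fid, fval in vector:
        if bernoulli:
            # Bernoulli format: list the id only when the feature is on.
            if fval == 1:
                stream.write(" %s" % fid)
            elif fval != 0:
                raise ValueError("bernoulli format requires binary values")
        else:
            # Otherwise write the id followed by its value.
            stream.write(" %s %s" % (fid, fval))


def write_sketch(train_toks, labels, encode, stream,
                 bernoulli=True, explicit=True):
    """Illustrative writer: one line per (featureset, label) pair."""
    label_index = {label: i for i, label in enumerate(labels)}
    for featureset, label in train_toks:
        stream.write(str(label_index[label]))
        if explicit:
            # Explicit format: one feature list per possible label.
            for candidate in labels:
                stream.write(" #")
                _write_features(encode(featureset, candidate),
                                stream, bernoulli)
        else:
            # Implicit format: only the features for the true label.
            _write_features(encode(featureset, label), stream, bernoulli)
        stream.write("\n")


def toy_encode(featureset, label):
    """Hypothetical stand-in for a joint-feature encoding."""
    table = {("long", "pos"): 0, ("long", "neg"): 1,
             ("short", "pos"): 2, ("short", "neg"): 3}
    return [(table[(name, label)], 1)
            for name, on in featureset.items()
            if on and (name, label) in table]


toks = [({"long": True}, "pos"), ({"short": True}, "neg")]
buf = io.StringIO()
write_sketch(toks, ["pos", "neg"], toy_encode, buf)
print(buf.getvalue())
```

Passing bernoulli=False makes each feature appear as an id and value pair, which is why the matching -fvals option is then required on the megam side.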

parse_megam_weights(s, features_count, explicit=True)


Given the stdout output generated by megam when training a model, return a numpy array containing the corresponding weight vector. This function does not currently handle bias features.
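
As a rough illustration of this parsing step, the sketch below turns text output into a dense weight vector. It assumes each non-empty line has the form "<feature id> <weight>", which is an assumption about megam's output and not something stated here. The name parse_weights_sketch is hypothetical, and a plain list stands in for the numpy array so the example needs only the standard library.

```python
def parse_weights_sketch(s, features_count):
    """Illustrative parser: map 'id weight' lines to a dense list.

    Features that do not appear in the output keep a weight of 0.0.
    Bias features are not handled, matching the limitation noted above.
    """
    weights = [0.0] * features_count
    for line in s.strip().splitlines():
        if line.strip():
            fid, weight = line.split()
            weights[int(fid)] = float(weight)
    return weights


sample_output = "0 0.5\n2 -1.25\n"
print(parse_weights_sketch(sample_output, 4))
```

The features_count argument fixes the length of the result, so feature ids missing from the output still occupy a slot in the vector.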