| Home | Trees | Indices | Help |
|
|---|
|
|
                object --+
                         |
    MaxentFeatureEncodingI --+
                             |
                            TypedMaxentFeatureEncoding
A feature encoding that generates vectors containing integer,
float and binary joint-features of the form::
    Binary (for string and boolean features):

      joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                          {
                          { 0 otherwise

    Value (for integer and float features):

      joint_feat(fs, l) = { fs[fname] if (type(fs[fname]) == type(fval))
                          {           and (l == label)
                          {
                          { not encoded otherwise
Where C{fname} is the name of an input-feature, C{fval} is a value
for that input-feature, and C{label} is a label.
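
The two joint-feature forms can be sketched in plain Python. This is an illustration only, not the library's code: the helper names C{binary_joint_feat} and C{value_joint_feat} are hypothetical, C{None} stands in for "not encoded", and the value form is assumed to emit the value found in the featureset.

```python
def binary_joint_feat(fname, fval, label):
    """Build a binary joint-feature for one (fname, fval, label) triple."""
    def joint_feat(fs, l):
        # Fires (1) only when both the input value and the label match.
        return 1 if fs.get(fname) == fval and l == label else 0
    return joint_feat


def value_joint_feat(fname, ftype, label):
    """Build a value joint-feature for one (fname, type, label) triple."""
    def joint_feat(fs, l):
        val = fs.get(fname)
        # Emit the numeric value itself when type and label match;
        # None is this sketch's stand-in for "not encoded".
        if type(val) is ftype and l == label:
            return val
        return None
    return joint_feat


fs = {"suffix": "ing", "length": 7}
is_ing_verb = binary_joint_feat("suffix", "ing", "VERB")
length_verb = value_joint_feat("length", int, "VERB")

print(is_ing_verb(fs, "VERB"), is_ing_verb(fs, "NOUN"))
print(length_verb(fs, "VERB"), length_verb(fs, "NOUN"))
```

The binary feature only ever contributes 0 or 1, while the value feature passes the numeric input through to the model.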
Typically, these features are constructed based on a training
corpus, using the L{train()} method.
For string and boolean features [type(fval) not in (int, float)]
this method will create one feature for each combination of
C{fname}, C{fval}, and C{label} that occurs at least once in the
training corpus.
For integer and float features [type(fval) in (int, float)] this
method will create one feature for each combination of C{fname}
and C{label} that occurs at least once in the training corpus.
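
A minimal sketch of how such a mapping might be collected from a training corpus follows. It is not the library's implementation: C{build_mapping} is a hypothetical helper, the corpus is made up, and keying numeric features by C{(fname, type, label)} is an assumption drawn from the C{type(fval)} test in the definition of value features.

```python
def build_mapping(train_toks):
    """Assign a consecutive integer id to each joint-feature key."""
    mapping = {}
    for fs, label in train_toks:
        for fname, fval in fs.items():
            if type(fval) in (int, float):
                # Numeric input: one feature per (fname, label); the
                # concrete value is not part of the key.
                key = (fname, type(fval), label)
            else:
                # String/boolean input: one feature per distinct value.
                key = (fname, fval, label)
            if key not in mapping:
                mapping[key] = len(mapping)
    return mapping


# Made-up training corpus of (featureset, label) pairs.
train_toks = [
    ({"suffix": "ing", "length": 7}, "VERB"),
    ({"suffix": "ing", "length": 4}, "VERB"),
    ({"suffix": "ed", "length": 6}, "VERB"),
    ({"suffix": "s", "length": 4}, "NOUN"),
]
mapping = build_mapping(train_toks)
for key, index in mapping.items():
    print(index, key)
```

In this sketch the three different C{length} values seen with C{VERB} collapse into a single joint-feature, while each distinct C{suffix} string gets its own.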
For binary features the C{unseen_features} parameter can be used
to add X{unseen-value features}, which are used whenever an input
feature has a value that was not encountered in the training
corpus. These features have the form::
      joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                          {      and l == label
                          {
                          { 0 otherwise
Where C{is_unseen(fname, fval)} is true if the encoding does not
contain any joint features that are true when C{fs[fname]==fval}.
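
The unseen-value test can be sketched as follows. This follows the formula as written here, with one feature per C{(fname, label)} pair; the helper names and the small mapping are hypothetical, and the actual implementation may organise these features differently.

```python
def is_unseen(mapping, fname, fval):
    """True if no known joint-feature fires when fs[fname] == fval."""
    return not any(key[0] == fname and key[1] == fval for key in mapping)


def unseen_joint_feat(mapping, fname, label):
    """Build the unseen-value joint-feature for one (fname, label) pair."""
    def joint_feat(fs, l):
        if fname in fs and is_unseen(mapping, fname, fs[fname]) and l == label:
            return 1
        return 0
    return joint_feat


# Hypothetical mapping of binary joint-features learned in training.
mapping = {("suffix", "ing", "VERB"): 0, ("suffix", "s", "NOUN"): 1}
unseen_suffix_verb = unseen_joint_feat(mapping, "suffix", "VERB")

print(unseen_suffix_verb({"suffix": "ly"}, "VERB"))   # value never seen
print(unseen_suffix_verb({"suffix": "ing"}, "VERB"))  # value seen in training
```

Note that a value counts as seen if it occurred with any label, so C{"s"} does not trigger the C{VERB} unseen-value feature even though it was only observed with C{NOUN}.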
The C{alwayson_features} parameter can be used to add X{always-on
features}, which have the form::
      joint_feat(fs, l) = { 1 if (l == label)
                          {
                          { 0 otherwise
These always-on features allow the maxent model to directly model
the prior probabilities of each label.
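
To see why always-on features capture label priors, consider a model that has no other features. The sketch below assumes the usual maxent form, in which the probability of a label is proportional to the exponential of the weighted sum of its joint-features. The helper names and the weights are hypothetical.

```python
import math


def alwayson_joint_feat(label):
    """Build the always-on joint-feature for one label."""
    def joint_feat(fs, l):
        # Ignores the featureset entirely; depends only on the label.
        return 1 if l == label else 0
    return joint_feat


def label_distribution(fs, labels, feats, weights):
    """p(l | fs) proportional to exp(sum_i w_i * f_i(fs, l))."""
    scores = {
        l: math.exp(sum(w * f(fs, l) for f, w in zip(feats, weights)))
        for l in labels
    }
    total = sum(scores.values())
    return {l: s / total for l, s in scores.items()}


labels = ["NOUN", "VERB"]
feats = [alwayson_joint_feat(l) for l in labels]
weights = [math.log(3.0), math.log(1.0)]  # hypothetical weights

dist = label_distribution({}, labels, feats, weights)
same = label_distribution({"suffix": "ing"}, labels, feats, weights)
print(dist)
```

Because the features ignore the input, the distribution is the same for every featureset, which is exactly what a prior over labels is.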
Method return types, in summary order: list of (int, number); str; list; int.
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values. This vector is represented as a list of
(int, number) tuples, each pairing a joint-feature index with its value.
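
A sparse vector of this kind can be expanded into a dense one when needed. The sketch below assumes each entry is an (index, value) pair, as the return type list of (int, number) suggests. C{to_dense} is a hypothetical helper, and the total number of joint-features is passed in explicitly.

```python
def to_dense(sparse, length):
    """Expand a list of (index, value) pairs into a dense list."""
    dense = [0] * length
    for index, value in sparse:
        dense[index] = value
    return dense


# Made-up encoding: feature 0 is a binary feature that fired,
# feature 3 is a value feature carrying a float.
sparse = [(0, 1), (3, 2.5)]
dense = to_dense(sparse, 5)
print(dense)
```

The sparse form is the more economical one in practice, since most joint-features are zero for any single (featureset, label) pair.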
Construct and return a new feature encoding, based on a given training
corpus. Note: the recognized feature value types are int and float; other types are interpreted as regular binary features.
| Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:44 2011 | http://epydoc.sourceforge.net |