Package nltk :: Package classify :: Module weka :: Class ARFF_Formatter

Class ARFF_Formatter

Converts featuresets and labeled featuresets to ARFF-formatted strings, appropriate for input into Weka.

Features and classes can be specified manually in the constructor, or may be determined from data using from_train.

Instance Methods
 
__init__(self, labels, features)
 
format(self, tokens)
Returns a string representation of ARFF output for the given data.
 
labels(self)
Returns the list of classes.
 
write(self, outfile, tokens)
Writes ARFF data to a file for the given data.
 
header_section(self)
Returns an ARFF header as a string.
 
data_section(self, tokens, labeled=None)
Returns the ARFF data section for the given data.
 
_fmt_arff_val(self, fval)
Static Methods
 
from_train(tokens)
Constructs an ARFF_Formatter instance with class labels and feature types determined from the given data.
Method Details

__init__(self, labels, features)
(Constructor)

Parameters:
  • labels - A list of all class labels that can be generated.
  • features - A list of feature specifications. Each specification is a tuple (fname, ftype), where fname is the feature name and ftype is an ARFF type string such as NUMERIC or STRING.
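
To illustrate the shape of the two constructor arguments, here is a minimal sketch that does not call NLTK. The helper attribute_lines is hypothetical. It renders each (fname, ftype) pair as a standard ARFF @ATTRIBUTE declaration and the labels as a nominal class attribute. The class attribute name "label" and the exact spacing are assumptions, not necessarily what header_section emits.

```python
# Example constructor arguments (illustrative values only).
labels = ["pos", "neg"]
features = [
    ("length", "NUMERIC"),    # (fname, ftype)
    ("first_word", "STRING"),
]


def attribute_lines(labels, features, class_name="label"):
    """Hypothetical helper: render ARFF attribute declarations."""
    lines = ["@ATTRIBUTE %s %s" % (fname, ftype) for fname, ftype in features]
    # The class is declared as a nominal attribute that lists every label.
    lines.append("@ATTRIBUTE %s {%s}" % (class_name, ",".join(labels)))
    return lines


for line in attribute_lines(labels, features):
    print(line)
```

The same two lists would be passed to the constructor as ARFF_Formatter(labels, features).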

from_train(tokens)
Static Method

Constructs an ARFF_Formatter instance with class labels and feature types determined from the given data. Boolean, numeric, and string feature values are handled. Nominal types are not inferred.
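
The kind of type inference described above can be sketched as follows. This is a standalone illustration, not the library's code: infer_arff_type and infer_spec are hypothetical helpers, and the "{True, False}" encoding for booleans, the skipping of None values, and the error on inconsistent types are all assumptions.

```python
def infer_arff_type(value):
    """Hypothetical helper: map a Python feature value to an ARFF type string."""
    # bool must be tested before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return "{True, False}"  # assumed encoding for booleans
    if isinstance(value, (int, float)):
        return "NUMERIC"
    if isinstance(value, str):
        return "STRING"  # strings are not converted to nominal types
    raise ValueError("Unsupported value type: %r" % type(value))


def infer_spec(labeled_featuresets):
    """Collect labels and a sorted (fname, ftype) list from (featureset, label) pairs."""
    labels = set()
    ftypes = {}
    for featureset, label in labeled_featuresets:
        labels.add(label)
        for fname, fval in featureset.items():
            if fval is None:
                continue  # assumption: a missing value says nothing about the type
            ftype = infer_arff_type(fval)
            # Every occurrence of a feature must agree on its type.
            if ftypes.setdefault(fname, ftype) != ftype:
                raise ValueError("Inconsistent type for feature %r" % fname)
    return sorted(labels), sorted(ftypes.items())


train = [
    ({"length": 3, "capitalized": True, "word": "Cat"}, "noun"),
    ({"length": 4, "capitalized": False, "word": "runs"}, "verb"),
]
labels, features = infer_spec(train)
```

Checking bool first matters: isinstance(True, int) is true in Python, so a numeric test placed first would classify booleans as NUMERIC.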

data_section(self, tokens, labeled=None)

Returns the ARFF data section for the given data.

Parameters:
  • tokens - A list of featuresets (dicts) or of labeled featuresets, which are (featureset, label) tuples.
  • labeled - Indicates whether the given tokens are labeled. If None, the tokens are assumed to be labeled if the first token is a tuple or list.
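
The labeled=None behavior described above can be sketched without NLTK. The helpers normalize_tokens and data_rows are hypothetical. Using "?" for a missing value follows the ARFF format, while writing values with str() is a simplification (real string values may need quoting).

```python
def normalize_tokens(tokens, labeled=None):
    """Hypothetical helper: return a list of (featureset, label) pairs."""
    if labeled is None:
        # Guess from the first token: a tuple or list means (featureset, label).
        labeled = bool(tokens) and isinstance(tokens[0], (tuple, list))
    if labeled:
        return [(featureset, label) for featureset, label in tokens]
    # Unlabeled featuresets get a placeholder label of None.
    return [(featureset, None) for featureset in tokens]


def data_rows(tokens, feature_names, labeled=None):
    """Hypothetical helper: one comma-separated row per token, label last."""
    rows = []
    for featureset, label in normalize_tokens(tokens, labeled):
        values = [featureset.get(name) for name in feature_names] + [label]
        # ARFF marks a missing value with "?".
        rows.append(",".join("?" if v is None else str(v) for v in values))
    return rows


labeled_rows = data_rows([({"length": 3, "word": "cat"}, "noun")], ["length", "word"])
unlabeled_rows = data_rows([{"length": 3}], ["length", "word"])
```

Passing labeled explicitly avoids the guess, which is useful when the token list may be empty.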