Package nltk :: Package classify :: Module naivebayes :: Class NaiveBayesClassifier

type NaiveBayesClassifier

     object --+    
              |    
api.ClassifierI --+
                  |
                 NaiveBayesClassifier

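A Naive Bayes classifier. Naive Bayes classifiers are parameterized by two probability distributions:

  • P(label), the probability distribution over labels.
  • P(fname=fval|label), the probability distribution for feature values, given labels.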

If the classifier encounters an input with a feature that has never been seen with any label, then rather than assigning a probability of 0 to all labels, it will ignore that feature.

The feature value 'None' is reserved for unseen feature values; you generally should not use 'None' as a feature value for one of your own features.
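
The handling of unseen features described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name drop_unseen_features and the toy dictionary are hypothetical, and the sketch assumes that "seen" means a (label,fname) key exists for at least one label.

```python
def drop_unseen_features(featureset, feature_probdist, labels):
    """Return a copy of featureset without features unknown to every label.

    Assumption: a feature counts as "seen" when (label, fname) is a key of
    feature_probdist for at least one label.
    """
    return {
        fname: fval
        for fname, fval in featureset.items()
        if any((label, fname) in feature_probdist for label in labels)
    }


# Hypothetical toy data; the dict values stand in for ProbDistI objects.
feature_probdist = {("spam", "has_link"): {}, ("ham", "has_link"): {}}
kept = drop_unseen_features(
    {"has_link": True, "has_emoji": True}, feature_probdist, ["spam", "ham"]
)
print(kept)
```

Only has_link survives here, so the unseen has_emoji feature has no influence on the label probabilities.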

Instance Methods
 
  • __init__(self, label_probdist, feature_probdist)
  • labels(self) - Returns the list of category labels used by this classifier. Return type: list of (immutable).
  • classify(self, featureset) - Returns the most appropriate label for the given featureset. Return type: label.
  • prob_classify(self, featureset) - Returns a probability distribution over labels for the given featureset. Return type: ProbDistI.
  • show_most_informative_features(self, n=10)
  • most_informative_features(self, n=100) - Return a list of the 'most informative' features used by this classifier.

Inherited from api.ClassifierI: batch_classify, batch_prob_classify

    Deprecated

Inherited from api.ClassifierI: batch_probdist, probdist

Static Methods
 
  • train(labeled_featuresets, estimator=<class 'nltk.probability.ELEProbDist'>)

Method Details

__init__(self, label_probdist, feature_probdist)
(Constructor)

Parameters:
  • label_probdist - P(label), the probability distribution over labels. It is expressed as a ProbDistI whose samples are labels. I.e., P(label) = label_probdist.prob(label).
  • feature_probdist - P(fname=fval|label), the probability distribution for feature values, given labels. It is expressed as a dictionary whose keys are (label,fname) pairs and whose values are ProbDistIs over feature values. I.e., P(fname=fval|label) = feature_probdist[label,fname].prob(fval). If a given (label,fname) is not a key in feature_probdist, then it is assumed that the corresponding P(fname=fval|label) is 0 for all values of fval.
Overrides: object.__init__
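
As a rough sketch of how the two distributions combine, the block below multiplies P(label) by each P(fname=fval|label) and normalizes over labels. Plain dictionaries stand in for ProbDistI objects, and the names label_prob, feature_prob and posterior, as well as the numbers, are hypothetical. It illustrates the Naive Bayes rule under those assumptions and is not the class's own code.

```python
# Plain-dict stand-ins for the two distributions (hypothetical toy values).
label_prob = {"spam": 0.4, "ham": 0.6}                # P(label)
feature_prob = {                                       # P(fname=fval|label)
    ("spam", "has_link"): {True: 0.8, False: 0.2},
    ("ham", "has_link"): {True: 0.1, False: 0.9},
}


def posterior(featureset):
    """Return P(label|featureset) under the naive independence assumption."""
    scores = {}
    for label, p_label in label_prob.items():
        score = p_label
        for fname, fval in featureset.items():
            dist = feature_prob.get((label, fname))
            # A missing (label, fname) key is treated as probability 0.
            score *= dist.get(fval, 0.0) if dist is not None else 0.0
        scores[label] = score
    total = sum(scores.values())
    if total == 0:
        # Every label scored 0; return the unnormalized scores as-is.
        return scores
    return {label: s / total for label, s in scores.items()}


result = posterior({"has_link": True})
print(result)
```

With these toy numbers the unnormalized scores are 0.4 * 0.8 = 0.32 for spam and 0.6 * 0.1 = 0.06 for ham, so spam receives 0.32 / 0.38 of the probability mass.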

labels(self)

Returns: list of (immutable)
the list of category labels used by this classifier.
Overrides: api.ClassifierI.labels
(inherited documentation)

classify(self, featureset)

Returns: label
the most appropriate label for the given featureset.
Overrides: api.ClassifierI.classify
(inherited documentation)

prob_classify(self, featureset)

Returns: ProbDistI
a probability distribution over labels for the given featureset.
Overrides: api.ClassifierI.prob_classify
(inherited documentation)

most_informative_features(self, n=100)

Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature (fname,fval) is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:

 max[ P(fname=fval|label1) / P(fname=fval|label2) ]
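
The ratio above can be illustrated with a small, self-contained sketch. The dictionary feature_prob, the function rank_by_informativeness and the probabilities are hypothetical, with plain dicts standing in for ProbDistI objects. Skipping features whose lowest probability is 0 is an assumption made here to avoid division by zero.

```python
# Hypothetical P(fname=fval|label) values for two labels.
feature_prob = {
    ("pos", "contains(great)"): {True: 0.30, False: 0.70},
    ("neg", "contains(great)"): {True: 0.05, False: 0.95},
    ("pos", "contains(the)"): {True: 0.90, False: 0.10},
    ("neg", "contains(the)"): {True: 0.90, False: 0.10},
}


def rank_by_informativeness(feature_prob, n=100):
    """Rank (fname, fval) pairs by max P / min P across labels."""
    probs = {}
    for (label, fname), dist in feature_prob.items():
        for fval, p in dist.items():
            probs.setdefault((fname, fval), []).append(p)
    # Assumption: features whose lowest probability is 0 are skipped.
    ratios = {
        feat: max(ps) / min(ps) for feat, ps in probs.items() if min(ps) > 0
    }
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return ranked[:n], ratios


ranked, ratios = rank_by_informativeness(feature_prob)
print(ranked[0], ratios[ranked[0]])
```

Here contains(great)=True is about six times more likely under one label than the other, so it ranks first, while contains(the) has a ratio of 1 and carries no information about the label.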

train(labeled_featuresets, estimator=<class 'nltk.probability.ELEProbDist'>)
Static Method

Parameters:
  • labeled_featuresets - A list of classified featuresets, i.e., a list of tuples (featureset, label).
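
To make the expected input shape concrete, the sketch below builds a small list of (featureset, label) tuples and estimates P(label) with the expected likelihood estimate, which adds 0.5 to every count. The data and the function ele_label_probs are hypothetical. The sketch assumes the number of bins equals the number of observed labels, and it covers only the label distribution, not the per-feature distributions that training would also need.

```python
from collections import Counter

# Hypothetical training data: each item is a (featureset, label) tuple,
# where a featureset is a dict mapping feature names to feature values.
labeled_featuresets = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "female"),
]


def ele_label_probs(labeled_featuresets):
    """Estimate P(label) as (count + 0.5) / (N + 0.5 * bins)."""
    counts = Counter(label for _, label in labeled_featuresets)
    n = sum(counts.values())
    bins = len(counts)  # assumption: one bin per observed label
    return {label: (c + 0.5) / (n + 0.5 * bins) for label, c in counts.items()}


label_probs = ele_label_probs(labeled_featuresets)
print(label_probs)
```

With two "female" items and one "male" item, the estimates are 2.5 / 4 = 0.625 and 1.5 / 4 = 0.375, slightly smoothed relative to the raw frequencies 2/3 and 1/3.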