Package nltk :: Module probability :: Class FreqDist

type FreqDist

source code

object --+    
         |    
      dict --+
             |
            FreqDist

A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:

>>> fdist = FreqDist()
>>> for word in tokenize.whitespace(sent):
...    fdist.inc(word.lower())

An equivalent way to do this is with the initializer:

>>> fdist = FreqDist(word.lower() for word in tokenize.whitespace(sent))
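
The two construction styles above can be imitated with the standard library alone. The following is only a sketch under stated assumptions: it uses collections.Counter as a stand-in for FreqDist, str.split() in place of tokenize.whitespace, and a made-up sentence for sent. None of these substitutions come from this module.

```python
from collections import Counter

# Hypothetical input text; any whitespace-separated string works.
sent = "the cat sat on the mat near the door"

# Style 1: start empty and increment the count once per outcome.
counts_loop = Counter()
for word in sent.split():
    counts_loop[word.lower()] += 1

# Style 2: hand every sample to the constructor at once.
counts_init = Counter(word.lower() for word in sent.split())

# Both styles record the same number of occurrences for each word.
print(counts_loop == counts_init)
print(counts_init["the"])
```

Both objects map each lowercased word to the number of times it occurred, which is the function from samples to counts described in the class overview.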

Instance Methods
new empty dictionary

__init__(self, samples=None)
Construct a new frequency distribution.
source code
None
inc(self, sample, count=1)
Increment this FreqDist's count for the given sample.
source code
None
__setitem__(self, sample, value)
Set this FreqDist's count for the given sample.
source code
int
N(self)
Returns: The total number of sample outcomes that have been recorded by this FreqDist.
source code
int
B(self)
Returns: The total number of sample values (or bins) that have counts greater than zero.
source code
list
samples(self)
Returns: A list of all samples that have been recorded as outcomes by this frequency distribution.
source code
list
hapaxes(self)
Returns: A list of all samples that occur exactly once (hapax legomena).
source code
int
Nr(self, r, bins=None)
Returns: The number of samples with count r.
source code
 
_cache_Nr_values(self) source code
int
count(self, sample)
Return the count of a given sample.
source code
list of float
_cumulative_frequencies(self, samples=None)
Return the cumulative frequencies of the specified samples.
source code
float
freq(self, sample)
Return the frequency of a given sample.
source code
any or None
max(self)
Return the sample with the greatest number of outcomes in this frequency distribution.
source code
 
plot(self, *args, **kwargs)
Plot samples from the frequency distribution displaying the most frequent sample first.
source code
 
tabulate(self, *args, **kwargs)
Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first.
source code
 
sorted_samples(self) source code
 
sorted(self) source code
 
_sort_keys_by_value(self) source code
list of any
keys(self)
Return the samples sorted in decreasing order of frequency.
source code
list of any
values(self)
Return the values sorted in decreasing order.
source code
list of tuple
items(self)
Return the items sorted in decreasing order of frequency.
source code
iter
__iter__(self)
Return the samples sorted in decreasing order of frequency.
source code
iter
iterkeys(self)
Return the samples sorted in decreasing order of frequency.
source code
iter
itervalues(self)
Return the values sorted in decreasing order.
source code
iter of any
iteritems(self)
Return the items sorted in decreasing order of frequency.
source code
FreqDist
copy(self)
Create a copy of this frequency distribution.
source code
None
update(self, samples)
Update the frequency distribution with the provided list of samples.
source code
v
pop(self, other)
Remove the specified key and return the corresponding value v. If the key is not found, d is returned if given; otherwise KeyError is raised.
source code
(k, v)
popitem(self, other)
Remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.
source code
None
clear(self)
Remove all items from D.
source code
 
_reset_caches(self) source code
 
__add__(self, other) source code
 
__eq__(self, other)
x==y
source code
 
__ne__(self, other)
x!=y
source code
 
__le__(self, other)
x<=y
source code
 
__lt__(self, other)
x<y
source code
 
__ge__(self, other)
x>=y
source code
 
__gt__(self, other)
x>y
source code
string
__repr__(self)
Returns: A string representation of this FreqDist.
source code
string
__str__(self)
Returns: A string representation of this FreqDist.
source code
 
__getitem__(self, sample)
x[y]
source code

Inherited from dict: __cmp__, __contains__, __delitem__, __getattribute__, __hash__, __len__, __new__, fromkeys, get, has_key, setdefault

Method Details

__init__(self, samples=None)
(Constructor)

source code 

Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.

In particular, FreqDist() returns an empty frequency distribution; and FreqDist(samples) first creates an empty frequency distribution, and then calls update with the list samples.

Parameters:
  • samples (Sequence) - The samples to initialize the frequency distribution with.
Returns:
new empty dictionary

Overrides: dict.__init__

inc(self, sample, count=1)

source code 

Increment this FreqDist's count for the given sample.

Parameters:
  • sample (any) - The sample whose count should be incremented.
  • count (int) - The amount to increment the sample's count by.
Returns: None
Raises:
  • NotImplementedError - If sample is not a supported sample type.

__setitem__(self, sample, value)
(Index assignment operator)

source code 

Set this FreqDist's count for the given sample.

Parameters:
  • sample (any hashable object) - The sample whose count should be set.
  • value (int) - The new value for the sample's count.
Returns: None
Raises:
  • TypeError - If sample is not a supported sample type.
Overrides: dict.__setitem__

N(self)

source code 
Returns: int
The total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().

B(self)

source code 
Returns: int
The total number of sample values (or bins) that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (FreqDist.B() is the same as len(FreqDist).)
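
The difference between N() and B() is easiest to see side by side. The sketch below is not this class's implementation; it assumes a collections.Counter as a stand-in for the distribution and computes both quantities by hand.

```python
from collections import Counter

# Stand-in distribution: the samples are the letters of one word.
counts = Counter("abracadabra")

# Analogue of N(): every recorded outcome, repeats included.
n_outcomes = sum(counts.values())

# Analogue of B(): distinct samples whose count is greater than zero.
n_bins = sum(1 for c in counts.values() if c > 0)

print(n_outcomes, n_bins)
```

Here the word has 11 letters but only 5 distinct ones, so the outcome total and the bin total differ.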

samples(self)

source code 
Returns: list
A list of all samples that have been recorded as outcomes by this frequency distribution. Use count() to determine the count for each sample.

hapaxes(self)

source code 
Returns: list
A list of all samples that occur exactly once (hapax legomena).
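
As a rough illustration of what counts as a hapax, the sketch below filters a collections.Counter (assumed here as a stand-in for the distribution) for samples whose count is one. Sorting the result is a choice made for the example and is not something this page promises.

```python
from collections import Counter

# Stand-in distribution over the letters of one word.
counts = Counter("abracadabra")

# Keep only the samples that were recorded exactly once.
hapaxes = sorted(sample for sample, c in counts.items() if c == 1)

print(hapaxes)
```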

Nr(self, r, bins=None)

source code 
Parameters:
  • r (int) - A sample count.
  • bins (int) - The number of possible sample outcomes. bins is used to calculate Nr(0). In particular, Nr(0) is bins-self.B(). If bins is not specified, it defaults to self.B() (so Nr(0) will be 0).
Returns: int
The number of samples with count r.
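
The role of bins is clearer with a worked case. The sketch below follows the rule stated above (Nr(0) is bins minus the number of observed bins, with bins defaulting to the observed bins). The function nr and the Counter stand-in are hypothetical names introduced for the example, not part of this module.

```python
from collections import Counter

def nr(counts, r, bins=None):
    """Number of samples whose count is exactly r (illustrative only)."""
    observed = sum(1 for c in counts.values() if c > 0)  # analogue of B()
    if r == 0:
        # Unseen samples are not stored, so Nr(0) is derived from bins.
        total_bins = observed if bins is None else bins
        return total_bins - observed
    return sum(1 for c in counts.values() if c == r)

counts = Counter("abracadabra")  # a:5, b:2, r:2, c:1, d:1

print(nr(counts, 1))           # samples seen once
print(nr(counts, 0))           # no bins given, so nothing is unseen
print(nr(counts, 0, bins=26))  # 26 possible letters, 5 of them observed
```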

count(self, sample)

source code 

Return the count of a given sample. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Counts are non-negative integers. This method has been replaced by conventional dictionary indexing; use fd[item] instead of fd.count(item).

Parameters:
  • sample (any) - The sample whose count should be returned.
Returns: int
The count of a given sample.

_cumulative_frequencies(self, samples=None)

source code 

Return the cumulative frequencies of the specified samples. If no samples are specified, all counts are returned, starting with the largest.

Parameters:
  • samples - The samples whose cumulative frequencies should be returned.
Returns: list of float
The cumulative frequencies of the given samples.
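
The text does not say whether the running totals are built from raw counts or from relative frequencies. The sketch below assumes raw counts accumulated as floats, ordered from most to least frequent, and uses collections.Counter and itertools.accumulate as stand-ins rather than this class's own code.

```python
from collections import Counter
from itertools import accumulate

# Stand-in distribution over the letters of one word.
counts = Counter("abracadabra")  # a:5, b:2, r:2, c:1, d:1

# Order samples from most to least frequent.
ordered = [sample for sample, _ in counts.most_common()]

# Running total of counts, as floats (an assumption of this sketch).
running = list(accumulate(float(counts[s]) for s in ordered))

print(running)
```

The last running total equals the total number of recorded outcomes, which is what makes the sequence usable for a cumulative plot.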

freq(self, sample)

source code 

Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].

Parameters:
  • sample (any) - the sample whose frequency should be returned.
Returns: float
The frequency of a given sample.
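
The definition above (count divided by the total number of recorded outcomes) can be written out directly. In this sketch the function freq and the Counter stand-in are assumptions made for illustration, and returning 0.0 for an empty distribution is a choice of the example rather than documented behaviour.

```python
from collections import Counter

def freq(counts, sample):
    """Relative frequency of sample: its count over the outcome total."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an empty distribution yields 0.0
    return counts[sample] / total

counts = Counter("abracadabra")  # 11 outcomes, 'a' occurs 5 times

print(freq(counts, "a"))  # 5 / 11
print(freq(counts, "z"))  # never observed
```

Summing freq over every observed sample gives 1, which is why each individual value lies in the range [0, 1].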

max(self)

source code 

Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.

Returns: any or None
The sample with the maximum number of outcomes in this frequency distribution.
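
A minimal sketch of this behaviour, assuming a collections.Counter as the stand-in distribution and a hypothetical helper named most_frequent. As in the description above, which sample wins a tie should not be relied on.

```python
from collections import Counter

def most_frequent(counts):
    """Sample with the largest count, or None if nothing was recorded."""
    if not counts:
        return None
    return max(counts, key=counts.get)

print(most_frequent(Counter("abracadabra")))
print(most_frequent(Counter()))
```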

plot(self, *args, **kwargs)

source code 

Plot samples from the frequency distribution displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted. If two integer parameters m, n are supplied, plot a subset of the samples, beginning with m and stopping at n-1. For a cumulative plot, specify cumulative=True. (Requires Matplotlib to be installed.)

Parameters:
  • title (str) - The title for the graph
  • cumulative - A flag to specify whether the plot is cumulative (default = False)
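
The integer arguments behave like slice bounds over the samples in decreasing order of frequency. The sketch below shows only that selection step, with no plotting. The helper select_samples is a hypothetical name, and the list of samples is made up for the example.

```python
def select_samples(ordered_samples, *args):
    """Pick the samples that one or two integer arguments would cover."""
    samples = list(ordered_samples)
    if not args:
        return samples          # no integers: use every sample
    return samples[slice(*args)]  # (n) -> first n; (m, n) -> m .. n-1

# Hypothetical samples, already ordered from most to least frequent.
ordered = ["a", "b", "r", "c", "d"]

print(select_samples(ordered, 2))     # stop after two samples
print(select_samples(ordered, 1, 3))  # begin with index 1, stop at index 2
```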

tabulate(self, *args, **kwargs)

source code 

Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been tabulated. If two integer parameters m, n are supplied, tabulate a subset of the samples, beginning with m and stopping at n-1.

Parameters:
  • samples (list) - The samples to tabulate (default is all samples)

keys(self)

source code 

Return the samples sorted in decreasing order of frequency.

Returns: list of any
A list of samples, in sorted order
Overrides: dict.keys

values(self)

source code 

Return the values sorted in decreasing order.

Returns: list of any
A list of values, in sorted order
Overrides: dict.values

items(self)

source code 

Return the items sorted in decreasing order of frequency.

Returns: list of tuple
A list of items, in sorted order
Overrides: dict.items
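
The keys, values, and items views described above share one ordering: decreasing count. The sketch below reproduces that ordering for a plain collections.Counter (an assumed stand-in) by sorting the items once and deriving the other two lists from the result.

```python
from collections import Counter

# Stand-in distribution over the letters of one word.
counts = Counter("abracadabra")  # a:5, b:2, r:2, c:1, d:1

# Sort (sample, count) pairs by count, largest first.
items = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# Keys and values follow the same order as the sorted items.
keys = [sample for sample, _ in items]
values = [count for _, count in items]

print(keys[0], values)
```

The order among samples with equal counts is left unspecified here, so only the first key is printed.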

__iter__(self)

source code 

Return the samples sorted in decreasing order of frequency.

Returns: iter
An iterator over the samples, in sorted order
Overrides: dict.__iter__

iterkeys(self)

source code 

Return the samples sorted in decreasing order of frequency.

Returns: iter
An iterator over the samples, in sorted order
Overrides: dict.iterkeys

itervalues(self)

source code 

Return the values sorted in decreasing order.

Returns: iter
An iterator over the values, in sorted order
Overrides: dict.itervalues

iteritems(self)

source code 

Return the items sorted in decreasing order of frequency.

Returns: iter of any
An iterator over the items, in sorted order
Overrides: dict.iteritems

copy(self)

source code 

Create a copy of this frequency distribution.

Returns: FreqDist
A copy of this frequency distribution object.
Overrides: dict.copy

update(self, samples)

source code 

Update the frequency distribution with the provided list of samples. This is a faster way to add multiple samples to the distribution.

Parameters:
  • samples (list) - The samples to add.
Returns: None
Overrides: dict.update
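
Note that this update counts samples rather than overwriting values the way dict.update does. collections.Counter.update happens to have the same counting behaviour, so it is used below as a stand-in to show the effect; the sample lists are made up.

```python
from collections import Counter

# Start with two recorded outcomes.
counts = Counter(["a", "b"])

# Each sample in the list adds one to its count.
counts.update(["a", "c", "a"])

print(counts["a"], counts["b"], counts["c"])
```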

pop(self, other)

source code 

Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

Returns: v, the value that was stored under the removed key
Overrides: dict.pop
(inherited documentation)

popitem(self, other)

source code 

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.

Returns: (k, v), the removed (key, value) pair
Overrides: dict.popitem
(inherited documentation)

clear(self)

source code 

Remove all items from D.

Returns: None
Overrides: dict.clear
(inherited documentation)

__eq__(self, other)
(Equality operator)

source code 

x==y

Overrides: dict.__eq__
(inherited documentation)

__ne__(self, other)

source code 

x!=y

Overrides: dict.__ne__
(inherited documentation)

__le__(self, other)
(Less-than-or-equals operator)

source code 

x<=y

Overrides: dict.__le__
(inherited documentation)

__lt__(self, other)
(Less-than operator)

source code 

x<y

Overrides: dict.__lt__
(inherited documentation)

__ge__(self, other)
(Greater-than-or-equals operator)

source code 

x>=y

Overrides: dict.__ge__
(inherited documentation)

__gt__(self, other)
(Greater-than operator)

source code 

x>y

Overrides: dict.__gt__
(inherited documentation)

__repr__(self)
(Representation operator)

source code 

repr(x)

Returns: string
A string representation of this FreqDist.
Overrides: dict.__repr__

__str__(self)
(Informal representation operator)

source code 
Returns: string
A string representation of this FreqDist.
Overrides: object.__str__

__getitem__(self, sample)
(Indexing operator)

source code 

x[y]

Overrides: dict.__getitem__
(inherited documentation)