type GoodTuringProbDist
object --+
         |
 ProbDistI --+
             |
            GoodTuringProbDist
The Good-Turing estimate of a probability distribution. This method
calculates the probability mass to assign to events with zero or low
counts based on the number of events with higher counts. It does so by
using the smoothed count c*:
- c* = (c + 1) N(c + 1) / N(c), for c >= 1
- things with frequency zero in training = N(1), for c == 0
where c is the original count and N(i) is the number of event types observed with count i. That is, the count of unseen events is taken to be the number of event types observed exactly once (see Jurafsky & Martin, 2nd Edition, p. 101).
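The smoothed-count formula can be illustrated with a minimal standalone sketch. It does not use this class; the function name good_turing_counts and the fallback for N(c + 1) == 0 are assumptions made for illustration only, not part of the documented implementation.

```python
from collections import Counter

def good_turing_counts(counts):
    """Return {sample: c*} with c* = (c + 1) * N(c + 1) / N(c) for c >= 1.

    `counts` maps each sample to its observed count c.
    (Hypothetical helper, not part of GoodTuringProbDist.)
    """
    # N(i): number of event types observed exactly i times.
    n = Counter(counts.values())
    smoothed = {}
    for sample, c in counts.items():
        if n[c + 1] == 0:
            # Assumption: when no type has count c + 1 the formula
            # would yield 0, so this sketch keeps the raw count instead.
            smoothed[sample] = float(c)
        else:
            smoothed[sample] = (c + 1) * n[c + 1] / n[c]
    return smoothed

# a:3, b:2, c:2, d:1, e:1, f:1  ->  N(1) = 3, N(2) = 2, N(3) = 1
counts = Counter("a a a b b c c d e f".split())
smoothed = good_turing_counts(counts)

# For c == 0, the count attributed to unseen events is N(1).
unseen_count = Counter(counts.values())[1]
```

In this toy sample the singletons d, e and f each get c* = 2 * 2 / 3, the doubletons b and c get c* = 3 * 1 / 2, and a keeps its raw count because no type was seen four times.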
__init__(self, freqdist, bins=None)
(Constructor)
- Parameters:
freqdist (FreqDist) - The frequency counts upon which to base the estimation.
bins (int) - The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, it is assumed to be equal to the number of bins in the freqdist.
- Overrides:
ProbDistI.__init__
prob(self, sample)
- Parameters:
sample - The sample whose probability should be returned.
- Returns: float
- the probability for a given sample. Probabilities are always
real numbers in the range [0, 1].
- Overrides:
ProbDistI.prob
- (inherited documentation)
max(self)
- Returns: any
- the sample with the greatest probability. If two or more samples
have the same probability, return one of them; which sample is
returned is undefined.
- Overrides:
ProbDistI.max
- (inherited documentation)
samples(self)
- Returns:
list
- A list of all samples that have nonzero probabilities. Use
prob to find the probability of each sample.
- Overrides:
ProbDistI.samples
- (inherited documentation)
discount(self)
- Returns:
float
- The probability mass transferred from the seen samples to the
unseen samples.
- Overrides:
ProbDistI.discount
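
As a rough illustration of the mass transferred to unseen samples, the textbook Good-Turing estimate reserves N(1) / N of the total probability for them, where N is the total number of observations. The sketch below is standalone; the helper name unseen_mass is hypothetical, and whether this class computes its discount in exactly this way is an assumption, not something stated in this documentation.

```python
from collections import Counter

def unseen_mass(counts):
    """Textbook Good-Turing mass for unseen events: N(1) / N.

    (Hypothetical helper, not part of GoodTuringProbDist.)
    """
    total = sum(counts.values())  # N: total number of observations
    if total == 0:
        return 0.0
    # N(1): number of event types observed exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total

# a:3, b:2, c:2, d:1, e:1, f:1  ->  N(1) = 3, N = 10
counts = Counter("a a a b b c c d e f".split())
mass = unseen_mass(counts)
```

With three singleton types among ten observations, this estimate sets aside three tenths of the probability mass for samples that were never observed.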
__repr__(self)
- Returns:
string
- A string representation of this ProbDist.
- Overrides:
object.__repr__