Module nltk.metrics.distance



Distance Metrics.

Compute the distance between two items (usually strings).
As metrics, they must satisfy the following three requirements:

1. d(a, a) = 0
2. d(a, b) >= 0
3. d(a, c) <= d(a, b) + d(b, c)
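The three requirements can be checked mechanically on a finite sample of items. The sketch below is a brute-force check written for illustration only. `check_metric_requirements`, `absolute_difference` and `squared_difference` are hypothetical helpers, not part of this module, and passing the check on a sample does not prove that a function meets the requirements in general. Absolute difference passes. Squared difference fails requirement 3, because d(1, 10) = 81 exceeds d(1, 5) + d(5, 10) = 16 + 25 = 41.

```python
from itertools import product


def check_metric_requirements(d, items):
    """Hypothetical brute-force check of the three requirements.

    Returns a list of violations found over all pairs/triples drawn
    from `items`; an empty list means no violation was found in the sample.
    """
    violations = []
    # Requirement 1: d(a, a) = 0
    for a in items:
        if d(a, a) != 0:
            violations.append(("d(a, a) = 0", a))
    # Requirement 2: d(a, b) >= 0
    for a, b in product(items, repeat=2):
        if d(a, b) < 0:
            violations.append(("d(a, b) >= 0", a, b))
    # Requirement 3: triangle inequality
    for a, b, c in product(items, repeat=3):
        if d(a, c) > d(a, b) + d(b, c):
            violations.append(("d(a, c) <= d(a, b) + d(b, c)", a, b, c))
    return violations


def absolute_difference(x, y):
    # Hypothetical example distance that satisfies all three requirements.
    return abs(x - y)


def squared_difference(x, y):
    # Hypothetical example distance that breaks requirement 3.
    return (x - y) ** 2


print(check_metric_requirements(absolute_difference, [1, 5, 10]))  # []
print(len(check_metric_requirements(squared_difference, [1, 5, 10])) > 0)  # True
```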
 

Functions
 
_edit_dist_init(len1, len2)
 
_edit_dist_step(lev, i, j, c1, c2)
 
edit_distance(s1, s2)
Calculate the Levenshtein edit-distance between two strings.
 
binary_distance(label1, label2)
Simple equality test.
 
jaccard_distance(label1, label2)
Distance metric comparing set-similarity.
 
masi_distance(label1, label2)
Distance metric that takes into account partial agreement when multiple labels are assigned.
 
interval_distance(label1, label2)
Krippendorff's interval distance metric.
 
presence(label)
Higher-order function to test for the presence of a given label.
 
fractional_presence(label)
 
custom_distance(file)
 
demo()
Function Details

edit_distance(s1, s2)


Calculate the Levenshtein edit-distance between two strings. The edit distance is the minimum number of single-character substitutions, insertions, or deletions needed to transform s1 into s2. For example, transforming "rain" into "shine" requires three steps, consisting of two substitutions and one insertion: "rain" -> "sain" -> "shin" -> "shine". These operations could have been done in other orders, but at least three steps are needed.

Parameters:
  • s1 (string), s2 (string) - The strings to be analysed

Returns: int
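The definition above lends itself to the standard dynamic-programming formulation, in which each table cell holds the distance between two prefixes. The following is a minimal sketch of that formulation. `levenshtein` is a hypothetical name, and the sketch is not claimed to match this module's own implementation (the private helpers `_edit_dist_init` and `_edit_dist_step` are not reproduced here).

```python
def levenshtein(s1, s2):
    """Hypothetical sketch: Levenshtein distance by dynamic programming.

    table[i][j] holds the distance between the first i characters of s1
    and the first j characters of s2.
    """
    len1, len2 = len(s1), len(s2)
    table = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    # Row 0 / column 0: converting to or from the empty string costs
    # one insertion or deletion per character.
    for i in range(len1 + 1):
        table[i][0] = i
    for j in range(len2 + 1):
        table[0][j] = j
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            deletion = table[i - 1][j] + 1
            insertion = table[i][j - 1] + 1
            # Substitution is free when the characters already match.
            substitution = table[i - 1][j - 1] + (s1[i - 1] != s2[j - 1])
            table[i][j] = min(deletion, insertion, substitution)
    return table[len1][len2]


print(levenshtein("rain", "shine"))  # 3
```

Filling the full table takes time and space proportional to len(s1) × len(s2); keeping only the previous row would reduce memory to the length of s2 if that matters.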

binary_distance(label1, label2)


Simple equality test.

Returns 0.0 if the labels are identical, 1.0 if they are different.

>>> binary_distance(1,1)
0.0
>>> binary_distance(1,3)
1.0

masi_distance(label1, label2)


Distance metric that takes into account partial agreement when multiple labels are assigned.

>>> masi_distance(set([1,2]),set([1,2,3,4]))
0.5

Passonneau 2005, Measuring Agreement on Set-Valued Items (MASI) for Semantic and Pragmatic Annotation.
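The description above does not spell out a formula. One formula that reproduces the doctest value is 1 - |A ∩ B| / max(|A|, |B|), that is, one minus the overlap divided by the size of the larger label set. The sketch below assumes that formula. `masi_like_distance` is a hypothetical name, and the formula is an assumption that may differ from Passonneau's definition or from this module's code.

```python
def masi_like_distance(label1, label2):
    """Hypothetical sketch: 1 - |intersection| / size of the larger set.

    Assumed formula, chosen because it reproduces the doctest value 0.5.
    """
    label1, label2 = set(label1), set(label2)
    larger = max(len(label1), len(label2))
    if larger == 0:
        return 0.0  # assumption: two empty label sets count as identical
    return 1.0 - len(label1 & label2) / larger


print(masi_like_distance({1, 2}, {1, 2, 3, 4}))  # 0.5
```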

interval_distance(label1, label2)


Krippendorff's interval distance metric.

>>> interval_distance(1,10)
81

Krippendorff 1980, Content Analysis: An Introduction to its Methodology
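The doctest value 81 for labels 1 and 10 is consistent with taking the squared difference, (1 - 10)^2 = 81. A minimal sketch under that assumption, using the hypothetical name `interval_distance_sketch`:

```python
def interval_distance_sketch(label1, label2):
    """Hypothetical sketch: squared difference between two numeric labels.

    Assumed formula, chosen because it reproduces the doctest value 81.
    """
    return (label1 - label2) ** 2


print(interval_distance_sketch(1, 10))  # 81
```

Squaring weights large disagreements more heavily than small ones: a gap of 1 costs 1, whereas a gap of 3 costs 9.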