Module nltk.corpus.reader.wordnet

Classes
WordNetError
An exception class for wordnet-related errors.
_WordNetObject
A common base class for lemmas and synsets.
Lemma
The lexical entry for a single morphological form of a sense-disambiguated word.
Synset
Create a Synset from a "<lemma>.<pos>.<number>" string, where <lemma> is the word's morphological stem, <pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB, and <number> is the sense number, counting from 0.
WordNetCorpusReader
A corpus reader used to access wordnet or its variants.
WordNetICCorpusReader
A corpus reader for the WordNet information content corpus.
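
The "<lemma>.<pos>.<number>" naming scheme described for Synset can be illustrated with a small parser. This is only a sketch: split_synset_name and POS_TAGS are hypothetical names that are not part of this module, and the parsing rule (split on the last two dots) is an assumption about how such names are structured.

```python
# POS tags mirror the module attributes ADJ, ADJ_SAT, ADV, NOUN and VERB.
POS_TAGS = {"a", "s", "r", "n", "v"}


def split_synset_name(name):
    """Split '<lemma>.<pos>.<number>' into (lemma, pos, number).

    Hypothetical helper for illustration; not part of the module.
    """
    # Split from the right so a lemma that itself contains dots stays intact.
    lemma, pos, number = name.rsplit(".", 2)
    if pos not in POS_TAGS:
        raise ValueError("unknown part of speech: %r" % pos)
    return lemma, pos, int(number)
```

Splitting from the right rather than the left is the design choice worth noting, since lemmas may contain dots while the part of speech and sense number may not.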
Functions
path_similarity(synset1, synset2, verbose=False, simulate_root=True)
Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy.
lch_similarity(synset1, synset2, verbose=False, simulate_root=True)
Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur.
wup_similarity(synset1, synset2, verbose=False, simulate_root=True)
Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
res_similarity(synset1, synset2, ic, verbose=False)
Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).
jcn_similarity(synset1, synset2, ic, verbose=False)
Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets.
lin_similarity(synset1, synset2, ic, verbose=False)
Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets.
_lcs_by_depth(synset1, synset2, verbose=False)
Finds the least common subsumer of two synsets in a WordNet taxonomy, where the least common subsumer is defined as the ancestor node common to both input synsets whose shortest path to the root node is the longest.
_lcs_ic(synset1, synset2, ic, verbose=False)
Get the information content of the least common subsumer that has the highest information content value.
information_content(synset, ic)
_get_pos(field)
demo()
Variables
  _INF = 1e+300
Positive infinity (for similarity functions)
  POS_LIST = ['n', 'v', 'a', 'r']
  VERB_FRAME_STRINGS = (None, 'Something %s', 'Somebody %s', 'It...
A table of strings that are used to express verb frames.
  ADJ = 'a'
  ADJ_SAT = 's'
  ADV = 'r'
  NOUN = 'n'
  VERB = 'v'
Function Details

path_similarity(synset1, synset2, verbose=False, simulate_root=True)

Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, except when no path can be found, in which case None is returned; this can only happen for verbs, because there are many distinct verb taxonomies. A score of 1 represents identity, i.e. comparing a sense with itself returns 1.

Parameters:
  • synset1, synset2 (Synset) - The two Synsets being compared.
  • simulate_root (bool) - The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root, except in WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
Returns:
A score denoting the similarity of the two Synsets, normally between 0 and 1. None is returned if no connecting path could be found. 1 is returned if a Synset is compared with itself.
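
The behavior described above can be sketched on a toy taxonomy. The data, the names PARENT, hypernym_chain and toy_path_similarity, and the scoring rule 1 / (shortest path length + 1) are all assumptions made for illustration; this is not the module's implementation and it does not model simulate_root.

```python
# Toy is-a taxonomy, child -> parent (hypothetical data, not WordNet).
# "entity" and "move" are separate roots, like the distinct verb taxonomies.
PARENT = {
    "entity": None, "animal": "entity", "artifact": "entity",
    "dog": "animal", "cat": "animal", "table": "artifact",
    "move": None, "run": "move",
}


def hypernym_chain(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def toy_path_similarity(a, b):
    """Return 1 / (shortest path length + 1), or None if unconnected."""
    steps_from_a = {n: i for i, n in enumerate(hypernym_chain(a))}
    for steps_from_b, n in enumerate(hypernym_chain(b)):
        if n in steps_from_a:
            # In a tree, the first shared ancestor gives the shortest path.
            return 1.0 / (steps_from_a[n] + steps_from_b + 1)
    return None  # no shared root, so no connecting path
```

With this data, comparing "dog" with itself gives 1.0, while "dog" and "run" share no root and give None, mirroring the two special cases in the description.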

lch_similarity(synset1, synset2, verbose=False, simulate_root=True)

Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur. The relationship is given as -log(p / (2d)), where p is the shortest path length and d is the taxonomy depth.

Parameters:
  • synset1, synset2 (Synset) - The two Synsets being compared.
  • simulate_root (bool) - The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root, except in WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
Returns:
A score denoting the similarity of the two Synsets, normally greater than 0. None is returned if no connecting path could be found. If a Synset is compared with itself, the maximum score is returned, which varies depending on the taxonomy depth.
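
The relationship above is easy to check numerically. The sketch below takes p and d directly as arguments; toy_lch_similarity is a hypothetical name, and it assumes p is counted so that a sense compared with itself has p = 1 (otherwise the logarithm would be undefined).

```python
import math


def toy_lch_similarity(path_length, taxonomy_depth):
    """Return -log(p / (2 * d)) for path length p and taxonomy depth d.

    Hypothetical helper for illustration; assumes p >= 1 and d >= 1.
    """
    if path_length <= 0 or taxonomy_depth <= 0:
        raise ValueError("path_length and taxonomy_depth must be positive")
    return -math.log(path_length / (2.0 * taxonomy_depth))
```

Under that assumption the maximum score for a taxonomy of depth d is log(2d), which is why the score for a self-comparison varies with the taxonomy depth.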

wup_similarity(synset1, synset2, verbose=False, simulate_root=True)

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node). Previously, the scores computed by this implementation did not always agree with those given by Pedersen's Perl implementation of WordNet Similarity. With the addition of the simulate_root flag (see below), the scores for verbs now almost always agree, but those for nouns still do not always agree.

The LCS does not necessarily feature in the shortest path connecting the two senses, as it is by definition the common ancestor deepest in the taxonomy, not the one closest to the two senses. Typically, however, it does. Where multiple candidates for the LCS exist, the one whose shortest path to the root node is the longest will be selected. Where the LCS has multiple paths to the root, the longer path is used for the purposes of the calculation.

Parameters:
  • synset1, synset2 (Synset) - The two Synsets being compared.
  • simulate_root (bool) - The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root, except in WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
Returns:
A float score denoting the similarity of the two Synsets, normally greater than zero. If no connecting path between the two senses can be found, None is returned.
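
The description does not spell out the formula. The sketch below assumes the usual Wu-Palmer form, 2 * depth(lcs) / (depth(s1) + depth(s2)), with depths counted from 1 at the root; toy_wup_similarity is a hypothetical name and the depths are passed in directly rather than computed from WordNet.

```python
def toy_wup_similarity(depth_s1, depth_s2, depth_lcs):
    """Return 2 * depth(lcs) / (depth(s1) + depth(s2)).

    Hypothetical helper for illustration. Depths are assumed to be
    counted from 1 at the root, so the denominator is never zero.
    """
    if min(depth_s1, depth_s2, depth_lcs) < 1:
        raise ValueError("depths are counted from 1 at the root")
    return 2.0 * depth_lcs / (depth_s1 + depth_s2)
```

For two senses at depth 3 whose LCS sits at depth 2, this gives 4 / 6; a sense compared with itself is its own LCS and scores 1.0.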

res_similarity(synset1, synset2, ic, verbose=False)

Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).

Parameters:
  • synset1, synset2 (Synset) - The two Synsets being compared.
  • ic (dict) - An information content object (as returned by load_ic()).
Returns:
A float score denoting the similarity of the two Synsets. Synsets whose LCS is the root node of the taxonomy will have a score of 0 (e.g. N['dog'][0] and N['table'][0]).
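
To see why a root-level LCS yields a score of 0, consider the toy sketch below. It assumes information content is defined as log(count(root) / count(c)) over cumulative corpus counts; the counts, the taxonomy and the names COUNTS, PARENT, toy_ic, subsumers and toy_res_similarity are all hypothetical and do not come from this module.

```python
import math

# Hypothetical cumulative counts: each concept's count includes its descendants.
COUNTS = {"entity": 100, "animal": 40, "artifact": 60,
          "dog": 10, "cat": 5, "table": 20}
# Toy is-a taxonomy, child -> parent.
PARENT = {"entity": None, "animal": "entity", "artifact": "entity",
          "dog": "animal", "cat": "animal", "table": "artifact"}


def toy_ic(concept):
    """IC(c) = log(count(root) / count(c)), so the root has IC 0."""
    return math.log(COUNTS["entity"] / COUNTS[concept])


def subsumers(concept):
    """Return the concept and all of its ancestors."""
    found = set()
    while concept is not None:
        found.add(concept)
        concept = PARENT[concept]
    return found


def toy_res_similarity(a, b):
    """Return the IC of the most informative common subsumer."""
    return max(toy_ic(c) for c in subsumers(a) & subsumers(b))
```

Here "dog" and "table" share only the root, whose IC is 0, while "dog" and "cat" share "animal" and score log(100 / 40).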

jcn_similarity(synset1, synset2, ic, verbose=False)

Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).

Parameters:
  • synset1, synset2 (Synset) - The two Synsets being compared.
  • ic (dict) - An information content object (as returned by load_ic()).
Returns:
A float score denoting the similarity of the two Synsets.
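
The equation above can be evaluated directly once the three IC values are known. In this sketch toy_jcn_similarity is a hypothetical name, and returning a large constant when the denominator is zero is an assumption, modeled on the _INF variable listed for this module.

```python
_INF = 1e300  # same value as the module-level _INF listed under Variables


def toy_jcn_similarity(ic_s1, ic_s2, ic_lcs):
    """Return 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).

    Hypothetical helper for illustration.
    """
    difference = ic_s1 + ic_s2 - 2.0 * ic_lcs
    if difference == 0:
        # Assumption: treat a zero denominator (e.g. identical senses)
        # as "infinitely similar" instead of dividing by zero.
        return _INF
    return 1.0 / difference
```

For example, IC values of 3.0 and 2.0 with an LCS at 1.0 give 1 / 3; the closer the LCS is in information content to both inputs, the larger the score.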

lin_similarity(synset1, synset2, ic, verbose=False)

Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).

Parameters:
  • synset1, synset2 (Synset) - The two Synsets being compared.
  • ic (dict) - An information content object (as returned by load_ic()).
Returns:
A float score denoting the similarity of the two Synsets, in the range 0 to 1.
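
As a quick numeric check of the equation above, the sketch below computes the ratio from three IC values. toy_lin_similarity is a hypothetical name, and returning 0.0 when both inputs have zero information content is an assumption made to avoid a division by zero.

```python
def toy_lin_similarity(ic_s1, ic_s2, ic_lcs):
    """Return 2 * IC(lcs) / (IC(s1) + IC(s2)).

    Hypothetical helper for illustration.
    """
    total = ic_s1 + ic_s2
    if total == 0:
        # Assumption: with no information content there is nothing to compare.
        return 0.0
    return 2.0 * ic_lcs / total
```

Because the LCS subsumes both inputs, its IC cannot exceed either of theirs, which keeps the ratio between 0 and 1; identical senses score exactly 1.0.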

_lcs_by_depth(synset1, synset2, verbose=False)

Finds the least common subsumer of two synsets in a WordNet taxonomy, where the least common subsumer is defined as the ancestor node common to both input synsets whose shortest path to the root node is the longest.

Parameters:
  • synset1 (Synset) - First input synset.
  • synset2 (Synset) - Second input synset.
Returns:
The ancestor synset common to both input synsets which is also the LCS.

_lcs_ic(synset1, synset2, ic, verbose=False)

Get the information content of the least common subsumer that has the highest information content value. If two nodes have no explicit common subsumer, assume that they share an artificial root node that is the hypernym of all explicit roots.

Parameters:
  • synset1 (Synset) - First input synset.
  • synset2 (Synset) - Second input synset. Must be the same part of speech as the first synset.
  • ic (dict) - An information content object (as returned by load_ic()).
Returns:
The information content of the two synsets and of their most informative subsumer.
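
The fallback to an artificial root can be sketched with two disconnected toy taxonomies. The IC values, the taxonomy and the names IC, PARENT, subsumers and toy_lcs_ic are hypothetical; giving the artificial root an information content of 0 is an assumption, based on the idea that a node subsuming everything carries no information.

```python
# Hypothetical information content values and two separate toy taxonomies.
IC = {"move": 1.0, "run": 3.0, "walk": 2.5, "think": 1.5, "ponder": 4.0}
PARENT = {"move": None, "run": "move", "walk": "move",
          "think": None, "ponder": "think"}


def subsumers(concept):
    """Return the concept and all of its ancestors."""
    found = set()
    while concept is not None:
        found.add(concept)
        concept = PARENT[concept]
    return found


def toy_lcs_ic(a, b):
    """Return (IC(a), IC(b), IC of the most informative common subsumer)."""
    common = subsumers(a) & subsumers(b)
    # Assumption: with no explicit common subsumer, fall back to an
    # artificial root whose information content is 0.
    lcs_ic = max((IC[c] for c in common), default=0.0)
    return IC[a], IC[b], lcs_ic
```

With this data, "run" and "walk" share "move" and yield (3.0, 2.5, 1.0), while "run" and "ponder" live in different taxonomies and yield (3.0, 4.0, 0.0).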

Variables Details

VERB_FRAME_STRINGS

A table of strings that are used to express verb frames.

Value:
(None,
 'Something %s',
 'Somebody %s',
 'It is %sing',
 'Something is %sing PP',
 'Something %s something Adjective/Noun',
 'Something %s Adjective/Noun',
 'Somebody %s Adjective',
...
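
Each non-None entry is a %-style template with a single %s slot for the verb form. The sketch below uses only the first few entries shown above; FRAMES and render_frame are hypothetical names, and how the module itself fills the templates is not shown here.

```python
# First entries of the table as listed above; the rest are elided in the source.
FRAMES = (None, 'Something %s', 'Somebody %s', 'It is %sing')


def render_frame(index, verb_form):
    """Fill the verb frame template at `index` with `verb_form`.

    Hypothetical helper for illustration. Index 0 holds None, so frame
    numbers effectively start at 1.
    """
    template = FRAMES[index]
    if template is None:
        return None
    return template % verb_form
```

For instance, render_frame(1, 'falls') produces 'Something falls' and render_frame(3, 'rain') produces 'It is raining'.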