Package nltk :: Package corpus :: Package reader :: Module wordnet :: Class Synset

type Synset

    object --+    
             |    
_WordNetObject --+
                 |
                Synset

Create a Synset from a "<lemma>.<pos>.<number>" string where:
<lemma> is the word's morphological stem
<pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
<number> is the sense number, counting from 1 (for example, 01 in "dog.n.01").
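
As a rough illustration of the name format, the sketch below splits such a string into its three parts. The helper name parse_synset_name is hypothetical and not part of NLTK; it only shows that splitting from the right keeps any dots inside the lemma intact.

```python
def parse_synset_name(name):
    """Split '<lemma>.<pos>.<number>' into its parts (hypothetical helper)."""
    # Split from the right so that a lemma containing dots is not broken up.
    lemma, pos, number = name.lower().rsplit(".", 2)
    return lemma, pos, int(number)


print(parse_synset_name("dog.n.01"))
```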

Synset attributes
-----------------
name - The canonical name of this synset, formed using the first lemma
    of this synset. Note that this may be different from the name
    passed to the constructor if that string used a different lemma to
    identify the synset.
pos - The synset's part of speech, matching one of the module level
    attributes ADJ, ADJ_SAT, ADV, NOUN or VERB.
lemmas - A list of the Lemma objects for this synset.
definition - The definition for this synset.
examples - A list of example strings for this synset.
offset - The offset in the WordNet dict file of this synset.
#lexname - The name of the lexicographer file containing this synset.

Synset methods
--------------
Synsets have the following methods for retrieving related Synsets.
They correspond to the names for the pointer symbols defined here:
    http://wordnet.princeton.edu/man/wninput.5WN.html#sect3
These methods all return lists of Synsets.

hypernyms
instance_hypernyms
hyponyms
instance_hyponyms
member_holonyms
substance_holonyms
part_holonyms
member_meronyms
substance_meronyms
part_meronyms
attributes
entailments
causes
also_sees
verb_groups
similar_tos

Additionally, Synsets support the following methods specific to the
hypernym relation:

root_hypernyms
common_hypernyms
lowest_common_hypernyms

Note that Synsets do not support the following relations because
these are defined by WordNet as lexical relations:

antonyms
derivationally_related_forms
pertainyms

Instance Methods
 
__init__(self, wordnet_corpus_reader)
 
_needs_root(self)
 
root_hypernyms(self)
Get the topmost hypernyms of this synset in WordNet.
 
max_depth(self)
Returns: The length of the longest hypernym path from this synset to the root.
 
min_depth(self)
Returns: The length of the shortest hypernym path from this synset to the root.
 
closure(self, rel, depth=-1)
Return the transitive closure of this synset under the rel relationship, traversed breadth-first.
 
hypernym_paths(self)
Get the path(s) from this synset to the root, where each path is a list of the synset nodes traversed on the way to the root.
 
common_hypernyms(self, other)
Find all synsets that are hypernyms of this synset and the other synset.
 
lowest_common_hypernyms(self, other, simulate_root=False)
Get the lowest synset that both synsets have as a hypernym.
 
hypernym_distances(self, distance=0, simulate_root=False)
Get the path(s) from this synset to the root, counting the distance of each node from the initial node on the way.
 
shortest_path_distance(self, other, simulate_root=False)
Returns the distance of the shortest path linking the two synsets (if one exists).
 
tree(self, rel, depth=-1, cut_mark=None)
 
path_similarity(self, other, verbose=False, simulate_root=True)
Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy.
 
lch_similarity(self, other, verbose=False, simulate_root=True)
Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur.
 
wup_similarity(self, other, verbose=False, simulate_root=True)
Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
 
res_similarity(self, other, ic, verbose=False)
Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).
 
jcn_similarity(self, other, ic, verbose=False)
Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets.
 
lin_similarity(self, other, ic, verbose=False)
Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets.
 
_iter_hypernym_lists(self)
Returns: An iterator over Synsets that are either proper hypernyms or instance hypernyms of the synset.
 
__repr__(self)
 
_related(self, relation_symbol)

Inherited from _WordNetObject: __eq__, __hash__, __ne__, also_sees, attributes, causes, entailments, hypernyms, hyponyms, instance_hypernyms, instance_hyponyms, member_holonyms, member_meronyms, part_holonyms, part_meronyms, region_domains, similar_tos, substance_holonyms, substance_meronyms, topic_domains, usage_domains, verb_groups

Method Details

__init__(self, wordnet_corpus_reader)
(Constructor)

Overrides: object.__init__
(inherited documentation)

max_depth(self)

Returns:
The length of the longest hypernym path from this synset to the root.

min_depth(self)

Returns:
The length of the shortest hypernym path from this synset to the root.

closure(self, rel, depth=-1)

Return the transitive closure of this synset under the rel relationship, traversed breadth-first.

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.01')
>>> hyp = lambda s:s.hypernyms()
>>> list(dog.closure(hyp))
[Synset('domestic_animal.n.01'), Synset('canine.n.02'),
Synset('animal.n.01'), Synset('carnivore.n.01'),
Synset('organism.n.01'), Synset('placental.n.01'),
Synset('living_thing.n.01'), Synset('mammal.n.01'),
Synset('whole.n.02'), Synset('vertebrate.n.01'),
Synset('object.n.01'), Synset('chordate.n.01'),
Synset('physical_entity.n.01'), Synset('entity.n.01')]
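
To make the traversal order concrete, here is a minimal sketch of a breadth-first closure over a toy graph. The HYPERNYMS dictionary and the standalone closure function are hypothetical stand-ins, not NLTK's code; treating a negative depth as "no limit" and yielding each node once are assumptions of this sketch.

```python
from collections import deque

# Toy hypernym graph (hypothetical data, not read from WordNet).
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": [],
}


def closure(start, rel, depth=-1):
    """Yield nodes reachable from start via rel, breadth-first, each once."""
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if depth >= 0 and level >= depth:
            continue  # depth limit reached; do not expand this node
        for nxt in rel(node):
            if nxt not in seen:
                seen.add(nxt)
                yield nxt
                queue.append((nxt, level + 1))


print(list(closure("dog", lambda n: HYPERNYMS[n])))
```

Nodes one step away are produced before nodes two steps away, which is why the immediate hypernyms come first in the result.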

hypernym_paths(self)

Get the path(s) from this synset to the root, where each path is a list of the synset nodes traversed on the way to the root.

Returns:
A list of lists, where each list gives the node sequence connecting the initial Synset node and a root node.
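
The following sketch enumerates such paths on a toy graph and derives the longest and shortest depths from them. The HYPERNYMS data is hypothetical, and two details are assumptions of the sketch rather than statements about NLTK: each path is listed root first, and depth is counted in edges.

```python
# Toy hypernym graph (hypothetical data, not read from WordNet).
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": [],
}


def hypernym_paths(node):
    """Return all root-to-node paths, each as a list of node names."""
    parents = HYPERNYMS[node]
    if not parents:
        return [[node]]  # a root is its own one-element path
    return [path + [node] for p in parents for path in hypernym_paths(p)]


paths = hypernym_paths("dog")
max_depth = max(len(p) for p in paths) - 1  # edges on the longest path
min_depth = min(len(p) for p in paths) - 1  # edges on the shortest path
print(paths, max_depth, min_depth)
```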

common_hypernyms(self, other)

Find all synsets that are hypernyms of this synset and the other synset.

Parameters:
  • other (Synset) - other input synset.
Returns:
The synsets that are hypernyms of both synsets.
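
One way to picture this is as the intersection of two ancestor sets. The sketch below does that on a hypothetical toy graph; the helper ancestors is not an NLTK function, and whether a node counts as its own hypernym is a detail this sketch does not try to reproduce (here it does not).

```python
# Toy hypernym graph (hypothetical data, not read from WordNet).
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "animal": [],
}


def ancestors(node):
    """Collect every node reachable by following hypernym links upward."""
    found = set()
    stack = list(HYPERNYMS[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(HYPERNYMS[current])
    return found


def common_hypernyms(a, b):
    """Return the nodes that are ancestors of both a and b."""
    return ancestors(a) & ancestors(b)


print(sorted(common_hypernyms("dog", "cat")))
```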

hypernym_distances(self, distance=0, simulate_root=False)

Get the path(s) from this synset to the root, counting the distance of each node from the initial node on the way. A set of (synset, distance) tuples is returned.

Parameters:
  • distance (int) - the distance (number of edges) from this hypernym to the original hypernym Synset on which this method was called.
Returns:
A set of (Synset, int) tuples where each Synset is a hypernym of the first Synset.
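
A small recursive sketch shows how the distance parameter accumulates while walking upward. The graph is hypothetical toy data and the function is a standalone stand-in, not NLTK's method; including the starting node itself at distance 0 is an assumption of this sketch.

```python
# Toy hypernym graph (hypothetical data, not read from WordNet).
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": [],
}


def hypernym_distances(node, distance=0):
    """Return a set of (node, distance) pairs for node and all its ancestors."""
    distances = {(node, distance)}
    for parent in HYPERNYMS[node]:
        # Each step up the hierarchy adds one edge to the running distance.
        distances |= hypernym_distances(parent, distance + 1)
    return distances


print(sorted(hypernym_distances("dog")))
```

Because "animal" is reachable along two routes of different lengths, it appears twice in the result, once per distance.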

shortest_path_distance(self, other, simulate_root=False)

Returns the distance of the shortest path linking the two synsets (if one exists). For each synset, all the ancestor nodes and their distances are recorded and compared. The ancestor node common to both synsets that can be reached with the minimum number of traversals is used. If no ancestor nodes are common, None is returned. If a node is compared with itself, 0 is returned.

Parameters:
  • other (Synset) - The Synset to which the shortest path will be found.
Returns:
The number of edges in the shortest path connecting the two nodes, or None if no path exists.
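
The procedure described above can be sketched as follows on a toy graph with two unconnected taxonomies. All names and data are hypothetical, and the sketch ignores simulate_root; it is meant only to show the "record ancestor distances, then pick the cheapest shared ancestor" idea.

```python
# Toy hypernym graph with two separate roots (hypothetical data).
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": [],
    "run": ["move"],
    "move": [],
}


def min_distances(node):
    """Map node and each of its ancestors to the smallest edge distance."""
    best = {}
    frontier = [(node, 0)]
    while frontier:
        current, dist = frontier.pop()
        if current not in best or dist < best[current]:
            best[current] = dist
            frontier.extend((p, dist + 1) for p in HYPERNYMS[current])
    return best


def shortest_path_distance(a, b):
    """Return the fewest edges linking a and b via a shared ancestor, or None."""
    dist_a, dist_b = min_distances(a), min_distances(b)
    shared = set(dist_a) & set(dist_b)
    if not shared:
        return None  # the two nodes live in unconnected taxonomies
    return min(dist_a[n] + dist_b[n] for n in shared)


print(shortest_path_distance("dog", "cat"))
```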

path_similarity(self, other, verbose=False, simulate_root=True)

Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, except when no path can be found (this happens only for verbs, because there are many distinct verb taxonomies), in which case None is returned. A score of 1 represents identity, i.e. comparing a sense with itself returns 1.

Parameters:
  • other (Synset) - The Synset that this Synset is being compared to.
  • simulate_root (bool) - The verb taxonomies do not share a single root, so this metric cannot be computed for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies; set it to False to disable this behavior. The noun taxonomy normally has a single root, except in WordNet 1.6; if you are using WordNet 1.6, a fake root is added for nouns as well.
Returns:
A score denoting the similarity of the two Synsets, normally between 0 and 1. None is returned if no connecting path could be found. 1 is returned if a Synset is compared with itself.
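
The description gives the range of the score but not the formula. One formulation consistent with it, assumed here for illustration, is 1 / (d + 1), where d is the shortest path distance in edges. The function name path_score is hypothetical.

```python
def path_score(distance):
    """Map a shortest-path distance (edges) to a score in (0, 1], or None."""
    if distance is None:
        return None  # no connecting path, so no score
    # Assumed formulation: identity (distance 0) gives exactly 1.0.
    return 1.0 / (distance + 1)


print(path_score(0), path_score(4), path_score(None))
```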

lch_similarity(self, other, verbose=False, simulate_root=True)

Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur. The relationship is given as -log(p / (2 * d)), where p is the shortest path length and d is the taxonomy depth.

Parameters:
  • other (Synset) - The Synset that this Synset is being compared to.
  • simulate_root (bool) - The verb taxonomies do not share a single root, so this metric cannot be computed for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies; set it to False to disable this behavior. The noun taxonomy normally has a single root, except in WordNet 1.6; if you are using WordNet 1.6, a fake root is added for nouns as well.
Returns:
A score denoting the similarity of the two Synsets, normally greater than 0. None is returned if no connecting path could be found. If a Synset is compared with itself, the maximum score is returned, which varies depending on the taxonomy depth.
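
The stated relationship can be evaluated directly. In this sketch the function name lch_score is hypothetical, and how p is counted (edges or nodes) is left to the caller, since the description does not say.

```python
import math


def lch_score(path_length, taxonomy_depth):
    """Evaluate -log(p / (2 * d)) for path length p and taxonomy depth d."""
    return -math.log(path_length / (2.0 * taxonomy_depth))


# A shorter path within the same taxonomy yields a higher score.
print(lch_score(1, 20), lch_score(5, 20))
```

This also shows why the maximum score depends on the taxonomy depth: for a fixed smallest p, the value grows with d.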

wup_similarity(self, other, verbose=False, simulate_root=True)

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node). Previously, the scores computed by this implementation did _not_ always agree with those given by Pedersen's Perl implementation of WordNet Similarity. However, with the addition of the simulate_root flag (see below), the scores for verbs now almost always agree, although those for nouns still sometimes differ.

The LCS does not necessarily feature in the shortest path connecting the two senses, as it is by definition the common ancestor deepest in the taxonomy, not closest to the two senses. Typically, however, it will so feature. Where multiple candidates for the LCS exist, that whose shortest path to the root node is the longest will be selected. Where the LCS has multiple paths to the root, the longer path is used for the purposes of the calculation.

Parameters:
  • other (Synset) - The Synset that this Synset is being compared to.
  • simulate_root (bool) - The verb taxonomies do not share a single root, so this metric cannot be computed for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies; set it to False to disable this behavior. The noun taxonomy normally has a single root, except in WordNet 1.6; if you are using WordNet 1.6, a fake root is added for nouns as well.
Returns:
A float score denoting the similarity of the two Synsets, normally greater than zero. If no connecting path between the two senses can be found, None is returned.
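
The description names the inputs (the depths of the two senses and of their LCS) without giving the formula. The usual Wu-Palmer definition, assumed here, is 2 * depth(lcs) / (depth(s1) + depth(s2)). The function name wup_score is hypothetical, and conventions for counting depth (edges or nodes) vary, so the depths are passed in directly.

```python
def wup_score(depth_lcs, depth_a, depth_b):
    """Evaluate 2 * depth(lcs) / (depth(a) + depth(b)) (assumed formula)."""
    return 2.0 * depth_lcs / (depth_a + depth_b)


# A deeper (more specific) shared ancestor gives a higher score.
print(wup_score(2, 4, 4), wup_score(3, 4, 4))
```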

res_similarity(self, other, ic, verbose=False)

Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).

Parameters:
  • other (Synset) - The Synset that this Synset is being compared to.
  • ic (dict) - an information content object (as returned by load_ic()).
Returns:
A float score denoting the similarity of the two Synsets. Synsets whose LCS is the root node of the taxonomy will have a score of 0 (e.g. the first noun senses of 'dog' and 'table').
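
To see why a root LCS gives a score of 0, the sketch below uses the common definition of information content as the negative log of a concept's probability, which is an assumption here. The probabilities in PROB are made up for illustration and do not come from any corpus; the function names are hypothetical.

```python
import math

# Made-up concept probabilities (hypothetical, not from a real corpus).
# The root subsumes everything, so its probability is 1.
PROB = {"entity": 1.0, "animal": 0.1, "carnivore": 0.02}


def information_content(concept):
    """Return log(1 / p(concept)); rarer concepts carry more information."""
    return math.log(1.0 / PROB[concept])


def res_score(lcs):
    """Resnik-style score: the information content of the LCS alone."""
    return information_content(lcs)


print(res_score("entity"), res_score("animal"))
```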

jcn_similarity(self, other, ic, verbose=False)

Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).

Parameters:
  • other (Synset) - The Synset that this Synset is being compared to.
  • ic (dict) - an information content object (as returned by load_ic()).
Returns:
A float score denoting the similarity of the two Synsets.
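
The equation can be evaluated directly once the three information content values are known. In this sketch the function name jcn_score is hypothetical, the inputs are plain floats rather than an IC object, and returning infinity for a zero denominator is a choice made here, not a statement about NLTK's behavior.

```python
import math


def jcn_score(ic_a, ic_b, ic_lcs):
    """Evaluate 1 / (IC(s1) + IC(s2) - 2 * IC(lcs))."""
    denominator = ic_a + ic_b - 2.0 * ic_lcs
    if denominator == 0:
        return math.inf  # sketch's choice for identical information content
    return 1.0 / denominator


print(jcn_score(3.0, 2.0, 1.0))
```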

lin_similarity(self, other, ic, verbose=False)

Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).

Parameters:
  • other (Synset) - The Synset that this Synset is being compared to.
  • ic (dict) - an information content object (as returned by load_ic()).
Returns:
A float score denoting the similarity of the two Synsets, in the range 0 to 1.
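
A direct evaluation of the equation shows where the 0 to 1 range comes from. The function name lin_score is hypothetical and the inputs are plain floats rather than an IC object; the sketch assumes the two senses do not both have zero information content.

```python
def lin_score(ic_a, ic_b, ic_lcs):
    """Evaluate 2 * IC(lcs) / (IC(s1) + IC(s2))."""
    return 2.0 * ic_lcs / (ic_a + ic_b)


# An LCS as informative as both senses gives 1; a root LCS (IC 0) gives 0.
print(lin_score(2.0, 2.0, 2.0), lin_score(3.0, 1.0, 0.0))
```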

_iter_hypernym_lists(self)

Returns:
An iterator over Synsets that are either proper hypernyms or instance hypernyms of the synset.

__repr__(self)
(Representation operator)

Overrides: object.__repr__
(inherited documentation)