A corpus reader used to access WordNet or its variants.
__init__(self, root)
Construct a new WordNet corpus reader, with the given root directory.
_data_file(self, pos)
Return an open file pointer for the data file for the given part of
speech.
_synset_from_pos_and_offset(self, pos, offset)
_synset_from_pos_and_line(self, pos, data_file_line)
synsets(self, lemma, pos=None)
Load all synsets with a given lemma and part of speech tag.
lemmas(self, lemma, pos=None)
Return all Lemma objects with a name matching the specified lemma
name and part of speech tag.
lemma_count(self, lemma)
Return the frequency count for this Lemma.
path_similarity(self, synset1, synset2, verbose=False, simulate_root=True)
Path Distance Similarity: Return a score denoting how similar two
word senses are, based on the shortest path that connects the senses
in the is-a (hypernym/hyponym) taxonomy.
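The idea behind the path distance score can be sketched on a toy taxonomy. This is a minimal illustration, not the reader's own code: the HYPERNYMS table and the helper names are hypothetical, and the score is assumed to be 1 / (shortest path length + 1), so identical senses score 1.0.

```python
from collections import deque

# Hypothetical toy is-a taxonomy: each node maps to its hypernyms (parents).
HYPERNYMS = {
    "entity": [],
    "animal": ["entity"],
    "plant": ["entity"],
    "dog": ["animal"],
    "cat": ["animal"],
}


def ancestor_distances(node):
    """Map node and each of its ancestors to the edge distance from node."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS[current]:
            if parent not in distances:
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances


def shortest_path_distance(a, b):
    """Shortest number of is-a edges joining a and b, or None if unconnected."""
    dist_a, dist_b = ancestor_distances(a), ancestor_distances(b)
    shared = set(dist_a) & set(dist_b)
    if not shared:
        return None
    return min(dist_a[n] + dist_b[n] for n in shared)


def toy_path_similarity(a, b):
    """Assumed scoring: 1 / (distance + 1); None when no path exists."""
    distance = shortest_path_distance(a, b)
    return None if distance is None else 1.0 / (distance + 1)
```

In this toy taxonomy "dog" and "cat" are two edges apart (through "animal"), while "dog" and "plant" are three edges apart (through "entity"), so the first pair scores higher.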
lch_similarity(self, synset1, synset2, verbose=False, simulate_root=True)
Leacock Chodorow Similarity: Return a score denoting how similar two
word senses are, based on the shortest path that connects the senses
(as above) and the maximum depth of the taxonomy in which the senses
occur.
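The Leacock Chodorow relationship is usually written as -log(p / 2d), where p is the shortest path length and d is the taxonomy depth. The sketch below assumes p is counted in nodes (edge distance plus one); the function name and its arguments are hypothetical and take precomputed numbers rather than synsets.

```python
import math


def toy_lch_score(path_distance, taxonomy_depth):
    """Return -log(p / (2 * d)), with p = path_distance + 1 (an assumption).

    path_distance  -- number of is-a edges on the shortest path (>= 0)
    taxonomy_depth -- maximum depth of the taxonomy (> 0)
    """
    if path_distance < 0 or taxonomy_depth <= 0:
        raise ValueError("path_distance must be >= 0 and taxonomy_depth > 0")
    return -math.log((path_distance + 1) / (2.0 * taxonomy_depth))
```

For a fixed taxonomy depth the score falls as the path grows, so closer senses receive higher values.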
wup_similarity(self, synset1, synset2, verbose=False, simulate_root=True)
Wu-Palmer Similarity: Return a score denoting how similar two word
senses are, based on the depth of the two senses in the taxonomy and
that of their Least Common Subsumer (most specific ancestor node).
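A minimal sketch of the Wu-Palmer relationship, 2 * depth(LCS) / (depth(s1) + depth(s2)), on a toy single-parent tree. The PARENT table and helper names are hypothetical, and depth is assumed to be counted in nodes with the root at depth 1; a real WordNet taxonomy allows multiple hypernyms, which this sketch ignores.

```python
# Hypothetical toy tree: each node maps to its single parent (None for the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(path_to_root(node))


def least_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    # path_to_root(a) runs from specific to general, so the first hit is deepest.
    return next(n for n in path_to_root(a) if n in ancestors_of_b)


def toy_wup_similarity(a, b):
    lcs = least_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))
```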
res_similarity(self, synset1, synset2, ic, verbose=False)
Resnik Similarity: Return a score denoting how similar two word
senses are, based on the Information Content (IC) of the Least Common
Subsumer (most specific ancestor node).
jcn_similarity(self, synset1, synset2, ic, verbose=False)
Jiang-Conrath Similarity: Return a score denoting how similar two
word senses are, based on the Information Content (IC) of the Least
Common Subsumer (most specific ancestor node) and that of the two
input Synsets.
lin_similarity(self, synset1, synset2, ic, verbose=False)
Lin Similarity: Return a score denoting how similar two word senses
are, based on the Information Content (IC) of the Least Common
Subsumer (most specific ancestor node) and that of the two input
Synsets.
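The three IC-based measures (Resnik, Jiang-Conrath, Lin) differ only in how they combine the same three numbers. The sketch below uses the standard formulas with a hypothetical IC table and a caller-supplied Least Common Subsumer; the names TOY_IC, toy_res, toy_jcn and toy_lin are illustrative only, and the handling of zero denominators is an assumption.

```python
# Hypothetical information content values for a toy taxonomy.
TOY_IC = {"entity": 0.0, "animal": 1.0, "dog": 3.0, "cat": 2.0}


def toy_res(ic, lcs):
    """Resnik: the IC of the Least Common Subsumer."""
    return ic[lcs]


def toy_jcn(ic, s1, s2, lcs):
    """Jiang-Conrath: 1 / (IC(s1) + IC(s2) - 2 * IC(lcs))."""
    difference = ic[s1] + ic[s2] - 2 * ic[lcs]
    # Assumption: identical senses (zero difference) score infinity.
    return float("inf") if difference == 0 else 1.0 / difference


def toy_lin(ic, s1, s2, lcs):
    """Lin: 2 * IC(lcs) / (IC(s1) + IC(s2))."""
    total = ic[s1] + ic[s2]
    # Assumption: return 0.0 when both senses carry no information.
    return 0.0 if total == 0 else 2.0 * ic[lcs] / total
```

With "animal" as the subsumer of "dog" and "cat", the Resnik score is simply the IC of "animal", whereas the other two also take the IC of the input senses into account.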
morphy(self, form, pos=None)
Find a possible base form for the given form, with the given part of
speech, by checking WordNet's list of exceptional forms, and by
recursively stripping affixes for this part of speech until a form in
WordNet is found.
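The lookup order described above (exception list first, then repeated suffix substitution until a known form appears) can be sketched as follows. Everything here is a hypothetical stand-in: EXCEPTIONS, SUBSTITUTIONS and LEXICON are tiny hand-made tables rather than WordNet's data files, and the sketch ignores part of speech.

```python
# Hypothetical stand-ins for the exception list, suffix rules, and word list.
EXCEPTIONS = {"geese": "goose", "ran": "run"}
SUBSTITUTIONS = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]
LEXICON = {"goose", "run", "dog", "berry", "box", "walk"}


def apply_rules(forms):
    """Apply every matching (old suffix, new suffix) rule once to each form."""
    return [
        form[: -len(old)] + new
        for form in forms
        for old, new in SUBSTITUTIONS
        if form.endswith(old)
    ]


def toy_morphy(form):
    """Return a base form found in LEXICON, or None if there is none."""
    # 1. Exceptional (irregular) forms take priority.
    if form in EXCEPTIONS:
        return EXCEPTIONS[form]
    # 2. The form may already be a base form.
    if form in LEXICON:
        return form
    # 3. Strip suffixes repeatedly; every rule shortens the form,
    #    so the candidate list eventually becomes empty.
    candidates = apply_rules([form])
    while candidates:
        for candidate in candidates:
            if candidate in LEXICON:
                return candidate
        candidates = apply_rules(candidates)
    return None
```

A form such as "walkings" needs two rounds of stripping ("walkings", "walking", "walk"), which is the repeated step the description refers to.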
ic(self, corpus, weight_senses_equally=False, smoothing=1.0)
Create an information content lookup dictionary from a corpus.
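As a rough illustration of what an information content lookup holds, the sketch below turns raw counts into IC values on a toy tree. It is built on assumptions rather than on the reader's code: each node starts from the smoothing value, counts are propagated to every ancestor, and IC is taken as -log(count / root count). The TOY_PARENT table and the function name are hypothetical.

```python
import math

# Hypothetical toy tree: each node maps to its single parent (None for the root).
TOY_PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}


def toy_information_content(word_counts, smoothing=1.0):
    """Build {node: IC} from {node: observed count} (a sketch under assumptions)."""
    # Every node starts at the smoothing value so unseen nodes keep a finite IC.
    counts = {node: smoothing for node in TOY_PARENT}
    for node, observed in word_counts.items():
        # An occurrence of a sense also counts as an occurrence of its ancestors.
        while node is not None:
            counts[node] += observed
            node = TOY_PARENT[node]
    root_count = counts["entity"]
    return {node: -math.log(counts[node] / root_count) for node in counts}
```

Frequent senses end up with low IC and rare or unseen senses with high IC; the root always has IC 0 under these assumptions.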
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, fileids, open, readme
Inherited from api.CorpusReader: files
Inherited from api.CorpusReader: items
_ENCODING = None
_FILES = ('cntlist.rev', 'lexnames', 'index.sense', 'index.adj...
A list of all the file identifiers (fileids) used by this corpus
reader.
MORPHOLOGICAL_SUBSTITUTIONS = {'a': [('er', ''), ('est', ''), ...
ADJ = 'a'
ADJ_SAT = 's'
ADV = 'r'
NOUN = 'n'
VERB = 'v'
_FILEMAP = {'a': 'adj', 'n': 'noun', 'r': 'adv', 'v': 'verb'}
_pos_numbers = {'a': 3, 'n': 1, 'r': 4, 's': 5, 'v': 2}
_pos_names = {1: 'n', 2: 'v', 3: 'a', 4: 'r', 5: 's'}
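The two tables above are inverses of each other. A short check, using local copies of the values shown (how the class itself builds _pos_names is not stated here, so the inversion below is only one plausible way to derive it):

```python
# Local copy of the part-of-speech tag -> number table listed above.
pos_numbers = {'a': 3, 'n': 1, 'r': 4, 's': 5, 'v': 2}

# Inverting the mapping gives the number -> tag table.
pos_names = {number: tag for tag, number in pos_numbers.items()}
```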