Package nltk :: Package corpus :: Package reader :: Module verbnet :: Class VerbnetCorpusReader

type VerbnetCorpusReader

         object --+        
                  |        
   api.CorpusReader --+    
                      |    
xmldocs.XMLCorpusReader --+
                          |
                         VerbnetCorpusReader

Instance Methods
 
__init__(self, root, fileids, wrap_etree=False)
 
lemmas(self, classid=None)
Return a list of all verb lemmas that appear in any class, or in the classid if specified.
 
wordnetids(self, classid=None)
Return a list of all wordnet identifiers that appear in any class, or in classid if specified.
 
classids(self, lemma=None, wordnetid=None, fileid=None, classid=None)
Return a list of the verbnet class identifiers.
 
vnclass(self, fileid_or_classid)
Return an ElementTree containing the xml for the specified verbnet class.
 
fileids(self, vnclass_ids=None)
Return a list of fileids that make up this corpus.

Inherited from xmldocs.XMLCorpusReader: raw, words, xml

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open, readme

Inherited from api.CorpusReader (private): _get_root

    Index Initialization
 
_index(self)
Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus fileids.
 
_index_helper(self, xmltree, fileid)
Helper for _index()
 
_quick_index(self)
Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus fileids.
    Identifier conversion
 
longid(self, shortid)
Given a short verbnet class identifier (e.g. '37.10'), map it to a long id (e.g. 'confess-37.10').
 
shortid(self, longid)
Given a long verbnet class identifier (e.g. 'confess-37.10'), map it to a short id (e.g. '37.10').
    Pretty Printing
 
pprint(self, vnclass)
Return a string containing a pretty-printed representation of the given verbnet class.
 
pprint_subclasses(self, vnclass, indent='')
Return a string containing a pretty-printed representation of the given verbnet class's subclasses.
 
pprint_members(self, vnclass, indent='')
Return a string containing a pretty-printed representation of the given verbnet class's member verbs.
 
pprint_themroles(self, vnclass, indent='')
Return a string containing a pretty-printed representation of the given verbnet class's thematic roles.
 
pprint_frame(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame.
 
pprint_description(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame description.
 
pprint_syntax(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame syntax.
 
pprint_semantics(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame semantics.
    Deprecated since 0.8

Inherited from xmldocs.XMLCorpusReader: read

    Deprecated since 0.9.7

Inherited from api.CorpusReader: files

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Inherited from api.CorpusReader (private): _get_items

Class Variables
  _LONGID_RE = re.compile(r'([^-\.]*)-([\d\+\.-]+)$')
Regular expression that matches (and decomposes) longids
  _SHORTID_RE = re.compile(r'[\d\+\.-]+$')
Regular expression that matches shortids
  _INDEX_RE = re.compile(r'<MEMBER name="\??([^"]+)" wn="([^"]*)...
Regular expression used by _index() to quickly scan the corpus for basic information.
Instance Variables
  _lemma_to_class
A dictionary mapping from verb lemma strings to lists of verbnet class identifiers.
  _wordnet_to_class
A dictionary mapping from wordnet identifier strings to lists of verbnet class identifiers.
  _class_to_fileid
A dictionary mapping from class identifiers to corresponding file identifiers.

Inherited from api.CorpusReader (private): _encoding, _fileids, _root

Properties

Inherited from api.CorpusReader: root

Method Details

__init__(self, root, fileids, wrap_etree=False)
(Constructor)

Parameters:
  • root - A path pointer identifying the root directory for this corpus. If a string is specified, then it will be converted to a PathPointer automatically.
  • fileids - A list of the files that make up this corpus. This list can either be specified explicitly, as a list of strings; or implicitly, as a regular expression over file paths. The absolute path for each file will be constructed by joining the reader's root to each file name.
  • encoding - The default unicode encoding for the files that make up the corpus. encoding's value can be any of the following:
    • A string: encoding is the encoding name for all files.
    • A dictionary: encoding[file_id] is the encoding name for the file whose identifier is file_id. If file_id is not in encoding, then the file contents will be processed using non-unicode byte strings.
    • A list: encoding should be a list of (regexp, encoding) tuples. The encoding for a file whose identifier is file_id will be the encoding value for the first tuple whose regexp matches the file_id. If no tuple's regexp matches the file_id, the file contents will be processed using non-unicode byte strings.
    • None: the file contents of all files will be processed using non-unicode byte strings.
  • tag_mapping_function - A function for normalizing or simplifying the POS tags returned by the tagged_words() or tagged_sents() methods.
Overrides: api.CorpusReader.__init__
(inherited documentation)

classids(self, lemma=None, wordnetid=None, fileid=None, classid=None)

Return a list of the verbnet class identifiers. Each optional argument restricts the result:
  • fileid - If specified, return only the verbnet class identifiers for classes (and subclasses) defined by that file.
  • lemma - If specified, return only verbnet class identifiers for classes that contain that lemma as a member.
  • wordnetid - If specified, return only identifiers for classes that contain that wordnetid as a member.
  • classid - If specified, return only identifiers for subclasses of the specified verbnet class.
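
The filtering described above can be pictured as lookups in the three index dictionaries listed under Instance Variables. The following is only a rough sketch under stated assumptions: the function name select_classids and the sample dictionaries are hypothetical, the classid filter is left out because it needs the class xml, and returning an empty list for an unknown lemma is a choice made for this sketch rather than documented behaviour.

```python
# Hypothetical index contents, for illustration only.
LEMMA_TO_CLASS = {
    'put': ['put-9.1'],
    'place': ['put-9.1-1'],
    'confess': ['confess-37.10'],
}
WORDNET_TO_CLASS = {
    'put%2:35:00': ['put-9.1'],
}
CLASS_TO_FILEID = {
    'put-9.1': 'put-9.1.xml',
    'put-9.1-1': 'put-9.1.xml',
    'confess-37.10': 'confess-37.10.xml',
}


def select_classids(lemma=None, wordnetid=None, fileid=None):
    """Return class identifiers, optionally restricted by one filter."""
    if fileid is not None:
        # Classes and subclasses defined by the given file.
        return [c for c, f in CLASS_TO_FILEID.items() if f == fileid]
    if lemma is not None:
        # Classes that list the lemma as a member.
        return list(LEMMA_TO_CLASS.get(lemma, []))
    if wordnetid is not None:
        # Classes that list the wordnet identifier as a member.
        return list(WORDNET_TO_CLASS.get(wordnetid, []))
    # No filter: every known class and subclass.
    return sorted(CLASS_TO_FILEID)
```

With no arguments the sketch returns every key of the class-to-file index, which matches the note that those keys provide a complete list of all classes and subclasses.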

vnclass(self, fileid_or_classid)

Return an ElementTree containing the xml for the specified verbnet class.

Parameters:
  • fileid_or_classid - An identifier specifying which class should be returned. Can be a file identifier (such as 'put-9.1.xml'), or a verbnet class identifier (such as 'put-9.1') or a short verbnet class identifier (such as '9.1').

fileids(self, vnclass_ids=None)

Return a list of fileids that make up this corpus. If vnclass_ids is specified, then return the fileids that make up the specified verbnet class(es).

Overrides: api.CorpusReader.fileids

_index(self)

Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus fileids. This is fast with cElementTree (<0.1 secs), but quite slow (>10 secs) with the python implementation of ElementTree.
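
To make the parsing-based approach concrete, here is a minimal sketch that builds the three indexes from parsed xml. It is not the reader's own code. The names index_class, build_index and SAMPLE_FILES are hypothetical, the sample data is made up, and the element layout (a VNCLASS root with MEMBERS and SUBCLASSES containers, and space-separated identifiers in the wn attribute) is an assumption about the corpus format; only the MEMBER name/wn attributes and the VNSUBCLASS ID attribute appear on this page.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily abridged corpus: file identifier -> file contents.
SAMPLE_FILES = {
    'confess-37.10.xml': '''
<VNCLASS ID="confess-37.10">
  <MEMBERS>
    <MEMBER name="confess" wn="confess%2:32:00"/>
    <MEMBER name="admit" wn="admit%2:32:00 admit%2:32:01"/>
  </MEMBERS>
  <SUBCLASSES>
    <VNSUBCLASS ID="confess-37.10-1">
      <MEMBERS>
        <MEMBER name="reveal" wn=""/>
      </MEMBERS>
    </VNSUBCLASS>
  </SUBCLASSES>
</VNCLASS>
''',
}


def index_class(node, fileid, lemma_to_class, wordnet_to_class, class_to_fileid):
    """Record one class element, then recurse into its subclasses."""
    vnclass = node.get('ID')
    class_to_fileid[vnclass] = fileid
    # Only the direct members of this class, not those of nested subclasses.
    for member in node.findall('MEMBERS/MEMBER'):
        lemma_to_class[member.get('name')].append(vnclass)
        for wn in member.get('wn', '').split():
            wordnet_to_class[wn].append(vnclass)
    for subclass in node.findall('SUBCLASSES/VNSUBCLASS'):
        index_class(subclass, fileid, lemma_to_class,
                    wordnet_to_class, class_to_fileid)


def build_index(files):
    """Parse every file and return the three index dictionaries."""
    lemma_to_class = defaultdict(list)
    wordnet_to_class = defaultdict(list)
    class_to_fileid = {}
    for fileid, text in files.items():
        index_class(ET.fromstring(text), fileid, lemma_to_class,
                    wordnet_to_class, class_to_fileid)
    return lemma_to_class, wordnet_to_class, class_to_fileid
```

Calling build_index(SAMPLE_FILES) on the sample maps 'reveal' to the subclass and 'admit' to the top-level class.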

_quick_index(self)

Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus fileids. This doesn't do proper xml parsing, but is good enough to find everything in the standard verbnet corpus -- and it runs about 30 times faster than xml parsing (with the python ElementTree; only 2-3 times faster with cElementTree).
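
The regex-based scan can be sketched as follows, using the _INDEX_RE value shown under Class Variable Details. This is an illustration under assumptions, not the reader's implementation: the function name quick_index and the sample xml are hypothetical, the top-level class id is assumed to be the file name without '.xml', and the wn attribute is assumed to hold space-separated identifiers.

```python
import re
from collections import defaultdict

# Pattern as shown under Class Variable Details, joined onto one line.
INDEX_RE = re.compile(
    r'<MEMBER name="\??([^"]+)" wn="([^"]*)"[^>]+>|'
    r'VNSUBCLASS ID="([^"]+)"/?>'
)

# Hypothetical, heavily abridged file contents; the values are made up.
SAMPLE_XML = '''
<VNCLASS ID="put-9.1">
  <MEMBERS>
    <MEMBER name="put" wn="put%2:35:00"/>
    <MEMBER name="?stash" wn="stash%2:40:00"/>
  </MEMBERS>
  <SUBCLASSES>
    <VNSUBCLASS ID="put-9.1-1">
      <MEMBERS>
        <MEMBER name="place" wn="place%2:35:00 place%2:35:01"/>
      </MEMBERS>
    </VNSUBCLASS>
  </SUBCLASSES>
</VNCLASS>
'''


def quick_index(text, fileid):
    """Scan raw xml text with a regex instead of parsing it."""
    lemma_to_class = defaultdict(list)
    wordnet_to_class = defaultdict(list)
    class_to_fileid = {}

    # Assumption: the top-level class id is the file name minus '.xml'.
    vnclass = fileid[:-len('.xml')]
    class_to_fileid[vnclass] = fileid

    for m in INDEX_RE.finditer(text):
        name, wn_ids, subclass = m.groups()
        if subclass is not None:
            # A subclass tag: members matched after it are credited to it.
            vnclass = subclass
            class_to_fileid[vnclass] = fileid
        else:
            lemma_to_class[name].append(vnclass)
            for wn in wn_ids.split():
                wordnet_to_class[wn].append(vnclass)
    return lemma_to_class, wordnet_to_class, class_to_fileid
```

Because the scan is flat, it credits each member to the most recently seen class tag, so it relies on a class's own members appearing before its subclasses. The optional `\??` in the pattern drops a leading '?' from a member name, so '?stash' is indexed as 'stash'.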

longid(self, shortid)

Given a short verbnet class identifier (e.g. '37.10'), map it to a long id (e.g. 'confess-37.10'). If shortid is already a long id, then return it as-is.

shortid(self, longid)

Given a long verbnet class identifier (e.g. 'confess-37.10'), map it to a short id (e.g. '37.10'). If longid is already a short id, then return it as-is.
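
The two identifier conversions can be illustrated with the _LONGID_RE and _SHORTID_RE patterns listed under Class Variables. This is a minimal sketch, not the reader's actual implementation: the function names to_shortid and to_longid and the table SHORT_TO_LONG are hypothetical, and the short-to-long direction is assumed to need a lookup table because a short id does not carry the verb name.

```python
import re

# Patterns copied from the Class Variables section.
LONGID_RE = re.compile(r'([^-\.]*)-([\d\+\.-]+)$')
SHORTID_RE = re.compile(r'[\d\+\.-]+$')

# Hypothetical lookup table; a real reader would build it from the corpus.
SHORT_TO_LONG = {'37.10': 'confess-37.10', '9.1': 'put-9.1'}


def to_shortid(classid):
    """Map 'confess-37.10' to '37.10'; return short ids unchanged."""
    if SHORTID_RE.match(classid):
        return classid
    m = LONGID_RE.match(classid)
    if m is None:
        raise ValueError('not a verbnet class identifier: %r' % classid)
    # Group 1 is the verb name, group 2 the numeric part.
    return m.group(2)


def to_longid(classid):
    """Map '37.10' to 'confess-37.10'; return long ids unchanged."""
    if LONGID_RE.match(classid):
        return classid
    if SHORTID_RE.match(classid):
        return SHORT_TO_LONG[classid]
    raise ValueError('not a verbnet class identifier: %r' % classid)
```

Subclass identifiers decompose the same way: 'put-9.1-1' splits into the name 'put' and the short id '9.1-1'.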

pprint(self, vnclass)

Return a string containing a pretty-printed representation of the given verbnet class.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_subclasses(self, vnclass, indent='')

Return a string containing a pretty-printed representation of the given verbnet class's subclasses.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_members(self, vnclass, indent='')

Return a string containing a pretty-printed representation of the given verbnet class's member verbs.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_themroles(self, vnclass, indent='')

Return a string containing a pretty-printed representation of the given verbnet class's thematic roles.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_frame(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

pprint_description(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame description.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

pprint_syntax(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame syntax.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

pprint_semantics(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame semantics.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

Class Variable Details

_INDEX_RE

Regular expression used by _index() to quickly scan the corpus for basic information.

Value:
re.compile(r'<MEMBER name="\??([^"]+)" wn="([^"]*)"[^>]+>|VNSUBCLASS ID="([^"]+)"/?>')

Instance Variable Details

_class_to_fileid

A dictionary mapping from class identifiers to corresponding file identifiers. The keys of this dictionary provide a complete list of all classes and subclasses.