Package nltk :: Package corpus :: Package reader :: Module nombank :: Class NombankCorpusReader

type NombankCorpusReader


      object --+    
               |    
api.CorpusReader --+
                   |
                  NombankCorpusReader

Corpus reader for the NomBank corpus, which augments the Penn Treebank with information about the predicate-argument structure of every noun instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of frameset files that define, on a per-noun basis, the argument labels used by the annotations. Each frameset file contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called rolesets. For each roleset, the frameset file provides descriptions of the argument roles, along with examples.
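To make the predicate, roleset and role hierarchy concrete, here is a minimal sketch that walks a small, invented frameset document with the standard library's xml.etree.ElementTree. The element and attribute names (frameset, predicate/@lemma, roleset/@id, role/@n, role/@descr) follow the general PropBank-style frames layout, but they are an assumption here rather than a guaranteed description of every NomBank frameset file, and summarize_frameset is a hypothetical helper, not part of the reader.

```python
import xml.etree.ElementTree as ET

# A tiny, invented frameset document. The layout is assumed for illustration;
# real frameset files carry more attributes and fully annotated examples.
FRAMESET_XML = """
<frameset>
  <predicate lemma="turn">
    <roleset id="turn.01" name="rotation">
      <roles>
        <role n="0" descr="turner"/>
        <role n="1" descr="thing turning"/>
      </roles>
      <example>the turn of the wheel</example>
    </roleset>
  </predicate>
  <predicate lemma="turn_on">
    <roleset id="turn_on.01" name="activation">
      <roles>
        <role n="0" descr="agent"/>
        <role n="1" descr="thing activated"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""

def summarize_frameset(xml_text):
    """Map each roleset id to its list of (argument number, description) pairs."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for predicate in root.findall("predicate"):       # one or more predicates
        for roleset in predicate.findall("roleset"):  # coarse-grained senses
            roles = [(role.get("n"), role.get("descr"))
                     for role in roleset.findall("roles/role")]
            summary[roleset.get("id")] = roles
    return summary

for roleset_id, roles in summarize_frameset(FRAMESET_XML).items():
    print(roleset_id, roles)
```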

Instance Methods
 
__init__(self, root, nomfile, framefiles='', nounsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding=None)
 
raw(self, fileids=None)
Returns: the text contents of the given fileids, as a single string.
 
instances(self)
Returns: a corpus view that acts as a list of NombankInstance objects, one for each noun in the corpus.
 
lines(self)
Returns: a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.
 
roleset(self, roleset_id)
Returns: the XML description for the given roleset.
 
nouns(self)
Returns: a corpus view that acts as a list of all noun lemmas in this corpus (from the nombank.1.0.words file).
 
_read_instance_block(self, stream)

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, fileids, open, readme

Inherited from api.CorpusReader (private): _get_root

Deprecated since 0.9.7:

    Inherited from api.CorpusReader: files

Deprecated since 0.9.1:

    Inherited from api.CorpusReader: items

    Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _fileids, _root

Properties

Inherited from api.CorpusReader: root

Method Details

__init__(self, root, nomfile, framefiles='', nounsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding=None)
(Constructor)

Parameters:
  • root - The root directory for this corpus.
  • nomfile - The name of the file containing the predicate-argument annotations (relative to root).
  • framefiles - A list or regexp specifying the frameset fileids for this corpus.
  • nounsfile - The name of the file listing the noun lemmas returned by nouns(), such as nombank.1.0.words (relative to root).
  • parse_fileid_xform - A transform that should be applied to the fileids in this corpus, for example to match the fileids used by parse_corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid).
  • parse_corpus - The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by NomBank.
  • encoding - The encoding used to read the corpus files.
Overrides: api.CorpusReader.__init__
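The parse_fileid_xform contract (one fileid in, one string out) is easiest to see with an example. The sketch below assumes that annotation fileids carry a directory prefix, as in wsj/23/wsj_2300.mrg, while the parse corpus names the same file wsj_2300.mrg; that fileid layout is an assumption for illustration, so check it against your copy of the corpus. strip_wsj_prefix is a hypothetical name.

```python
import re

def strip_wsj_prefix(fileid):
    """Hypothetical parse_fileid_xform: drop a leading 'wsj/NN/' directory.

    Takes one argument (a fileid) and returns a string (the new fileid),
    which is the shape the constructor documents for this parameter.
    """
    return re.sub(r"^wsj/\d\d/", "", fileid)

print(strip_wsj_prefix("wsj/23/wsj_2300.mrg"))  # wsj_2300.mrg
print(strip_wsj_prefix("wsj_2300.mrg"))         # wsj_2300.mrg (no prefix, unchanged)
```

The function would be passed as parse_fileid_xform=strip_wsj_prefix; a lambda works equally well, since any one-argument callable returning a string satisfies the contract.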

raw(self, fileids=None)

Returns:
the text contents of the given fileids, as a single string.
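As a rough illustration of "the text contents of the given fileids, as a single string", the sketch below reads each file relative to a root directory and joins the texts in the order given, treating a single fileid string as a one-element list. It is a plain-file analogue written for this page, not the reader's own code; raw_sketch is a hypothetical name, and the default of "all fileids" when fileids is None is left to the caller.

```python
import os
import tempfile

def raw_sketch(root, fileids, encoding="utf-8"):
    """Return the concatenated text of `fileids`, each relative to `root`."""
    if isinstance(fileids, str):   # a single fileid behaves like a 1-item list
        fileids = [fileids]
    parts = []
    for fileid in fileids:
        with open(os.path.join(root, fileid), encoding=encoding) as f:
            parts.append(f.read())
    return "".join(parts)          # one string, files in the order given

# Demo on two throwaway files.
with tempfile.TemporaryDirectory() as root:
    for name, text in [("a.txt", "first line\n"), ("b.txt", "second line\n")]:
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write(text)
    print(repr(raw_sketch(root, ["a.txt", "b.txt"])))  # 'first line\nsecond line\n'
    print(repr(raw_sketch(root, "b.txt")))             # 'second line\n'
```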

instances(self)

Returns:
a corpus view that acts as a list of NombankInstance objects, one for each noun in the corpus.
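Each NombankInstance corresponds to one line of the annotation file. As a rough sketch, assume a line holds whitespace-separated fields: fileid, sentence number, word number, base form, sense number, then argument tokens of the form pointer-LABEL, exactly one of which carries the label rel. That layout is an assumption about the file format, and the hypothetical parse_nombank_line and ParsedLine below are not NLTK's NombankInstance: they only split the fields and do not resolve tree pointers, which is what the reader needs parse_corpus for. The sample line is invented.

```python
from dataclasses import dataclass, field

@dataclass
class ParsedLine:
    """Hypothetical container for one annotation line (not NombankInstance)."""
    fileid: str
    sentnum: int
    wordnum: int
    baseform: str
    sensenumber: str
    predicate: str                                 # pointer of the '-rel' token
    arguments: list = field(default_factory=list)  # (pointer, label) pairs

def parse_nombank_line(line):
    pieces = line.split()
    if len(pieces) < 6:
        raise ValueError(f"badly formatted line: {line!r}")
    fileid, sentnum, wordnum, baseform, sensenumber = pieces[:5]
    rel, args = [], []
    for token in pieces[5:]:
        # Split on the first '-' only, so labels like ARGM-TMP stay intact.
        pointer, _, label = token.partition("-")
        if label == "rel":
            rel.append(pointer)
        else:
            args.append((pointer, label))
    if len(rel) != 1:
        raise ValueError(f"expected exactly one '-rel' token: {line!r}")
    return ParsedLine(fileid, int(sentnum), int(wordnum), baseform,
                      sensenumber, rel[0], args)

inst = parse_nombank_line(
    "wsj/99/wsj_9999.mrg 3 7 example 01 7:0-rel 5:1-ARG0 9:0-ARGM-TMP")
print(inst.predicate)   # 7:0
print(inst.arguments)   # [('5:1', 'ARG0'), ('9:0', 'ARGM-TMP')]
```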

lines(self)

Returns:
a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.
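Because lines() returns a corpus view, the annotation file is read lazily rather than loaded whole. A generator is the closest plain-Python analogue, although a corpus view additionally supports indexing and len(), which a generator does not. The sketch below streams lines and tallies annotated instances per source file, assuming the fileid is the first whitespace-separated field; iter_lines and instances_per_fileid are hypothetical helpers, and io.StringIO stands in for the open annotation file.

```python
import io
from collections import Counter

def iter_lines(stream):
    """Yield each line without its trailing newline, reading lazily."""
    for line in stream:
        yield line.rstrip("\n")

def instances_per_fileid(stream):
    """Count annotation lines by their first field (assumed to be the fileid)."""
    counts = Counter()
    for line in iter_lines(stream):
        if line.strip():               # skip blank lines
            counts[line.split()[0]] += 1
    return counts

# Invented lines standing in for the predicate-argument annotation file.
sample = io.StringIO(
    "wsj/99/wsj_9998.mrg 0 2 alpha 01 2:0-rel\n"
    "wsj/99/wsj_9998.mrg 1 4 beta 01 4:0-rel\n"
    "wsj/99/wsj_9999.mrg 0 1 gamma 01 1:0-rel\n"
)
print(instances_per_fileid(sample))
```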

roleset(self, roleset_id)

Returns:
the XML description for the given roleset.
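Assuming roleset ids have the form lemma.NN (for example turn.01), a lookup can derive the frameset file from the part before the dot, parse that file, and return the matching roleset element. The frames/<lemma>.xml naming scheme, the in-memory dict standing in for the corpus files, and the find_roleset helper are all assumptions made for this sketch; it is not the reader's actual implementation.

```python
import xml.etree.ElementTree as ET

# Stand-in for the frameset files, keyed by an assumed fileid scheme.
FRAME_FILES = {
    "frames/turn.xml": """<frameset>
  <predicate lemma="turn">
    <roleset id="turn.01"><roles><role n="0" descr="turner"/></roles></roleset>
    <roleset id="turn.02"><roles><role n="1" descr="thing turned into"/></roles></roleset>
  </predicate>
</frameset>""",
}

def find_roleset(roleset_id, frame_files=FRAME_FILES):
    """Return the <roleset> element whose id attribute equals roleset_id."""
    lemma = roleset_id.split(".")[0]          # 'turn.02' -> 'turn'
    fileid = f"frames/{lemma}.xml"
    if fileid not in frame_files:
        raise ValueError(f"frameset file for {roleset_id} not found")
    root = ET.fromstring(frame_files[fileid])
    for roleset in root.findall("predicate/roleset"):
        if roleset.get("id") == roleset_id:
            return roleset
    raise ValueError(f"roleset {roleset_id} not found in {fileid}")

rs = find_roleset("turn.02")
print(rs.get("id"), [r.get("descr") for r in rs.findall("roles/role")])
# turn.02 ['thing turned into']
```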

nouns(self)

Returns:
a corpus view that acts as a list of all noun lemmas in this corpus (from the nombank.1.0.words file).
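Assuming nombank.1.0.words holds one lemma per line (an assumption in this sketch), loading the list into a set gives fast membership checks, for example to test whether a noun occurs in the corpus before scanning the instances. Note that a set drops the order and any duplicates that the corpus view returned by nouns() preserves. load_lemmas is a hypothetical helper, and the entries below are invented.

```python
import io

def load_lemmas(stream):
    """Return the set of non-empty, stripped lines of a lemma list."""
    return {line.strip() for line in stream if line.strip()}

# io.StringIO stands in for nombank.1.0.words.
words_file = io.StringIO("director\nturn\n\nturn_on\n")
lemmas = load_lemmas(words_file)
print("turn" in lemmas, "walk" in lemmas)  # True False
print(sorted(lemmas))                      # ['director', 'turn', 'turn_on']
```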