Corpus reader for the NomBank corpus, which augments the Penn Treebank
with information about the predicate-argument structure of every noun
instance. The corpus consists of two parts: the predicate-argument
annotations themselves, and a set of frameset files that define the
argument labels used by the annotations on a per-noun basis. Each
frameset file contains one or more predicates, such as 'turn' or
'turn_on', each of which is divided into coarse-grained word senses
called rolesets. For each roleset, the frameset file provides
descriptions of the argument roles, along with examples.
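The predicate / roleset / role hierarchy described above can be pictured with a small sketch. The XML below is a hypothetical frameset written only for illustration: the element and attribute names (`frameset`, `predicate`, `roleset`, `roles`, `role`, `lemma`, `id`, `n`, `descr`) are assumptions about the file layout, and the role descriptions are invented. The helper `summarize_frameset` is likewise not part of the reader's API.

```python
import xml.etree.ElementTree as ET

# Hypothetical frameset content, for illustration only. Element names,
# attribute names, and descriptions are assumptions, not corpus data.
SAMPLE_FRAMESET = """
<frameset>
  <predicate lemma="turn">
    <roleset id="turn.01" name="rotation">
      <roles>
        <role n="0" descr="causer of the turn"/>
        <role n="1" descr="thing turning"/>
      </roles>
      <example><text>his turn of the wheel</text></example>
    </roleset>
  </predicate>
  <predicate lemma="turn_on">
    <roleset id="turn_on.01" name="activation">
      <roles>
        <role n="1" descr="thing activated"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def summarize_frameset(xml_text):
    """Map each predicate lemma to its rolesets and their role descriptions."""
    root = ET.fromstring(xml_text)
    summary = {}
    for predicate in root.findall("predicate"):
        rolesets = {}
        for roleset in predicate.findall("roleset"):
            # One entry per argument role, keyed as ARG0, ARG1, ...
            rolesets[roleset.get("id")] = {
                "ARG" + role.get("n"): role.get("descr")
                for role in roleset.findall("roles/role")
            }
        summary[predicate.get("lemma")] = rolesets
    return summary


if __name__ == "__main__":
    for lemma, rolesets in summarize_frameset(SAMPLE_FRAMESET).items():
        print(lemma, rolesets)
```

Under these assumptions, each predicate maps to one or more roleset ids, and each roleset maps argument labels to their descriptions, which mirrors the per-noun structure the frameset files are said to provide.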
__init__(self, root, nomfile, framefiles='', nounsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding=None)
raw(self, fileids=None)
Returns: the text contents of the given fileids, as a single string.
lines(self)
Returns: a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.
nouns(self)
Returns: a corpus view that acts as a list of all noun lemmas in this corpus (from the nombank.1.0.words file).
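As a rough illustration of how lines() and nouns() might be used together, the sketch below previews the first few items from each. It assumes only that the reader exposes the two methods listed here and that their results can be iterated. `preview_reader` and `FakeReader` are hypothetical names introduced for this example, and the fake data are placeholders rather than real annotation lines. `load_nltk_nombank` assumes NLTK is installed and the NomBank data is available locally.

```python
from itertools import islice


def preview_reader(reader, limit=3):
    """Collect the first few annotation lines and noun lemmas from a reader."""
    return {
        # islice avoids materializing the whole corpus view.
        "first_lines": list(islice(reader.lines(), limit)),
        "first_nouns": list(islice(reader.nouns(), limit)),
    }


class FakeReader:
    """Hypothetical stand-in so the sketch runs without the corpus installed."""

    def lines(self):
        # Placeholder strings, not the real annotation format.
        return ["annotation line 1", "annotation line 2", "annotation line 3"]

    def nouns(self):
        return ["ability", "absence", "abuse", "acceptance"]


def load_nltk_nombank():
    """Return NLTK's preconfigured reader (requires nltk and the NomBank data)."""
    from nltk.corpus import nombank  # imported lazily so the sketch runs without nltk
    return nombank


if __name__ == "__main__":
    print(preview_reader(FakeReader(), limit=2))
```

With the corpus installed, passing `load_nltk_nombank()` instead of `FakeReader()` should give a comparable preview drawn from the actual annotation and words files.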
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, fileids, open, readme
Inherited from api.CorpusReader: files
Inherited from api.CorpusReader: items