A corpus reader for CoNLL-style files. These files consist of a
series of sentences, separated by blank lines. Each sentence is encoded
using a table (or grid) of values, where each line corresponds to
a single word, and each column corresponds to an annotation type. The
set of columns used by CoNLL-style files can vary from corpus to corpus;
the ConllCorpusReader constructor therefore takes an
argument, columntypes, which is used to specify the columns
that are used by a given corpus.
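The grid layout described above can be illustrated with a small, standalone sketch. This is not the reader's own implementation: `read_grids`, `get_column`, and the sample text are hypothetical names and data introduced only for illustration, and the sketch assumes whitespace-separated columns.

```python
import re


def read_grids(text):
    """Split CoNLL-style text into sentences; each sentence is a list of rows."""
    grids = []
    # Sentences are separated by blank lines (possibly containing spaces).
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = [line.split() for line in block.splitlines() if line.strip()]
        if rows:
            grids.append(rows)
    return grids


def get_column(grid, columntypes, column_type):
    """Return the values of one annotation column for a single sentence."""
    index = columntypes.index(column_type)
    return [row[index] for row in grid]


# Hypothetical sample: one word per line, one annotation type per column.
sample = """\
Confidence NN B-NP
in IN B-PP
the DT B-NP
pound NN I-NP

Chancellor NNP O
"""

# Plays the role of the columntypes argument: it names each column in order.
columntypes = ("words", "pos", "chunk")
grids = read_grids(sample)
words = get_column(grids[0], columntypes, "words")
tags = get_column(grids[0], columntypes, "pos")
```

Because the column order is supplied separately from the data, the same parsing code can serve corpora whose columns appear in a different order.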
__init__(self, root, fileids, columntypes, chunk_types=None, top_node='S', pos_in_tree=False, srl_includes_roleset=True, encoding=None, tree_class=<class 'nltk.tree.Tree'>, tag_mapping_function=None)
tagged_words(self, fileids=None, simplify_tags=False)
tagged_sents(self, fileids=None, simplify_tags=False)
chunked_words(self, fileids=None, chunk_types=None, simplify_tags=False)
chunked_sents(self, fileids=None, chunk_types=None, simplify_tags=False)
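The chunked_words and chunked_sents methods take a chunk_types argument. The sketch below shows one way IOB-tagged rows could be grouped into chunks. It is a standalone illustration, not the reader's implementation: `group_chunks` is a hypothetical helper, and it assumes that chunk_types acts as a filter, so that words in chunks of any other type are treated as lying outside every chunk.

```python
def group_chunks(rows, chunk_types=None):
    """Group (word, pos, iob) rows into chunks.

    Returns a list whose items are either a (word, pos) pair for a word
    outside any kept chunk, or a (label, [(word, pos), ...]) pair for a chunk.
    """
    result = []
    current = None        # word list of the chunk being built, if any
    current_label = None
    for word, pos, iob in rows:
        # "B-NP" -> ("B", "NP"); a bare "O" -> ("O", "")
        state, _, label = iob.partition("-")
        filtered_out = chunk_types is not None and label not in chunk_types
        if state == "O" or filtered_out:
            current = None
            result.append((word, pos))
            continue
        # Start a new chunk on B-, after a gap, or when the label changes.
        if state == "B" or current is None or label != current_label:
            current = []
            current_label = label
            result.append((label, current))
        current.append((word, pos))
    return result


rows = [
    ("Confidence", "NN", "B-NP"),
    ("in", "IN", "B-PP"),
    ("the", "DT", "B-NP"),
    ("pound", "NN", "I-NP"),
    ("is", "VBZ", "B-VP"),
]
np_only = group_chunks(rows, chunk_types=("NP",))
```

With `chunk_types=("NP",)`, only noun-phrase chunks are grouped; the PP and VP words are returned as plain (word, pos) pairs.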
parsed_sents(self, fileids=None, pos_in_tree=None, simplify_tags=False)
srl_instances(self, fileids=None, pos_in_tree=None, flatten=True)
iob_words(self, fileids=None, simplify_tags=False)
Returns: a list of word/tag/IOB tuples (list of tuple)
iob_sents(self, fileids=None, simplify_tags=False)
Returns: a list of lists of word/tag/IOB tuples (list of list)
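The difference between the two return shapes (a flat list of tuples versus one list per sentence) can be sketched as follows. The helper names `sketch_iob_sents` and `sketch_iob_words` and the sample grids are hypothetical; this is an illustration under the assumption that each sentence is already available as a grid of rows, not the reader's own code.

```python
def sketch_iob_sent(grid, columntypes):
    """Build (word, tag, iob) tuples for one sentence grid."""
    w = columntypes.index("words")
    p = columntypes.index("pos")
    c = columntypes.index("chunk")
    return [(row[w], row[p], row[c]) for row in grid]


def sketch_iob_sents(grids, columntypes):
    """One list of (word, tag, iob) tuples per sentence."""
    return [sketch_iob_sent(grid, columntypes) for grid in grids]


def sketch_iob_words(grids, columntypes):
    """A single flat list of (word, tag, iob) tuples across all sentences."""
    return [tup for grid in grids for tup in sketch_iob_sent(grid, columntypes)]


columntypes = ("words", "pos", "chunk")
grids = [
    [["the", "DT", "B-NP"], ["pound", "NN", "I-NP"]],
    [["Chancellor", "NNP", "B-NP"]],
]
by_sentence = sketch_iob_sents(grids, columntypes)
flat = sketch_iob_words(grids, columntypes)
```

The flat form suits token-level statistics, while the nested form preserves sentence boundaries.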
_get_tagged_words(self, grid, simplify_tags=False)
_get_iob_words(self, grid, simplify_tags=False)
_get_chunked_words(self, grid, chunk_types, simplify_tags=False)
_get_parsed_sent(self, grid, pos_in_tree, simplify_tags=False)
_get_srl_spans(self, grid)
list of list of ((start, end), tag) tuples
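To make the ((start, end), tag) shape concrete, here is a minimal sketch that extracts spans from a single SRL column. It assumes the star-bracket notation found in CoNLL-2005-style data (for example `(A0*`, `*`, `*)`), which this page does not itself describe; `srl_spans` and the sample column are hypothetical, and the end index is assumed to be exclusive.

```python
def srl_spans(column):
    """Convert one star-bracket SRL column into ((start, end), tag) tuples.

    Assumes every cell contains exactly one '*', with opening labels such
    as '(A0' before it and closing parentheses after it.
    """
    spans = []
    stack = []  # (tag, start_index) for arguments opened but not yet closed
    for wordnum, cell in enumerate(column):
        left, right = cell.split("*")
        for tag in left.split("("):
            if tag:
                stack.append((tag, wordnum))
        for _ in range(right.count(")")):
            tag, start = stack.pop()
            spans.append(((start, wordnum + 1), tag))
    return spans


# One cell per word of a six-word sentence, for a single predicate.
column = ["(A0*", "*)", "(V*)", "(A1*", "*", "*)"]
spans = srl_spans(column)
```

A sentence with several predicates would have one such column per predicate, which yields the outer list in "list of list".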
_get_srl_instances(self, grid, pos_in_tree)
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, fileids, open, readme
Inherited from api.CorpusReader: files
Inherited from api.CorpusReader: items
WORDS = 'words'
column type for words

POS = 'pos'
column type for part-of-speech tags

TREE = 'tree'
column type for parse trees

CHUNK = 'chunk'
column type for chunk structures

NE = 'ne'
column type for named entities

SRL = 'srl'
column type for semantic role labels

IGNORE = 'ignore'
column type for columns that should be ignored

COLUMN_TYPES = ('words', 'pos', 'tree', 'chunk', 'ne', 'srl', ...
A list of all column types supported by the CoNLL corpus reader.
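The column type constants can be used to check a columntypes sequence before reading. The sketch below is an assumption-laden illustration, not the reader's code: `build_colmap` and `SUPPORTED` are hypothetical names, and `SUPPORTED` is assembled from the seven constants documented above rather than from the truncated COLUMN_TYPES value, which may differ.

```python
# Mirrors the documented class constants (values as listed above).
WORDS, POS, TREE, CHUNK, NE, SRL, IGNORE = (
    "words", "pos", "tree", "chunk", "ne", "srl", "ignore",
)

# Assumption: built from the constants above, not copied from COLUMN_TYPES.
SUPPORTED = (WORDS, POS, TREE, CHUNK, NE, SRL, IGNORE)


def build_colmap(columntypes):
    """Map each column type to its column index, skipping ignored columns."""
    for column_type in columntypes:
        if column_type not in SUPPORTED:
            raise ValueError(f"Bad column type {column_type!r}")
    return {
        column_type: index
        for index, column_type in enumerate(columntypes)
        if column_type != IGNORE
    }


colmap = build_colmap((WORDS, POS, IGNORE, CHUNK))
```

Marking a column as `IGNORE` keeps the indices of the remaining columns aligned with the file while leaving the unused column out of the map.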