NLTK corpus readers. The modules in this package provide functions that can be used to read corpus files in a variety of formats. These functions can read both the corpus files that are distributed in the NLTK corpus package and corpus files that are part of external corpora.
Each corpus module defines one or more corpus reader functions, which can be used to read documents from that corpus. These functions take an argument, item, which indicates which document should be read from the corpus:

  - If item is one of the unique identifiers listed in the corpus module's items variable, then the corresponding document will be loaded from the NLTK corpus package.
  - If item is a fileid, then that file will be read.

Additionally, corpus reader functions can be given lists of item names, in which case they return a concatenation of the corresponding documents.
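The lookup rule described above can be sketched in plain Python. This is an illustration only, not NLTK's implementation: `read_item`, its `items` and `package_root` parameters, and the throwaway files are all hypothetical, and documents are treated as plain strings.

```python
import os
import tempfile


def read_item(item, items, package_root):
    """Return the text selected by `item` (hypothetical helper, not an NLTK call).

    items:        identifiers of the documents shipped with the corpus package
    package_root: directory that holds those packaged documents
    """
    # A list of item names yields the concatenation of the documents.
    if isinstance(item, (list, tuple)):
        return "".join(read_item(i, items, package_root) for i in item)
    if item in items:
        # A known identifier is resolved inside the corpus package.
        path = os.path.join(package_root, item)
    else:
        # Anything else is treated as the path of an external file.
        path = item
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Build a throwaway "package" with two documents, plus one external file.
root = tempfile.mkdtemp()
for name, text in [("a", "alpha\n"), ("b", "beta\n")]:
    with open(os.path.join(root, name), "w", encoding="utf-8") as handle:
        handle.write(text)
external = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(external, "w", encoding="utf-8") as handle:
    handle.write("gamma\n")

packaged = read_item("a", {"a", "b"}, root)
combined = read_item(["a", "b", external], {"a", "b"}, root)
```

Here `packaged` holds the contents of the packaged document `a`, while `combined` joins both packaged documents and the external file in the order given.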
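Corpus reader functions are named based on the type of information they return. Some common examples, and their return types, are:

  - corpus.words(): list of str
  - corpus.sents(): list of (list of str)
  - corpus.paras(): list of (list of (list of str))
  - corpus.tagged_words(): list of (str, str) tuples
  - corpus.tagged_sents(): list of (list of (str, str) tuples)
  - corpus.raw(): unprocessed corpus contents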
For example, to read a list of the words in the Brown Corpus, use
nltk.corpus.brown.words():

    >>> from nltk.corpus import brown
    >>> print(brown.words())
    ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

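The naming convention, where a function's name signals the shape of what it returns, can be illustrated with a toy in-memory reader. `ToyCorpus` is a hypothetical class written for this sketch, not part of NLTK, and its sentence splitting and tokenization are deliberately naive.

```python
import re


class ToyCorpus:
    """Hypothetical in-memory corpus; not an NLTK class."""

    def __init__(self, text):
        self._text = text

    def raw(self):
        # Unprocessed contents: a single string.
        return self._text

    def sents(self):
        # list of (list of str): one token list per sentence.
        # Sentences are split naively at whitespace that follows '.', '!' or '?'.
        parts = re.split(r"(?<=[.!?])\s+", self._text.strip())
        return [re.findall(r"\w+|[^\w\s]", part) for part in parts if part]

    def words(self):
        # list of str: the sentences flattened into one token list.
        return [token for sent in self.sents() for token in sent]


corpus = ToyCorpus("The jury said so. It met today.")
first_words = corpus.words()[:4]
sentences = corpus.sents()
```

Each level of nesting in the return type corresponds to one level of document structure: `words()` is flat, while `sents()` adds one list per sentence.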
[Work in Progress:

Corpus Metadata
===============
Metadata about the NLTK corpora, and their individual documents, is stored
using Open Language Archives Community (OLAC) metadata records. These records
can be accessed using nltk.corpus.corpus.olac().]

| Class | Summary |
|---|---|
| CategorizedPlaintextCorpusReader | A reader for plaintext corpora whose documents are divided into categories based on their file identifiers. |
| PlaintextCorpusReader | Reader for corpora that consist of plaintext documents. |
| CategorizedTaggedCorpusReader | A reader for part-of-speech tagged corpora whose documents are divided into categories based on their file identifiers. |
| MacMorphoCorpusReader | A corpus reader for the MAC_MORPHO corpus. |
| CMUDictCorpusReader | |
| ConllChunkCorpusReader | A ConllCorpusReader whose data file contains three columns: words, pos, and chunk. |
| ConllCorpusReader | A corpus reader for CoNLL-style files. |
| ChunkedCorpusReader | Reader for chunked (and optionally tagged) corpora. |
| SwadeshCorpusReader | |
| WordListCorpusReader | List of words, one per line. |
| PPAttachmentCorpusReader | Line format: sentence_id verb noun1 preposition noun2 attachment |
| SensevalCorpusReader | |
| IEERCorpusReader | |
| SinicaTreebankCorpusReader | Reader for the Sinica Treebank. |
| AlpinoCorpusReader | Reader for the Alpino Dutch Treebank. |
| IndianCorpusReader | List of words, one per line. |
| ToolboxCorpusReader | |
| TimitCorpusReader | Reader for the TIMIT corpus (or any other corpus with the same file layout and use of file formats). |
| TaggedCorpusReader | Reader for simple part-of-speech tagged corpora. |
| BracketParseCorpusReader | Reader for corpora that consist of parenthesis-delineated parse trees. |
| YCOECorpusReader | Corpus reader for the York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE), a 1.5 million word syntactically-annotated corpus of Old English prose texts. |
| RTECorpusReader | Corpus reader for corpora in RTE challenges. |
| StringCategoryCorpusReader | |
| PropbankCorpusReader | Corpus reader for the PropBank corpus, which augments the Penn Treebank with information about the predicate argument structure of every verb instance. |
| VerbnetCorpusReader | |
| BNCCorpusReader | Corpus reader for the XML version of the British National Corpus. |
| NPSChatCorpusReader | |
| XMLCorpusReader | Corpus reader for corpora whose documents are XML files. |
| WordNetICCorpusReader | A corpus reader for the WordNet information content corpus. |
| WordNetCorpusReader | A corpus reader used to access WordNet or its variants. |
| SwitchboardCorpusReader | |
| DependencyCorpusReader | |
| SyntaxCorpusReader | An abstract base class for reading corpora consisting of syntactically parsed text. |
| CategorizedCorpusReader | A mixin class used to aid in the implementation of corpus readers for categorized corpora. |
| NombankCorpusReader | Corpus reader for the NomBank corpus, which augments the Penn Treebank with information about the predicate argument structure of every noun instance. |
| CorpusReader | A base class for corpus reader classes, each of which can be used to read a specific corpus format. |
| AlignedCorpusReader | Reader for corpora of word-aligned sentences. |
| CHILDESCorpusReader | Corpus reader for the XML version of the CHILDES corpus. |
| ChasenCorpusReader | |
| EuroparlCorpusReader | Reader for Europarl corpora that consist of plaintext documents. |
| IPIPANCorpusReader | Corpus reader designed to work with corpora created by IPI PAN. |
| KNBCorpusReader | This class implements __init__, which specifies the location of the corpus and a method for detecting the sentence blocks in corpus files. |
| Pl196xCorpusReader | |
| PortugueseCategorizedPlaintextCorpusReader | |
| TEICorpusView | |
| TimitTaggedCorpusReader | A corpus reader for tagged sentences that are included in the TIMIT corpus. |

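Several of the readers listed above divide documents into categories based on their file identifiers. A minimal sketch of that idea follows. The `categorize` helper and the convention that the first regex group names the category are assumptions made for this illustration, not a description of how the NLTK classes are implemented.

```python
import re
from collections import defaultdict


def categorize(fileids, cat_pattern):
    """Map each category to its fileids (hypothetical helper, not an NLTK call).

    cat_pattern: regex whose first group captures the category name.
    """
    by_category = defaultdict(list)
    for fileid in fileids:
        match = re.match(cat_pattern, fileid)
        if match:  # fileids that do not match are left uncategorized
            by_category[match.group(1)].append(fileid)
    return dict(by_category)


# Example file identifiers (made up): the leading directory is the category.
fileids = ["news/a01.txt", "news/a02.txt", "fiction/k01.txt", "README"]
categories = categorize(fileids, r"(\w+)/.*")
```

With this layout the category is recovered from the fileid alone, so no separate index file is needed; `README` has no directory prefix and is therefore skipped.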
Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:41 2011 (http://epydoc.sourceforge.net)