Package nltk :: Package corpus :: Package reader :: Module rte :: Class RTECorpusReader

type RTECorpusReader

         object --+        
                  |        
   api.CorpusReader --+    
                      |    
xmldocs.XMLCorpusReader --+
                          |
                         RTECorpusReader

Corpus reader for corpora in RTE challenges.

This is a thin wrapper around XMLCorpusReader. See the docstring of the rte module for the expected structure of input documents.

Instance Methods
_read_etree(self, doc)
    Map the XML input into a list of RTEPairs, one per <pair> element.
    Returns: list of RTEPairs
pairs(self, fileids)
    Build a list of RTEPairs from an RTE corpus.
    Returns: list of RTEPairs

Inherited from xmldocs.XMLCorpusReader: __init__, raw, words, xml

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, fileids, open, readme

Inherited from api.CorpusReader (private): _get_root

    Deprecated since 0.8

Inherited from xmldocs.XMLCorpusReader: read

    Deprecated since 0.9.7

Inherited from api.CorpusReader: files

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _fileids, _root

Properties

Inherited from api.CorpusReader: root

Method Details

_read_etree(self, doc)

Map the XML input into a list of RTEPairs, one per <pair> element.

This uses the getiterator() method from the ElementTree package to find all the <pair> elements.

Parameters:
  • doc - a parsed XML document
Returns: list of RTEPairs
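
The mapping described above can be illustrated with the standard library alone. The following is a minimal sketch, not the NLTK implementation: SimplePair is a hypothetical stand-in for RTEPair, and the sample document (the root element name, the <t> and <h> children, and the id and value attributes) is an assumed layout rather than one taken from this page. It uses Element.iter() in place of getiterator(), which was removed from ElementTree in Python 3.9.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed sample input; element and attribute names are illustrative only.
SAMPLE = """
<entailment-corpus>
  <pair id="1" value="TRUE">
    <t>The cat sat on the mat.</t>
    <h>A cat is on a mat.</h>
  </pair>
  <pair id="2" value="FALSE">
    <t>It rained all day.</t>
    <h>The weather was dry.</h>
  </pair>
</entailment-corpus>
""".strip()


@dataclass
class SimplePair:
    """Hypothetical stand-in for RTEPair; not part of NLTK."""
    id: str
    text: str
    hyp: str
    value: str


def read_etree(doc):
    """Return one SimplePair per <pair> element found anywhere under doc."""
    return [
        SimplePair(
            id=pair.get("id"),
            text=pair.findtext("t"),
            hyp=pair.findtext("h"),
            value=pair.get("value"),
        )
        # iter("pair") walks the whole tree, as getiterator("pair") did.
        for pair in doc.iter("pair")
    ]


doc = ET.fromstring(SAMPLE)
pairs = read_etree(doc)
```

Because iter() searches the entire subtree, the <pair> elements are found regardless of how deeply they are nested under the root.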

pairs(self, fileids)

Build a list of RTEPairs from an RTE corpus.

Parameters:
  • fileids - a list of RTE corpus fileids
Returns: list of RTEPairs
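
Building one flat list from several files can be sketched as follows. This is an illustration under stated assumptions, not the library's code: FILES is a hypothetical in-memory substitute for files on disk, the function returns raw <pair> elements rather than RTEPair objects, and accepting a single fileid string as well as a list is an assumption made here for convenience.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "corpus": fileid -> XML text (assumed layout).
FILES = {
    "dev.xml": '<entailment-corpus><pair id="1"/><pair id="2"/></entailment-corpus>',
    "test.xml": '<entailment-corpus><pair id="3"/></entailment-corpus>',
}


def pairs(fileids):
    """Collect the <pair> elements of every listed file into one flat list."""
    # Assumption: a bare string is treated as a one-element list.
    if isinstance(fileids, str):
        fileids = [fileids]
    result = []
    for fileid in fileids:
        doc = ET.fromstring(FILES[fileid])
        # Concatenate per-file results, preserving the order of fileids.
        result.extend(doc.iter("pair"))
    return result


all_ids = [p.get("id") for p in pairs(["dev.xml", "test.xml"])]
```

The result keeps the order of the fileids argument, with pairs from each file in document order.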