Module rte_classify
Simple classifier for the RTE corpus.
It calculates the overlap in words and named entities between the text and
the hypothesis, and also whether there are words or named entities in the
hypothesis which fail to occur in the text, since this is an indicator
that the hypothesis is more informative than (i.e. not entailed by) the
text.
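A minimal sketch of these overlap features, assuming a simple regex tokenizer and the all-caps/title-case named-entity heuristic described for ne() below. The function names and feature names here are hypothetical illustrations, not necessarily what the module itself uses.

```python
import re
from typing import Dict, Set


def tokens(s: str) -> Set[str]:
    """Split a string into a set of word tokens (simple regex tokenizer, an assumption)."""
    return set(re.findall(r"\w+", s))


def looks_like_ne(token: str) -> bool:
    """Treat all-caps or title-case tokens as named entities."""
    return token.isupper() or token.istitle()


def overlap_features(text: str, hyp: str) -> Dict[str, int]:
    """Count shared and hypothesis-only tokens, split into plain words and named entities."""
    t, h = tokens(text), tokens(hyp)
    shared = h & t  # tokens in both text and hypothesis
    extra = h - t   # hypothesis tokens missing from the text
    return {
        "word_overlap": sum(1 for w in shared if not looks_like_ne(w)),
        "ne_overlap": sum(1 for w in shared if looks_like_ne(w)),
        "word_hyp_extra": sum(1 for w in extra if not looks_like_ne(w)),
        "ne_hyp_extra": sum(1 for w in extra if looks_like_ne(w)),
    }


print(overlap_features("Reuters reported that Apple bought a startup in London",
                       "Apple bought a startup in Paris"))
# {'word_overlap': 4, 'ne_overlap': 1, 'word_hyp_extra': 0, 'ne_hyp_extra': 1}
```

The nonzero ne_hyp_extra (the entity "Paris") is the kind of signal the description above refers to: the hypothesis introduces information that the text does not contain.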
TO DO: better Named Entity classification
TO DO: add lemmatization
RTEFeatureExtractor
This builds a bag of words for both the text and the hypothesis
after throwing away some stopwords, then calculates overlap and
difference.
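The bag-of-words approach can be illustrated with a small, self-contained sketch. The class name, its methods, and the stopword list are hypothetical, chosen for illustration; they are not RTEFeatureExtractor's actual interface.

```python
import re
from typing import Iterable, Set

# Hypothetical stopword list for illustration only.
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "to", "is", "are", "and"})


class BagOfWordsExtractor:
    """Bags of words for a text/hypothesis pair, with stopwords removed."""

    def __init__(self, text: str, hyp: str, stopwords: Iterable[str] = STOPWORDS):
        stop = {w.lower() for w in stopwords}
        self.text_words = self._bag(text, stop)
        self.hyp_words = self._bag(hyp, stop)

    @staticmethod
    def _bag(s: str, stop: Set[str]) -> Set[str]:
        # Keep original case (useful for named-entity checks),
        # but compare against stopwords case-insensitively.
        return {w for w in re.findall(r"\w+", s) if w.lower() not in stop}

    def overlap(self) -> Set[str]:
        """Words that occur in both the text and the hypothesis."""
        return self.hyp_words & self.text_words

    def hyp_extra(self) -> Set[str]:
        """Hypothesis words that do not occur in the text."""
        return self.hyp_words - self.text_words

    def text_extra(self) -> Set[str]:
        """Text words that do not occur in the hypothesis."""
        return self.text_words - self.hyp_words


pair = BagOfWordsExtractor("The cat sat on the mat", "The cat is on a mat")
print(sorted(pair.overlap()))     # ['cat', 'mat', 'on']
print(sorted(pair.hyp_extra()))   # []
print(sorted(pair.text_extra()))  # ['sat']
```

Using Python sets makes overlap and difference single set operations; the cost is that repeated words are counted once.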
ne(token)
This just assumes that words in all caps or in title case are named
entities.
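The heuristic reduces to two built-in string checks. A minimal sketch, written as a standalone function rather than a method:

```python
def ne(token: str) -> bool:
    """Return True if the token is all caps (e.g. "UN") or title case (e.g. "Geneva")."""
    return token.isupper() or token.istitle()


sentence = "The UN met in Geneva on Monday"
print([tok for tok in sentence.split() if ne(tok)])
# ['The', 'UN', 'Geneva', 'Monday']
```

Note that "The" is flagged only because it starts the sentence; false positives like this are one reason better named entity classification is listed as a TO DO.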
lemmatize(word)
Use morphy from WordNet to find the base form of verbs.
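A minimal sketch of this lookup, assuming NLTK's `wordnet.morphy(form, pos)` is used with the verb part of speech. WordNet's morphy returns None when it finds no base form, so the sketch falls back to the original word. The `morphy` parameter and the helper `wordnet_verb_morphy` are hypothetical additions that let the code run (and be tested) without NLTK or its WordNet data installed.

```python
from typing import Callable, Optional


def wordnet_verb_morphy(word: str) -> Optional[str]:
    """Look up the verb base form with NLTK's WordNet morphy, if available."""
    try:
        from nltk.corpus import wordnet as wn
        return wn.morphy(word, wn.VERB)
    except (ImportError, LookupError):
        # NLTK or its WordNet data is not installed: report "no lemma found".
        return None


def lemmatize(word: str,
              morphy: Callable[[str], Optional[str]] = wordnet_verb_morphy) -> str:
    """Return the verb base form of `word`, or `word` itself if none is found."""
    lemma = morphy(word)
    return lemma if lemma is not None else word
```

With NLTK and the WordNet data installed, `lemmatize("ran")` should return "run"; without them, every word is returned unchanged.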
rte_classifier(trainer, features=rte_features)
Classify RTEPairs.
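In NLTK, a trainer is typically a function such as `NaiveBayesClassifier.train`, which takes a list of (featureset, label) pairs and returns a classifier with a `classify()` method. The sketch below shows that general train-and-evaluate pattern with a toy threshold trainer so it runs without NLTK or the RTE corpus. `Pair`, `toy_features`, `threshold_trainer`, and `train_and_evaluate` are hypothetical names; this is not the module's implementation of rte_classifier.

```python
from collections import namedtuple
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an RTE pair: text, hypothesis, and gold label
# (1 = entailed, 0 = not entailed).
Pair = namedtuple("Pair", ["text", "hyp", "value"])
FeatureSet = Dict[str, int]


def toy_features(pair: Pair) -> FeatureSet:
    """Hypothetical feature function: count hypothesis words missing from the text."""
    text_words = set(pair.text.lower().split())
    hyp_words = set(pair.hyp.lower().split())
    return {"word_hyp_extra": len(hyp_words - text_words)}


class ThresholdClassifier:
    """Predicts 1 (entailed) when word_hyp_extra <= threshold, else 0."""

    def __init__(self, threshold: int):
        self.threshold = threshold

    def classify(self, featureset: FeatureSet) -> int:
        return 1 if featureset["word_hyp_extra"] <= self.threshold else 0


def threshold_trainer(labeled: List[Tuple[FeatureSet, int]]) -> ThresholdClassifier:
    """Toy trainer: pick the threshold with the best training accuracy."""
    candidates = sorted({fs["word_hyp_extra"] for fs, _ in labeled})

    def accuracy(t: int) -> float:
        clf = ThresholdClassifier(t)
        return sum(clf.classify(fs) == y for fs, y in labeled) / len(labeled)

    return ThresholdClassifier(max(candidates, key=accuracy))


def train_and_evaluate(trainer: Callable, train: List[Pair], test: List[Pair],
                       features: Callable[[Pair], FeatureSet] = toy_features):
    """Map pairs to (featureset, label), train with `trainer`, return (classifier, accuracy)."""
    train_set = [(features(p), p.value) for p in train]
    test_set = [(features(p), p.value) for p in test]
    classifier = trainer(train_set)
    correct = sum(classifier.classify(fs) == y for fs, y in test_set)
    return classifier, correct / len(test_set)


train = [
    Pair("the cat sat on the mat", "the cat sat", 1),
    Pair("John bought a car", "John bought a red car", 0),
    Pair("Paris is in France", "Paris is in Germany", 0),
    Pair("Rain fell all day", "Rain fell", 1),
]
test = [
    Pair("Mary owns a dog", "Mary owns a dog", 1),
    Pair("Mary owns a dog", "Mary owns a large dog", 0),
]
clf, acc = train_and_evaluate(threshold_trainer, train, test)
print(clf.threshold, acc)  # 0 1.0
```

Because the trainer is passed in as a parameter, `nltk.NaiveBayesClassifier.train` could be substituted for `threshold_trainer`, since it accepts the same list of (featureset, label) pairs and returns a classifier with `classify()`.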
This just assumes that words in all caps or in title case are named
entities.
- Parameters: