Package nltk :: Package stem

Package stem

Interfaces used to remove morphological affixes from words, leaving only the word stem. Stemming algorithms aim to remove the affixes a word carries for, e.g., grammatical role, tense, or derivational morphology, leaving only the stem of the word. This is a difficult problem because of irregular words (e.g., common verbs in English), complicated morphological rules, and part-of-speech and sense ambiguities (e.g., ceil- is not the stem of ceiling).
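
The difficulties named above can be seen with a deliberately naive suffix stripper. This is a minimal sketch using only the standard library; the function name naive_stem and its suffix list are hypothetical and are not part of this package. It handles regular inflections, leaves the irregular verb form "went" untouched, and wrongly reduces "ceiling" to "ceil".

```python
import re

# Hypothetical suffix list for illustration only; real stemmers use far richer rules.
_SUFFIX = re.compile(r"(ing|ed|s)$")


def naive_stem(word: str, min_length: int = 4) -> str:
    """Strip one common English suffix if the word is long enough."""
    if len(word) < min_length:
        return word
    return _SUFFIX.sub("", word)


for word in ["walking", "walked", "walks", "went", "ceiling"]:
    # "went" is not mapped to "go", and "ceiling" is over-stemmed to "ceil".
    print(word, "->", naive_stem(word))
```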

StemmerI defines a standard interface for stemmers.
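
A shared interface lets calling code work with any stemmer interchangeably. The sketch below illustrates that idea without importing NLTK: SupportsStem, PluralStripper, and stem_all are hypothetical names, and the only thing assumed about StemmerI is that it exposes a stem(token) method.

```python
from typing import Iterable, List, Protocol


class SupportsStem(Protocol):
    """Hypothetical stand-in for the StemmerI contract: one stem() method."""

    def stem(self, token: str) -> str:
        ...


class PluralStripper:
    """Toy stemmer that only removes a trailing 's' (illustration only)."""

    def stem(self, token: str) -> str:
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            return token[:-1]
        return token


def stem_all(stemmer: SupportsStem, tokens: Iterable[str]) -> List[str]:
    """Apply any object with a stem() method to every token."""
    return [stemmer.stem(token) for token in tokens]


print(stem_all(PluralStripper(), ["cars", "glass", "bus"]))
```

Because stem_all depends only on the stem() method, any of the stemmer classes listed below could be passed in place of the toy class.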

Classes
RegexpStemmer
A stemmer that uses regular expressions to identify morphological affixes.
LancasterStemmer
A word stemmer based on the Lancaster stemming algorithm.
ISRIStemmer
An ISRI Arabic stemmer based on the algorithm described in "Arabic Stemming without a root dictionary".
WordNetLemmatizer
A lemmatizer that uses WordNet's built-in morphy function.
RSLPStemmer
A stemmer for Portuguese.
StemmerI
A processing interface for removing morphological affixes from words.
PorterStemmer
A word stemmer based on the original Porter stemming algorithm.
SnowballStemmer
A word stemmer based on the Snowball stemming algorithms.
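
A short usage sketch for three of the classes listed above, assuming a recent NLTK release is installed and that the classes are importable from nltk.stem. The regular expression and the min value passed to RegexpStemmer are illustrative choices, not defaults.

```python
from nltk.stem import PorterStemmer, RegexpStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")
# Illustrative pattern: strip a few common suffixes from words of 4+ characters.
regexp = RegexpStemmer("ing$|s$|e$|able$", min=4)

for word in ["running", "cars", "advisable"]:
    # Each stemmer exposes the same stem() method, so they can be compared side by side.
    print(word, porter.stem(word), snowball.stem(word), regexp.stem(word))
```

None of these three stemmers needs corpus data to be downloaded when used this way; WordNetLemmatizer, by contrast, requires the WordNet corpus to be available.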
Variables
  stopwords = <WordListCorpusReader in '.../corpora/stopwords' (...
Variables Details

stopwords

Value:
<WordListCorpusReader in '.../corpora/stopwords' (not loaded yet)>