Package stem
Interfaces used to remove morphological affixes from words, leaving
only the word stem. Stemming algorithms aim to remove the affixes that
mark, for example, grammatical role, tense, or derivational morphology,
leaving only the stem of the word. This is a difficult problem because of
irregular words (e.g. common verbs in English), complicated morphological
rules, and part-of-speech and sense ambiguities (e.g. ceil- is not the
stem of ceiling).
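The difficulty described above is easy to reproduce with a deliberately naive suffix stripper. The sketch below is not one of this package's stemmers; the suffix list and the three-character minimum are arbitrary choices made only for illustration. It handles the regular inflections of "jump", leaves the irregular past tense "ran" untouched, and wrongly reduces "ceiling" to "ceil".

```python
# A deliberately naive suffix stripper, used only to show why stemming is hard.
# The suffix list and the minimum stem length are arbitrary illustrative choices.
NAIVE_SUFFIXES = ("ing", "ed", "s")
MIN_STEM = 3


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM characters."""
    for suffix in NAIVE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["jumping", "jumped", "jumps", "ran", "ceiling"]:
        # ran stays "ran" (irregular verb); ceiling becomes "ceil" (wrong stem)
        print(w, "->", naive_stem(w))
```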
StemmerI defines a standard interface for stemmers.
RegexpStemmer
A stemmer that uses regular expressions to identify morphological
affixes.
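A minimal sketch of the regular-expression approach, assuming the behavior is "delete every match of the pattern, but leave words shorter than a minimum length unchanged". The class `SketchRegexpStemmer` and its `min_len` parameter are hypothetical names for this example; NLTK's own `RegexpStemmer` takes the pattern plus a minimum-length argument, but consult its documentation for the exact signature.

```python
import re


class SketchRegexpStemmer:
    """Hypothetical stand-in illustrating regexp-based affix removal."""

    def __init__(self, pattern: str, min_len: int = 0):
        self._regexp = re.compile(pattern)
        self._min_len = min_len

    def stem(self, word: str) -> str:
        # Very short words are returned unchanged to avoid over-stemming.
        if len(word) < self._min_len:
            return word
        # Remove every non-overlapping match of the affix pattern.
        return self._regexp.sub("", word)


stemmer = SketchRegexpStemmer(r"ing$|s$|e$|able$", min_len=4)
for w in ["cars", "mass", "was", "bee", "compute", "advisable"]:
    print(w, "->", stemmer.stem(w))
```

Note that "mass" becomes "mas": a pure pattern rule has no notion of which final "s" is a plural marker, which is exactly the kind of error the introduction warns about.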
LancasterStemmer
A word stemmer based on the Lancaster (Paice/Husk) stemming algorithm.
ISRIStemmer
An Arabic stemmer based on the ISRI algorithm described in "Arabic
Stemming without a Root Dictionary".
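"Without a root dictionary" means affixes are stripped by rule rather than by checking the result against a list of known roots. The toy below illustrates only that idea: it removes diacritics, then strips at most one prefix and one suffix from short, illustrative lists, never leaving fewer than three letters. The affix lists, the three-letter minimum, and the function name are assumptions for this sketch; it does not reproduce the ISRI affix tables or its pattern-matching steps.

```python
import re

# Arabic diacritics (harakat): U+064B FATHATAN through U+0652 SUKUN.
DIACRITICS = re.compile("[\u064B-\u0652]")

# Illustrative affix lists only (not the ISRI tables), longest first.
PREFIXES = ("\u0648\u0627\u0644", "\u0627\u0644")  # "wal-" and the article "al-"
SUFFIXES = (
    "\u0648\u0646",  # -un (masculine plural)
    "\u064A\u0646",  # -in (masculine plural, oblique)
    "\u0627\u062A",  # -at (feminine plural)
    "\u0629",        # ta marbuta
)

MIN_STEM = 3  # most Arabic roots are triliteral


def toy_arabic_stem(word: str) -> str:
    """Strip diacritics, then one prefix and one suffix, by rule only."""
    word = DIACRITICS.sub("", word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    return word
```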
WordNetLemmatizer
A lemmatizer that uses WordNet's built-in morphy function.
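Unlike a stemmer, a lemmatizer only returns forms that exist in its lexicon. The sketch below imitates morphy's general strategy under stated assumptions: consult a per-part-of-speech exception list, accept the word if it is already in the lexicon, otherwise try suffix-substitution rules and return the first candidate the lexicon contains. The rule table is modeled on WordNet's documented detachment rules; `LEXICON`, `EXCEPTIONS`, and `toy_lemmatize` are hypothetical toy stand-ins for WordNet's data, not NLTK's API.

```python
# Toy data standing in for WordNet's lexicon and exception lists (hypothetical).
LEXICON = {
    "n": {"church", "city", "goose", "cat"},
    "v": {"take", "walk"},
    "a": {"tall"},
}
EXCEPTIONS = {"n": {"geese": "goose"}, "v": {}, "a": {}}

# Suffix-substitution ("detachment") rules in the style of WordNet's morphy.
RULES = {
    "n": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
          ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return a lexicon form for word, or word itself if none is found."""
    if word in EXCEPTIONS[pos]:          # irregular forms first
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:             # already a base form
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON[pos]:  # only accept real lexicon entries
                return candidate
    return word
```

The part of speech matters: "taking" is only reduced to "take" when the verb rules are used, which is why NLTK's lemmatizer also accepts a part-of-speech argument (defaulting to noun).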
RSLPStemmer
A stemmer for Portuguese.
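RSLP (Removedor de Sufixos da Língua Portuguesa) is organized as a sequence of rule steps, where each rule carries a suffix, a minimum stem length, a replacement, and a list of exception words. The sketch below shows only a plural-reduction step under that scheme: the first rule whose suffix matches, whose minimum stem length is met, and whose exception list does not contain the word is applied. The rule entries are an illustrative subset, not the complete published table, and the function names are hypothetical.

```python
# Illustrative subset of an RSLP-style plural-reduction step.
# Each rule: (suffix, minimum stem length, replacement, exceptions).
PLURAL_RULES = [
    ("ns", 1, "m", []),
    ("ões", 3, "ão", []),
    ("ães", 1, "ão", ["mães"]),
    ("ais", 1, "al", ["cais", "mais"]),
    ("s", 2, "", ["lápis", "cais", "mais", "pires"]),
]


def apply_rules(word, rules):
    """Apply the first rule that matches and is not blocked by an exception."""
    for suffix, min_stem, replacement, exceptions in rules:
        if (word.endswith(suffix)
                and len(word) - len(suffix) >= min_stem
                and word not in exceptions):
            return word[: -len(suffix)] + replacement
    return word


def reduce_plural(word):
    """Plural reduction only runs on words ending in 's'."""
    if not word.endswith("s"):
        return word
    return apply_rules(word, PLURAL_RULES)
```

Exceptions block only the rule they belong to, so "mães" skips the "ães" rule and falls through to the generic "s" rule, giving "mãe".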
StemmerI
A processing interface for removing morphological affixes from
words.
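A minimal sketch of what a processing interface like this looks like in Python, assuming a single abstract `stem` method that takes a token and returns its stem. `StemmerInterface`, `PluralSStemmer`, and `stem_all` are hypothetical names used only here; the point is that code written against the interface works with any stemmer in this package.

```python
from abc import ABC, abstractmethod


class StemmerInterface(ABC):
    """Sketch of a stemmer interface: implementations must provide stem()."""

    @abstractmethod
    def stem(self, token: str) -> str:
        """Strip affixes from token and return the stem."""


class PluralSStemmer(StemmerInterface):
    """Hypothetical example implementation: drops one trailing 's'."""

    def stem(self, token: str) -> str:
        if token.endswith("s") and not token.endswith("ss"):
            return token[:-1]
        return token


def stem_all(stemmer: StemmerInterface, tokens):
    # Relies only on the interface, so any implementation can be passed in.
    return [stemmer.stem(t) for t in tokens]
```

Because `stem` is abstract, the interface itself cannot be instantiated; only concrete stemmers can.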
PorterStemmer
A word stemmer based on the original Porter stemming algorithm.
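The Porter algorithm conditions most of its rules on the measure m of a stem, where every word can be written as [C](VC)^m[V] (C a run of consonants, V a run of vowels). The sketch below computes m using Porter's definition of a consonant and implements Step 1a (plural endings); it assumes lowercase ASCII input and is only a fragment of the full algorithm, whose later steps use conditions such as (m>0) EED -> EE.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a letter other than a, e, i, o, u,
    and other than y preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m, where stem has the form [C](VC)^m[V]."""
    pattern = ""
    for i in range(len(stem)):
        kind = "C" if is_consonant(stem, i) else "V"
        if not pattern or pattern[-1] != kind:  # collapse runs to one symbol
            pattern += kind
    return pattern.count("VC")


def step_1a(word: str) -> str:
    """Step 1a of the Porter algorithm: plural endings."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word
```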
SnowballStemmer
A word stemmer based on the Snowball stemming algorithms.
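Many Snowball stemmers only remove a suffix when it lies inside region R1 or R2. R1 is the region after the first non-vowel that follows a vowel (empty if there is none), and R2 is the same region computed again inside R1. The sketch below implements that generic definition with the English vowel set (which includes y); the full English algorithm also marks some y's as consonants first and special-cases prefixes such as gener-, which this sketch omits.

```python
ENGLISH_VOWELS = "aeiouy"


def region_start(word: str, start: int = 0, vowels: str = ENGLISH_VOWELS) -> int:
    """Index just after the first non-vowel that follows a vowel, searching
    from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    return len(word)


def r1_r2(word: str):
    """Return the (R1, R2) suffix regions of word."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # R2 is R1 applied again within R1
    return word[r1:], word[r2:]
```

In NLTK itself the language-specific stemmer is selected by name, for example `SnowballStemmer("english")`.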
stopwords
Value: <WordListCorpusReader in '.../corpora/stopwords' (not loaded yet)>
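The stopwords reader is loaded lazily, on first use (hence "not loaded yet"). In NLTK its word lists are typically obtained with `stopwords.words("english")` and used to filter tokens before stemming. The sketch below substitutes a small hard-coded set so it runs without the corpus data; `STOPWORDS` and `remove_stopwords` are hypothetical names for this example only.

```python
# Hypothetical stand-in for the stopwords corpus: a tiny hand-picked set.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stopwords]


print(remove_stopwords(["The", "stems", "of", "the", "words", "are", "short"]))
```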