Package nltk :: Package stem :: Module snowball2 :: Class _StandardStemmer

type _StandardStemmer

          object --+        
                   |        
        api.StemmerI --+    
                       |    
_LanguageSpecificStemmer --+
                           |
                          _StandardStemmer
Known Subclasses:

This subclass encapsulates two methods for defining the standard versions of the string regions R1, R2, and RV.

Instance Methods
tuple
_r1r2_standard(self, word, vowels)
Return the standard interpretations of the string regions R1 and R2.
unicode
_rv_standard(self, word, vowels)
Return the standard interpretation of the string region RV.

Inherited from _LanguageSpecificStemmer: __init__, __repr__

Inherited from api.StemmerI: stem

Method Details

_r1r2_standard(self, word, vowels)

Return the standard interpretations of the string regions R1 and R2.

R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.

R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.

Parameters:
  • word (str, unicode) - The word whose regions R1 and R2 are determined.
  • vowels (unicode) - The vowels of the respective language that are used to determine the regions R1 and R2.
Returns: tuple
(r1, r2), the regions R1 and R2 for the respective word.
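
Example: the R1 and R2 definitions above can be sketched as a standalone function. This is a minimal illustration written from the prose description, not the module's actual source; the name `r1r2_standard` (without the leading underscore and without `self`) is hypothetical, and the English vowel set is an assumption chosen only for the demonstration.

```python
def r1r2_standard(word, vowels):
    """Return (r1, r2) for `word`, following the R1/R2 definitions."""
    r1 = ""
    r2 = ""
    # R1: everything after the first non-vowel that directly follows a vowel.
    for i in range(1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            r1 = word[i + 1:]
            break
    # R2: the same rule, applied inside R1.
    for i in range(1, len(r1)):
        if r1[i] not in vowels and r1[i - 1] in vowels:
            r2 = r1[i + 1:]
            break
    return r1, r2


# English vowels are used here purely for illustration.
print(r1r2_standard("beautiful", "aeiouy"))  # ('iful', 'ul')
print(r1r2_standard("beauty", "aeiouy"))     # ('y', '')
print(r1r2_standard("beau", "aeiouy"))       # ('', '')
```

Both regions default to the empty string, which corresponds to the null region at the end of the word.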
Notes:

_rv_standard(self, word, vowels)

Return the standard interpretation of the string region RV.

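If the second letter is a consonant, RV is the region after the next following vowel. If the first two letters are vowels, RV is the region after the next following consonant. Otherwise (a consonant followed by a vowel), RV is the region after the third letter. If no such position can be found, RV is the null region at the end of the word.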

Parameters:
  • word (str, unicode) - The word whose region RV is determined.
  • vowels (unicode) - The vowels of the respective language that are used to determine the region RV.
Returns: unicode
rv, the region RV for the respective word.
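
Example: the three RV cases above can be sketched as a standalone function. This is a minimal illustration written from the prose description, not the module's actual source; the name `rv_standard` is hypothetical, and the Spanish vowel set passed in below is an assumption chosen only for the demonstration.

```python
def rv_standard(word, vowels):
    """Return the region RV for `word`, following the RV definition."""
    rv = ""
    if len(word) >= 2:
        if word[1] not in vowels:
            # Second letter is a consonant: RV starts after the next vowel.
            for i in range(2, len(word)):
                if word[i] in vowels:
                    rv = word[i + 1:]
                    break
        elif word[0] in vowels:
            # First two letters are vowels: RV starts after the next consonant.
            for i in range(2, len(word)):
                if word[i] not in vowels:
                    rv = word[i + 1:]
                    break
        else:
            # Consonant followed by a vowel: RV starts after the third letter.
            rv = word[3:]
    return rv


# Assumed vowel set for Spanish, used here purely for illustration.
spanish_vowels = "aeiouáéíóúü"
print(rv_standard("macho", spanish_vowels))    # ho
print(rv_standard("oliva", spanish_vowels))    # va
print(rv_standard("trabajo", spanish_vowels))  # bajo
```

When the word is too short or the required vowel or consonant is never found, the sketch returns the empty string.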

Note: This helper method is invoked by the stem method of the subclasses ItalianStemmer, PortugueseStemmer, RomanianStemmer, and SpanishStemmer. It should not be invoked directly.