Package nltk :: Package stem :: Module snowball :: Class RussianStemmer

type RussianStemmer

          object --+        
                   |        
        api.StemmerI --+    
                       |    
_LanguageSpecificStemmer --+
                           |
                          RussianStemmer

The Russian Snowball stemmer.


Note: A detailed description of the Russian stemming algorithm can be found at http://snowball.tartarus.org/algorithms/russian/stemmer.html.

Instance Methods
stem(self, word) -> unicode
    Stem a Russian word and return the stemmed form.
__regions_russian(self, word) -> tuple
    Return the regions RV and R2 which are used by the Russian stemmer.
__cyrillic_to_roman(self, word) -> unicode
    Transliterate a Russian word into the Roman alphabet.
__roman_to_cyrillic(self, word) -> unicode
    Transliterate a Russian word back into the Cyrillic alphabet.

Inherited from _LanguageSpecificStemmer: __init__, __repr__

Class Variables
tuple __perfective_gerund_suffixes = (u"ivshis'", u"yvshis'", u'vshi...
    Suffixes to be deleted.
tuple __adjectival_suffixes = (u'ui^ushchi^ui^u', u'ui^ushchi^ai^a',...
    Suffixes to be deleted.
tuple __reflexive_suffixes = (u'si^a', u"s'")
    Suffixes to be deleted.
tuple __verb_suffixes = (u"esh'", u'ei`te', u'ui`te', u'ui^ut', u'is...
    Suffixes to be deleted.
tuple __noun_suffixes = (u'ii^ami', u'ii^akh', u'i^ami', u'ii^am', u...
    Suffixes to be deleted.
tuple __superlative_suffixes = (u'ei`she', u'ei`sh')
    Suffixes to be deleted.
tuple __derivational_suffixes = (u"ost'", u'ost')
    Suffixes to be deleted.
Method Details

stem(self, word)

Stem a Russian word and return the stemmed form.

Parameters:
  • word (str, unicode) - The word that is stemmed.
Returns: unicode
The stemmed form.
Overrides: api.StemmerI.stem

__regions_russian(self, word)

Return the regions RV and R2 which are used by the Russian stemmer.

In any word, RV is the region after the first vowel, or the end of the word if it contains no vowel.

R1 is the region after the first non-vowel following a vowel, or the end of the word if there is no such non-vowel. R1 is not returned; it is needed only to locate R2.

R2 is the region after the first non-vowel following a vowel in R1, or the end of the word if there is no such non-vowel.

Parameters:
  • word (str, unicode) - The Russian word whose regions RV and R2 are determined.
Returns: tuple
(rv, r2), the regions RV and R2 for the respective Russian word.

Note: This helper method is called by RussianStemmer.stem and is not intended to be called directly.
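The region definitions above can be made concrete with a short sketch. It is not NLTK's implementation: the documented method operates on the transliterated Roman form, while this sketch works directly on lowercase Cyrillic text and assumes the Snowball Russian vowel set а, е, и, о, у, ы, э, ю, я. The names `russian_regions` and `_region_after_vc` are hypothetical. Like the documented method, it returns RV and R2 and computes R1 only as an intermediate step.

```python
# Hypothetical sketch, not NLTK's code: regions computed on lowercase Cyrillic text.
# Assumed vowel set, taken from the Snowball Russian algorithm description.
VOWELS = frozenset("аеиоуыэюя")


def _region_after_vc(word: str, start: int) -> int:
    """Return the index just past the first non-vowel that follows a vowel
    located at or after `start`, or len(word) if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def russian_regions(word: str) -> tuple[str, str]:
    """Return (rv, r2) as suffixes of `word`."""
    # RV: everything after the first vowel (empty if the word has no vowel).
    rv_start = next((i + 1 for i, ch in enumerate(word) if ch in VOWELS), len(word))
    # R1 is only an intermediate result used to locate R2.
    r1_start = _region_after_vc(word, 0)
    # R2: the same rule applied again, searching only inside R1.
    r2_start = _region_after_vc(word, r1_start)
    return word[rv_start:], word[r2_start:]


rv, r2 = russian_regions("противоестественном")
print(rv)  # тивоестественном
print(r2)  # оестественном
```

For противоестественном the sketch computes R1 = ивоестественном internally but, matching the documented return value, only hands back RV and R2.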

__cyrillic_to_roman(self, word)

Transliterate a Russian word into the Roman alphabet.

A Russian word written in the Cyrillic alphabet is transliterated into the Roman alphabet to simplify the subsequent stemming steps.

Parameters:
  • word (unicode) - The word that is transliterated.
Returns: unicode
word, the transliterated word.

Note: This helper method is called by RussianStemmer.stem and is not intended to be called directly.
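The full transliteration table is not listed on this page. The sketch below assumes a partial, lowercase-only table: the multi-character entries (й to i`, х to kh, ш to sh, щ to shch, ь to ', ю to i^u, я to i^a) are inferred from the suffix spellings in the class variables, and the single-letter entries are assumed. Letters missing from the table (for example ж, ц, ч, ъ, э, ё) are passed through unchanged. `CYRILLIC_TO_ROMAN` and `cyrillic_to_roman` are hypothetical names, not part of NLTK's API.

```python
# Hypothetical, partial Cyrillic -> Roman table for lowercase letters.
# Multi-character values are inferred from the suffix spellings in the class
# variables; single-letter values are assumed. Letters missing from the table
# (e.g. ж, ц, ч, ъ, э, ё) are left unchanged by str.translate.
CYRILLIC_TO_ROMAN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "з": "z", "и": "i", "й": "i`", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "kh", "ш": "sh", "щ": "shch",
    "ы": "y", "ь": "'", "ю": "i^u", "я": "i^a",
}

# str.maketrans accepts single-character keys with values of any length.
_TABLE = str.maketrans(CYRILLIC_TO_ROMAN)


def cyrillic_to_roman(word: str) -> str:
    """Transliterate a lowercase Cyrillic word letter by letter."""
    return word.translate(_TABLE)


print(cyrillic_to_roman("ившись"))  # ivshis'
print(cyrillic_to_roman("иях"))     # ii^akh
```

With this table, transliterated endings line up with entries in the suffix tuples, such as u'ii^akh' in __noun_suffixes and u'ui^ushchimi' in __adjectival_suffixes.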

__roman_to_cyrillic(self, word)

Transliterate a Russian word back into the Cyrillic alphabet.

A Russian word that was transliterated into the Roman alphabet to simplify stemming is transliterated back into its original Cyrillic form.

Parameters:
  • word (str, unicode) - The word that is transliterated.
Returns: unicode
word, the transliterated word.

Note: This helper method is called by RussianStemmer.stem and is not intended to be called directly.
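Going back to Cyrillic is harder than going forward, because several Roman spellings are prefixes of others ("i" of "i^a", "sh" of "shch"). A minimal sketch, assuming a hypothetical partial table whose multi-character keys are inferred from the suffix spellings in the class variables, resolves this with greedy longest-match parsing. `ROMAN_TO_CYRILLIC` and `roman_to_cyrillic` are illustrative names, not NLTK's API.

```python
# Hypothetical, partial Roman -> Cyrillic table. Multi-character keys are
# inferred from the suffix spellings in the class variables; single-letter
# keys are assumed. Characters not covered are copied through unchanged.
ROMAN_TO_CYRILLIC = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "i`": "й", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т",
    "u": "у", "f": "ф", "kh": "х", "sh": "ш", "shch": "щ",
    "y": "ы", "'": "ь", "i^u": "ю", "i^a": "я",
}

# Try longer keys first so that "shch" wins over "sh" and "i^a" over "i".
_KEYS = sorted(ROMAN_TO_CYRILLIC, key=len, reverse=True)


def roman_to_cyrillic(word: str) -> str:
    """Transliterate back to Cyrillic using greedy longest-match parsing."""
    out = []
    i = 0
    while i < len(word):
        for key in _KEYS:
            if word.startswith(key, i):
                out.append(ROMAN_TO_CYRILLIC[key])
                i += len(key)
                break
        else:
            # No table entry starts here: keep the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(roman_to_cyrillic("ivshis'"))      # ившись
print(roman_to_cyrillic("ui^ushchimi"))  # ующими
```

Greedy longest match is unambiguous for this partial table; adding further multi-letter spellings (for example one for ч) would require checking that no combination of keys can be read two ways.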


Class Variable Details

__perfective_gerund_suffixes

Suffixes to be deleted.
Type:
tuple
Value:
(u"ivshis'",
 u"yvshis'",
 u"vshis'",
 u'ivshi',
 u'yvshi',
 u'vshi',
 u'iv',
 u'yv',
...
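These tuples are documented only as "suffixes to be deleted". A simplified sketch of how such a tuple could be applied: remove the first listed suffix that the RV region ends with. Because the visible entries are ordered longest first, this prefers the longest match. The sketch uses only the entries visible above (the full tuple is truncated here) and ignores the condition in the Snowball description that some perfective-gerund endings must follow а or я; `delete_suffix` and `VISIBLE_PERFECTIVE_GERUND_SUFFIXES` are hypothetical names.

```python
# Only the entries visible in the (truncated) listing above; the real tuple is longer.
VISIBLE_PERFECTIVE_GERUND_SUFFIXES = (
    "ivshis'", "yvshis'", "vshis'", "ivshi", "yvshi", "vshi", "iv", "yv",
)


def delete_suffix(word: str, rv: str, suffixes: tuple[str, ...]) -> str:
    """Remove the first suffix in `suffixes` that lies entirely inside RV.

    `rv` must be a suffix of `word`. Returns `word` unchanged if no listed
    suffix fits. Simplified: ignores the Snowball rule that some endings
    must be preceded by a or i^a.
    """
    for suffix in suffixes:
        if rv.endswith(suffix):
            return word[: -len(suffix)]
    return word


# "pokryvshis'" transliterates покрывшись; RV (after the first vowel) is "kryvshis'".
print(delete_suffix("pokryvshis'", "kryvshis'", VISIBLE_PERFECTIVE_GERUND_SUFFIXES))  # pokr
```

Checking the suffix against RV rather than the whole word is what keeps a short word such as "iv" (RV "v") from being stripped to nothing.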

__adjectival_suffixes

Suffixes to be deleted.
Type:
tuple
Value:
(u'ui^ushchi^ui^u',
 u'ui^ushchi^ai^a',
 u'ui^ushchimi',
 u'ui^ushchymi',
 u'ui^ushchego',
 u'ui^ushchogo',
 u'ui^ushchemu',
 u'ui^ushchomu',
...

__verb_suffixes

Suffixes to be deleted.
Type:
tuple
Value:
(u"esh'",
 u'ei`te',
 u'ui`te',
 u'ui^ut',
 u"ish'",
 u'ete',
 u'i`te',
 u'i^ut',
...

__noun_suffixes

Suffixes to be deleted.
Type:
tuple
Value:
(u'ii^ami',
 u'ii^akh',
 u'i^ami',
 u'ii^am',
 u'i^akh',
 u'ami',
 u'iei`',
 u'i^am',
...