Package nltk :: Package stem :: Module snowball :: Class HungarianStemmer

type HungarianStemmer


          object --+        
                   |        
        api.StemmerI --+    
                       |    
_LanguageSpecificStemmer --+
                           |
                          HungarianStemmer

The Hungarian Snowball stemmer.


Note: A detailed description of the Hungarian stemming algorithm can be found under http://snowball.tartarus.org/algorithms/hungarian/stemmer.html.

Instance Methods
unicode
stem(self, word)
Stem a Hungarian word and return the stemmed form.
unicode
__r1_hungarian(self, word, vowels, digraphs)
Return the region R1 that is used by the Hungarian stemmer.

Inherited from _LanguageSpecificStemmer: __init__, __repr__

Class Variables
unicode __vowels = u'aeiouöüáéíóõúû'
The Hungarian vowels.
tuple __digraphs = (u'cs', u'dz', u'dzs', u'gy', u'ly', u'ny', u'ty'...
The Hungarian digraphs.
tuple __double_consonants = (u'bb', u'cc', u'ccs', u'dd', u'ff', u'g...
The Hungarian double consonants.
tuple __step1_suffixes = (u'al', u'el')
Suffixes to be deleted in step 1 of the algorithm.
tuple __step2_suffixes = (u'képpen', u'onként', u'enként', u'anként'...
Suffixes to be deleted in step 2 of the algorithm.
tuple __step3_suffixes = (u'ánként', u'án', u'én')
Suffixes to be deleted in step 3 of the algorithm.
tuple __step4_suffixes = (u'astul', u'estül', u'ástul', u'éstül', u'...
Suffixes to be deleted in step 4 of the algorithm.
tuple __step5_suffixes = (u'á', u'é')
Suffixes to be deleted in step 5 of the algorithm.
tuple __step6_suffixes = (u'oké', u'öké', u'aké', u'eké', u'áké', u'...
Suffixes to be deleted in step 6 of the algorithm.
tuple __step7_suffixes = (u'ájuk', u'éjük', u'ünk', u'unk', u'juk', ...
Suffixes to be deleted in step 7 of the algorithm.
tuple __step8_suffixes = (u'jaitok', u'jeitek', u'jaink', u'jeink', ...
Suffixes to be deleted in step 8 of the algorithm.
tuple __step9_suffixes = (u'ák', u'ék', u'ök', u'ok', u'ek', u'ak', ...
Suffixes to be deleted in step 9 of the algorithm.
Method Details

stem(self, word)


Stem a Hungarian word and return the stemmed form.

Parameters:
  • word (str, unicode) - The word to be stemmed.
Returns: unicode
The stemmed form.
Overrides: api.StemmerI.stem
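
A minimal usage sketch follows. It assumes a current NLTK release on Python 3, where the unicode return type documented here is plain str. The helper stem_words and the sample words are illustrative only and are not part of NLTK; no particular stems are claimed for them.

```python
try:
    from nltk.stem.snowball import HungarianStemmer
except ImportError:  # NLTK is a third-party package and may not be installed
    HungarianStemmer = None


def stem_words(words):
    """Return (word, stem) pairs, or an empty list if NLTK is unavailable.

    stem_words is a hypothetical helper written for this example.
    """
    if HungarianStemmer is None:
        return []
    stemmer = HungarianStemmer()
    return [(word, stemmer.stem(word)) for word in words]


# Sample inputs chosen for illustration only.
pairs = stem_words(["házakban", "könyvek"])
for word, stemmed in pairs:
    print(word, "->", stemmed)
```

One stemmer instance can be reused for any number of calls to stem.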

__r1_hungarian(self, word, vowels, digraphs)


Return the region R1 that is used by the Hungarian stemmer.

If the word begins with a vowel, R1 is the region after the first consonant or digraph (two letters that stand for one phoneme) in the word. If the word begins with a consonant, R1 is the region after the first vowel in the word. If the word does not contain both a vowel and a consonant, R1 is the null region at the end of the word.

Parameters:
  • word (str, unicode) - The Hungarian word whose region R1 is determined.
  • vowels (unicode) - The Hungarian vowels that are used to determine the region R1.
  • digraphs (tuple) - The digraphs that are used to determine the region R1.
Returns: unicode
r1, the region R1 for the respective word.

Note: This helper method is invoked by the stem method of the subclass HungarianStemmer. It is not to be invoked directly!
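
The R1 rule described above can be illustrated with a standalone sketch in Python 3. This is not NLTK's implementation: the function name r1_hungarian and the module-level constants are hypothetical, with values copied from the class variables on this page. It assumes that the digraph check applies at the position of the first consonant and that longer entries are tried first, so that 'dzs' takes precedence over 'dz'.

```python
VOWELS = "aeiouöüáéíóõúû"
DIGRAPHS = ("cs", "dz", "dzs", "gy", "ly", "ny", "ty", "zs")


def r1_hungarian(word, vowels=VOWELS, digraphs=DIGRAPHS):
    """Return R1 for a lowercase word, following the prose rule above."""
    if not word:
        return ""
    if word[0] in vowels:
        # Vowel-initial: R1 starts after the first consonant or digraph.
        for i, ch in enumerate(word):
            if ch not in vowels:
                # Assumption: try longer entries first ('dzs' before 'dz').
                for digraph in sorted(digraphs, key=len, reverse=True):
                    if word.startswith(digraph, i):
                        return word[i + len(digraph):]
                return word[i + 1:]
        return ""  # only vowels: null region
    # Consonant-initial: R1 starts after the first vowel.
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[i + 1:]
    return ""  # only consonants: null region


print(r1_hungarian("tóban"))    # ban
print(r1_hungarian("ablakan"))  # lakan
print(r1_hungarian("acsony"))   # ony
print(repr(r1_hungarian("cvs")))  # ''
```

The later steps only delete a suffix when it lies inside this region, which keeps very short words from being stripped down to nothing.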


Class Variable Details

__digraphs

The Hungarian digraphs.
Type:
tuple
Value:
(u'cs', u'dz', u'dzs', u'gy', u'ly', u'ny', u'ty', u'zs')

__double_consonants

The Hungarian double consonants.
Type:
tuple
Value:
(u'bb',
 u'cc',
 u'ccs',
 u'dd',
 u'ff',
 u'gg',
 u'ggy',
 u'jj',
...

__step2_suffixes

Suffixes to be deleted in step 2 of the algorithm.
Type:
tuple
Value:
(u'képpen',
 u'onként',
 u'enként',
 u'anként',
 u'képp',
 u'ként',
 u'ban',
 u'ben',
...

__step4_suffixes

Suffixes to be deleted in step 4 of the algorithm.
Type:
tuple
Value:
(u'astul', u'estül', u'ástul', u'éstül', u'stul', u'stül')
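
As a rough illustration of how such a suffix tuple might be used, the sketch below deletes the first listed suffix that fits inside R1. It is a simplification under stated assumptions, not the stemmer's actual step 4: the helper strip_suffix_in_r1 is hypothetical, r1 is assumed to be a tail of word, and any additional per-suffix rules the real algorithm applies are not shown on this page and are omitted here.

```python
STEP4_SUFFIXES = ("astul", "estül", "ástul", "éstül", "stul", "stül")


def strip_suffix_in_r1(word, r1, suffixes):
    """Delete the first listed suffix that lies entirely inside R1.

    Assumes r1 is a tail of word, so a suffix of r1 is also a suffix of word.
    Returns the (word, r1) pair, unchanged if no suffix qualifies.
    """
    for suffix in suffixes:
        if r1.endswith(suffix):
            return word[:-len(suffix)], r1[:-len(suffix)]
    return word, r1


# R1 of "családostul" under the rule for consonant-initial words is "ládostul".
print(strip_suffix_in_r1("családostul", "ládostul", STEP4_SUFFIXES))
# ('családo', 'ládo')

# The suffix does not fit inside R1 ("l"), so nothing is removed.
print(strip_suffix_in_r1("stul", "l", STEP4_SUFFIXES))
# ('stul', 'l')
```

Because the tuple lists longer suffixes before the shorter ones they contain, taking the first match also takes the longest match.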

__step6_suffixes

Suffixes to be deleted in step 6 of the algorithm.
Type:
tuple
Value:
(u'oké',
 u'öké',
 u'aké',
 u'eké',
 u'áké',
 u'áéi',
 u'éké',
 u'ééi',
...

__step7_suffixes

Suffixes to be deleted in step 7 of the algorithm.
Type:
tuple
Value:
(u'ájuk',
 u'éjük',
 u'ünk',
 u'unk',
 u'juk',
 u'jük',
 u'ánk',
 u'énk',
...

__step8_suffixes

Suffixes to be deleted in step 8 of the algorithm.
Type:
tuple
Value:
(u'jaitok',
 u'jeitek',
 u'jaink',
 u'jeink',
 u'aitok',
 u'eitek',
 u'áitok',
 u'éitek',
...

__step9_suffixes

Suffixes to be deleted in step 9 of the algorithm.
Type:
tuple
Value:
(u'ák', u'ék', u'ök', u'ok', u'ek', u'ak', u'k')