Package nltk :: Package stem :: Module snowball :: Class FrenchStemmer

type FrenchStemmer

          object --+            
                   |            
        api.StemmerI --+        
                       |        
_LanguageSpecificStemmer --+    
                           |    
            _StandardStemmer --+
                               |
                              FrenchStemmer

The French Snowball stemmer.


Note: A detailed description of the French stemming algorithm can be found at http://snowball.tartarus.org/algorithms/french/stemmer.html.

Instance Methods
unicode
stem(self, word)
Stem a French word and return the stemmed form.
unicode
__rv_french(self, word, vowels)
Return the region RV that is used by the French stemmer.

Inherited from _StandardStemmer (private): _r1r2_standard, _rv_standard

Inherited from _LanguageSpecificStemmer: __init__, __repr__

Class Variables
unicode __vowels = u'aeiouyâàëéêèïîôûù'
The French vowels.
tuple __step1_suffixes = (u'issements', u'issement', u'atrices', u'a...
Suffixes to be deleted in step 1 of the algorithm.
tuple __step2a_suffixes = (u'issaIent', u'issantes', u'iraIent', u'i...
Suffixes to be deleted in step 2a of the algorithm.
tuple __step2b_suffixes = (u'eraIent', u'assions', u'erions', u'asse...
Suffixes to be deleted in step 2b of the algorithm.
tuple __step4_suffixes = (u'ière', u'Ière', u'ion', u'ier', u'Ier', ...
Suffixes to be deleted in step 4 of the algorithm.
Method Details

stem(self, word)

Stem a French word and return the stemmed form.

Parameters:
  • word (str, unicode) - The word to be stemmed.
Returns: unicode
The stemmed form.
Overrides: api.StemmerI.stem

__rv_french(self, word, vowels)

Return the region RV that is used by the French stemmer.

If the word begins with two vowels, RV is the region after the third letter. Otherwise, it is the region after the first vowel not at the beginning of the word, or the end of the word if these positions cannot be found. (Exceptionally, u'par', u'col' or u'tap' at the beginning of a word is also taken to define RV as the region to their right.)

Parameters:
  • word (str, unicode) - The French word whose region RV is determined.
  • vowels (unicode) - The French vowels that are used to determine the region RV.
Returns: unicode
rv, the region RV for the respective French word.

Note: This helper method is invoked by the stem method of FrenchStemmer. It is not meant to be called directly.
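
The RV rule described above can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the library's code: the function name rv_french and the constant FRENCH_VOWELS are assumptions made for this example, and the vowel string is copied from the __vowels class variable listed on this page. "The end of the word" is represented here as an empty region.

```python
# Vowel inventory copied from the __vowels class variable documented on this page.
FRENCH_VOWELS = "aeiouyâàëéêèïîôûù"


def rv_french(word, vowels=FRENCH_VOWELS):
    """Return the region RV of a French word (illustrative sketch only)."""
    rv = ""  # default: RV is the empty region at the end of the word
    if len(word) >= 2:
        starts_with_two_vowels = word[0] in vowels and word[1] in vowels
        if word.startswith(("par", "col", "tap")) or starts_with_two_vowels:
            # RV is everything after the third letter.
            rv = word[3:]
        else:
            # RV is everything after the first vowel that is not
            # the first letter of the word.
            for i in range(1, len(word)):
                if word[i] in vowels:
                    rv = word[i + 1:]
                    break
    return rv


for sample in ("aimer", "adorer", "voler", "tapis"):
    print(sample, "->", repr(rv_french(sample)))
```

Passing the vowels as a parameter mirrors the documented signature, in which the vowel set is supplied by the caller rather than hard-coded in the helper.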


Class Variable Details

__step1_suffixes

Suffixes to be deleted in step 1 of the algorithm.
Type:
tuple
Value:
(u'issements',
 u'issement',
 u'atrices',
 u'atrice',
 u'ateurs',
 u'ations',
 u'logies',
 u'usions',
...

__step2a_suffixes

Suffixes to be deleted in step 2a of the algorithm.
Type:
tuple
Value:
(u'issaIent',
 u'issantes',
 u'iraIent',
 u'issante',
 u'issants',
 u'issions',
 u'irions',
 u'issais',
...

__step2b_suffixes

Suffixes to be deleted in step 2b of the algorithm.
Type:
tuple
Value:
(u'eraIent',
 u'assions',
 u'erions',
 u'assent',
 u'assiez',
 u'èrent',
 u'erais',
 u'erait',
...

__step4_suffixes

Suffixes to be deleted in step 4 of the algorithm.
Type:
tuple
Value:
(u'ière', u'Ière', u'ion', u'ier', u'Ier', u'e', u'ë')
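
The suffix tuples shown above appear to be ordered from longest to shortest, so a linear scan that stops at the first match also finds the longest matching suffix. The sketch below illustrates that lookup with the step 4 tuple, which is the only one shown in full on this page. The names STEP4_SUFFIXES and first_matching_suffix are hypothetical, and the sketch only locates a suffix: the algorithm description linked above attaches further conditions to each step, which are not reproduced here.

```python
# Step 4 suffixes, copied from the __step4_suffixes value shown on this page.
STEP4_SUFFIXES = ("ière", "Ière", "ion", "ier", "Ier", "e", "ë")


def first_matching_suffix(word, suffixes):
    """Return the first suffix in `suffixes` that `word` ends with, or None.

    Assumes the tuple is ordered longest first, so the first hit is
    also the longest matching suffix.
    """
    for suffix in suffixes:
        if word.endswith(suffix):
            return suffix
    return None


for sample in ("premier", "première", "nation", "porte", "chat"):
    print(sample, "->", first_matching_suffix(sample, STEP4_SUFFIXES))
```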