Package nltk :: Module align :: Class EMIBMModel1

type EMIBMModel1

object --+
         |
        EMIBMModel1

This class contains an implementation of the Expectation Maximization algorithm for IBM Model 1. The algorithm runs on a sentence-aligned parallel corpus and generates word alignments for the aligned sentence pairs.

The process is divided into two main stages.

Stage 1: Learns word-to-word translation probabilities by collecting evidence from the parallel corpus of an English word being the translation of a foreign word.

Stage 2: Based on the translation probabilities from Stage 1, generates word alignments for aligned sentence pairs.
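The two stages can be illustrated with a minimal, self-contained sketch. It is not the NLTK source: the function names train_model1 and align, the max_iterations cap, and the (english_words, foreign_words) tuple representation of a sentence pair are assumptions made for this example. The sketch estimates t(e|f) by EM from a uniform start, then links each English word to the foreign word with the highest estimated probability.

```python
from collections import defaultdict


def train_model1(pairs, threshold=0.01, max_iterations=100):
    """Stage 1 (sketch): estimate t(e|f) with EM.

    `pairs` is a list of (english_words, foreign_words) tuples.
    Returns the translation table and the number of iterations run.
    `max_iterations` is a safety cap assumed for this example.
    """
    english_vocab = {e for e_sent, _ in pairs for e in e_sent}
    uniform = 1.0 / len(english_vocab)
    # t[(e, f)] approximates the probability that e translates f.
    t = defaultdict(lambda: uniform)

    for iteration in range(1, max_iterations + 1):
        count = defaultdict(float)  # expected count of e translating f
        total = defaultdict(float)  # expected count of f being translated

        # E-step: share each English word's unit of evidence among the
        # foreign words of the same sentence pair, in proportion to t.
        for e_sent, f_sent in pairs:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac

        # M-step: re-estimate t and track the largest change of any entry.
        max_delta = 0.0
        for (e, f), c in count.items():
            new_value = c / total[f]
            max_delta = max(max_delta, abs(new_value - t[(e, f)]))
            t[(e, f)] = new_value

        # Stop once every entry has moved by less than the threshold.
        if max_delta < threshold:
            break

    return t, iteration


def align(e_sent, f_sent, t):
    """Stage 2 (sketch): link each English word to its best foreign word.

    Returns a list of (english_index, foreign_index) pairs.
    """
    return [
        (i, max(range(len(f_sent)), key=lambda j: t[(e, f_sent[j])]))
        for i, e in enumerate(e_sent)
    ]


# A toy three-sentence corpus, invented for illustration.
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
table, iterations = train_model1(corpus)
alignment = align(["the", "house"], ["das", "Haus"], table)
```

On this toy corpus the evidence for "the" concentrates on "das" because the two co-occur in two sentence pairs, which in turn pushes "house" towards "Haus".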

Instance Methods
 
__init__(self, aligned_sents, convergent_threshold=0.01, debug=False)
Initialize a new EMIBMModel1.
 
train(self)
The train() method implements the Expectation Maximization training stage, which learns word-to-word translation probabilities.
 
aligned(self)
Returns a list of AlignedSent objects with alignments calculated using IBM Model 1.
Method Details

__init__(self, aligned_sents, convergent_threshold=0.01, debug=False)
(Constructor)

Initialize a new EMIBMModel1.

Parameters:
  • aligned_sents (list of AlignedSent objects) - The parallel text corpus: an iterable containing AlignedSent instances of aligned sentence pairs from the corpus.
  • convergent_threshold (float) - The convergence threshold. An entry is considered converged if the delta from old_t to new_t is less than this value. The algorithm terminates when all entries have converged. This parameter is optional; the default is 0.01.
Overrides: object.__init__
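The stopping rule described for convergent_threshold can be sketched as a small helper. This is an illustration under assumptions, not the class's actual code: the function name all_converged and the use of plain dicts keyed by (english_word, foreign_word) for old_t and new_t are hypothetical.

```python
def all_converged(old_t, new_t, threshold=0.01):
    """Return True when every entry changed by less than `threshold`.

    `old_t` and `new_t` map (english_word, foreign_word) to a probability.
    An entry missing from `old_t` is treated as 0.0 (an assumption of
    this sketch).
    """
    return all(
        abs(new_t[key] - old_t.get(key, 0.0)) < threshold
        for key in new_t
    )


# Hypothetical values from two successive iterations.
old_t = {("the", "das"): 0.5, ("house", "Haus"): 0.5}
new_t = {("the", "das"): 0.505, ("house", "Haus"): 0.52}

strict = all_converged(old_t, new_t)                 # one entry moved by about 0.02
loose = all_converged(old_t, new_t, threshold=0.05)  # both entries within 0.05
```

A smaller threshold generally means more iterations before training stops, since every entry has to settle more tightly.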

train(self)

The train() method implements the Expectation Maximization training stage, which learns word-to-word translation probabilities.

Returns:
The number of iterations taken to converge.