Package nltk :: Package tag :: Module hunpos :: Class HunposTagger

Type HunposTagger

 object --+    
          |    
api.TaggerI --+
              |
             HunposTagger

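A class for POS tagging with HunPos. The inputs are:

  • the path to a model trained on training data;
  • optionally, the path to the hunpos-tag binary;
  • optionally, the encoding of the training data (default: ISO-8859-1).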

Example:

>>> ht = HunposTagger('english.model')
>>> ht.tag('What is the airspeed of an unladen swallow ?'.split())
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'),
 ('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB'), ('?', '.')]
>>> ht.close()

This class communicates with the hunpos-tag binary via pipes. When the tagger object is no longer needed, the close() method should be called to free system resources. The class supports the context manager interface; if used in a with statement, the close() method is invoked automatically:

>>> with HunposTagger('english.model') as ht:
...     ht.tag('What is the airspeed of an unladen swallow ?'.split())
...
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'),
 ('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB'), ('?', '.')]
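
The pipe-based lifecycle described above can be sketched without a HunPos installation. The class below is a hypothetical PipeTagger, not NLTK's code. It starts a child process with stdin/stdout pipes, writes one token per line followed by an empty line, reads back one token<TAB>tag line per token plus a closing empty line, and releases the process in close(), which both __exit__ and __del__ call. The child is an assumed stand-in (a few lines of Python run via sys.executable that tag every token as "X"); a real tagger would launch hunpos-tag with the model path instead, and the assumption is that hunpos-tag follows the same line-oriented protocol.

```python
import subprocess
import sys

# Hypothetical stand-in for hunpos-tag that tags every token "X".
# Assumption: like hunpos-tag, it reads one token per line, treats an empty
# line as the end of a sentence, and answers with "token<TAB>tag" lines
# followed by an empty line.
STAND_IN = r'''
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    token = line.rstrip("\n")
    sys.stdout.write(token + "\tX\n" if token else "\n")
    sys.stdout.flush()
'''


class PipeTagger:
    """Hypothetical sketch: a tagger that talks to a child process via pipes."""

    def __init__(self, command):
        # Start the child process and keep pipes to its stdin and stdout.
        self._proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        self._closed = False

    def tag(self, tokens):
        for token in tokens:
            if "\n" in token:
                raise ValueError("tokens must not contain newlines")
            self._proc.stdin.write(token + "\n")
        # An empty line tells the child that the sentence is finished.
        self._proc.stdin.write("\n")
        self._proc.stdin.flush()
        tagged = []
        for token in tokens:
            fields = self._proc.stdout.readline().rstrip("\n").split("\t")
            tagged.append((token, fields[1] if len(fields) > 1 else None))
        # Read and discard the empty line that closes the sentence.
        self._proc.stdout.readline()
        return tagged

    def close(self):
        # communicate() closes the child's stdin and waits for it to exit.
        # getattr() guards against a constructor that failed half-way.
        if not getattr(self, "_closed", True):
            self._proc.communicate()
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __del__(self):
        self.close()


with PipeTagger([sys.executable, "-c", STAND_IN]) as pt:
    result = pt.tag("What is the airspeed ?".split())
print(result)
```

Making close() idempotent is what lets the with statement, an explicit close() call, and garbage collection coexist safely in this sketch.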

Instance Methods

__init__(self, path_to_model, path_to_bin=None, encoding='ISO-8859-1', verbose=False)
    Starts the hunpos-tag executable and establishes a connection with it.

__del__(self)

close(self)
    Closes the pipe to the hunpos executable.

__enter__(self)

__exit__(self, exc_type, exc_value, traceback)

tag(self, tokens)
    Tags a single sentence: a list of words.
    Returns: list of (token, tag)

Inherited from api.TaggerI: batch_tag, evaluate

Inherited from api.TaggerI (private): _check_params
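
Both inherited public methods are built on tag(). The sketch below uses a hypothetical SketchTaggerI class, not NLTK's api.TaggerI itself, and assumes that batch_tag() calls tag() once per sentence and that evaluate() strips the tags from gold-standard sentences, re-tags them, and reports token-level accuracy. ToyTagger is likewise hypothetical and exists only to make the sketch runnable.

```python
class SketchTaggerI:
    """Hypothetical stand-in for a tagger interface; subclasses define tag()."""

    def tag(self, tokens):
        raise NotImplementedError

    def batch_tag(self, sentences):
        # Tag each sentence independently with tag().
        return [self.tag(sent) for sent in sentences]

    def evaluate(self, gold):
        # Strip the gold tags, re-tag, then compare (token, tag) pairs.
        tagged = self.batch_tag([[tok for tok, _ in sent] for sent in gold])
        gold_pairs = [pair for sent in gold for pair in sent]
        test_pairs = [pair for sent in tagged for pair in sent]
        correct = sum(g == t for g, t in zip(gold_pairs, test_pairs))
        return correct / len(gold_pairs)


class ToyTagger(SketchTaggerI):
    """Hypothetical tagger: 'the' is DT, every other token is NN."""

    def tag(self, tokens):
        return [(tok, "DT" if tok.lower() == "the" else "NN") for tok in tokens]


gold = [[("the", "DT"), ("swallow", "NN")], [("the", "DT"), ("flies", "VBZ")]]
tagger = ToyTagger()
print(tagger.batch_tag([["the", "swallow"]]))  # [[('the', 'DT'), ('swallow', 'NN')]]
print(tagger.evaluate(gold))                   # 0.75 ("flies" is mistagged)
```

Because HunposTagger only has to supply tag(), it gets both methods for free under this design.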

Method Details

__init__(self, path_to_model, path_to_bin=None, encoding='ISO-8859-1', verbose=False)
(Constructor)

Starts the hunpos-tag executable and establishes a connection with it.

Parameters:
  • path_to_model - The path to the model file.
  • path_to_bin - The path to the hunpos-tag binary (optional).
  • encoding - The encoding used by the model. unicode tokens passed to the tag() and batch_tag() methods are converted to this charset when they are sent to hunpos-tag. The default is ISO-8859-1 (Latin-1).

    This parameter is ignored for str tokens, which are sent as-is. The caller must ensure that tokens are encoded in the right charset.

Overrides: object.__init__
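
The encoding rule above can be sketched in Python 3 terms. Assumption: Python 3 str plays the role of the unicode type described here and bytes plays the role of str; the helper name encode_token is hypothetical. Text tokens are encoded into the model's charset, byte tokens pass through untouched, and a character the charset cannot represent raises UnicodeEncodeError instead of being silently replaced.

```python
def encode_token(token, encoding="ISO-8859-1"):
    """Hypothetical helper: prepare one token for the tagger's stdin."""
    if isinstance(token, bytes):
        # Already-encoded tokens are sent as-is; the caller owns the charset.
        return token
    # Text tokens are converted to the model's charset (Latin-1 by default).
    return token.encode(encoding)


print(encode_token("café"))          # b'caf\xe9'
# UTF-8 bytes pass through unchanged, even though the model expects Latin-1.
print(encode_token(b"caf\xc3\xa9"))  # b'caf\xc3\xa9'
try:
    encode_token("café \u2192 swallow")  # U+2192 has no Latin-1 code point
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.object[exc.start:exc.end])
```

The second call shows why the caller must take care with pre-encoded tokens: nothing checks that their bytes match the model's charset.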

tag(self, tokens)

Tags a single sentence: a list of words. The tokens should not contain any newline characters.

Returns: list of (token, tag)
Overrides: api.TaggerI.tag
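
The newline restriction follows from a line-oriented exchange: each token occupies one line and an empty line ends the sentence, so an embedded newline would split a token in two and shift every later tag. Assuming that input format and one token<TAB>tag output line per token, the hypothetical helpers sentence_to_input and parse_output below build the input for one sentence and pair the output back with the original tokens. They are pure functions, so they run without HunPos.

```python
def sentence_to_input(tokens):
    """Hypothetical: serialize one sentence, one token per line, blank line at end."""
    for token in tokens:
        if "\n" in token:
            raise ValueError("token contains a newline: %r" % token)
    return "".join(token + "\n" for token in tokens) + "\n"


def parse_output(tokens, output):
    """Hypothetical: pair each input token with the tag on its output line."""
    tagged = []
    for token, line in zip(tokens, output.split("\n")):
        fields = line.split("\t")
        # A line without a tab yields no tag rather than an IndexError.
        tagged.append((token, fields[1] if len(fields) > 1 else None))
    return tagged


tokens = ["an", "unladen", "swallow"]
print(repr(sentence_to_input(tokens)))  # 'an\nunladen\nswallow\n\n'
print(parse_output(tokens, "an\tDT\nunladen\tNN\nswallow\tVB\n\n"))
# [('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB')]
```

Rejecting the offending token up front, as sentence_to_input does, is cheaper than debugging a sentence whose tags are all off by one.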