
Module nltk.tag.tnt

Implementation of 'TnT - A Statistical Part-of-Speech Tagger' by Thorsten Brants

http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf

Classes
TnT
TnT - Statistical POS tagger
Functions
basic_sent_chop(data, raw=True)
Basic method for tokenizing input into sentences for this tagger.
demo()
demo2()
demo3()
Function Details

basic_sent_chop(data, raw=True)

Basic method for tokenizing input into sentences
for this tagger:

@param data: list of tokens
             tokens can be either
             words or (word, tag) tuples
@type data: [string,]
            or [(string, string),]

@param raw: boolean flag marking the input data
            as a list of words or a list of tagged words
@type raw: Boolean

@ret : list of sentences
       sentences are a list of tokens
       tokens are the same as the input

The function takes a list of tokens and splits it into lists,
where each list represents a sentence fragment.
It can separate both tagged and raw sequences into
basic sentences.

Sentence markers are the tokens in the set [,.!?].

This is a simple method which enhances the performance of the TnT
tagger. Better sentence tokenization will further enhance the results.
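
The splitting behaviour described above can be illustrated with a minimal sketch. This is not the NLTK source: chop_sentences and SENT_MARKERS are hypothetical names chosen for illustration, and keeping a trailing fragment that has no closing marker is an assumption of this sketch, since the docstring does not say how such tokens are handled.

```python
# Marker set taken from the docstring: [,.!?]
SENT_MARKERS = {",", ".", "!", "?"}


def chop_sentences(data, raw=True):
    """Hypothetical stand-in for basic_sent_chop; not the NLTK source.

    data: list of words (raw=True) or list of (word, tag) tuples (raw=False).
    Returns a list of sentence fragments, each a list of the input tokens.
    """
    sentences = []
    current = []
    for token in data:
        # For tagged input, the word is the first element of the tuple.
        word = token if raw else token[0]
        current.append(token)
        if word in SENT_MARKERS:
            # A marker closes the current fragment and stays inside it.
            sentences.append(current)
            current = []
    # Assumption of this sketch: keep a final fragment with no closing marker.
    if current:
        sentences.append(current)
    return sentences


raw_chunks = chop_sentences(["Hello", ",", "world", ".", "Bye"])
tagged_chunks = chop_sentences(
    [("Hi", "UH"), ("!", "."), ("Go", "VB")], raw=False
)
```

In this sketch each marker token ends a fragment and is kept as the fragment's last token, so tagged and raw input go through the same loop and the output tokens have the same form as the input tokens.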