Package nltk :: Module util

Module util

Classes
Index
OrderedDict
AbstractLazySequence
An abstract base class for read-only sequences whose values are computed as needed.
LazySubsequence
A subsequence produced by slicing a lazy sequence.
LazyConcatenation
A lazy sequence formed by concatenating a list of lists.
LazyMap
A lazy sequence whose elements are formed by applying a given function to each element in one or more underlying lists.
LazyZip
A lazy sequence whose elements are tuples, each containing the i-th element from each of the argument sequences.
LazyEnumerate
A lazy sequence whose elements are tuples, each containing a count (from zero) and a value yielded by the underlying sequence.
Functions

usage(obj, selfname='self')

in_idle()
Returns: boolean; true if this function is run within IDLE.

pr(data, start=0, end=None)
Pretty print a sequence of data items

print_string(s, width=70)
Pretty print a string, breaking lines on whitespace

tokenwrap(tokens, separator=' ', width=70)
Pretty print a list of text tokens, breaking lines on whitespace

re_show(regexp, string, left='{', right='}')
Search string for substrings matching regexp and wrap the matches with braces.
Returns: string

filestring(f)

breadth_first(tree, children=<built-in function iter>, depth=-1, queue=None)
Traverse the nodes of a tree in breadth-first order.

guess_encoding(data)
Given a byte string, attempt to decode it.

invert_dict(d)

transitive_closure(graph, reflexive=False)
Calculate the transitive closure of a directed graph, optionally the reflexive transitive closure.
Returns: dict of sets

invert_graph(graph)
Inverts a directed graph.
Returns: dict of sets

clean_html(html)
Remove HTML markup from the given string.
Returns: string

clean_url(url)

flatten(*args)
Flatten a list.
Returns: list

ngrams(sequence, n, pad_left=False, pad_right=False, pad_symbol=None)
A utility that produces a sequence of ngrams from a sequence of items.
Returns: list of tuples

bigrams(sequence, **kwargs)
A utility that produces a sequence of bigrams from a sequence of items.
Returns: list of tuples

trigrams(sequence, **kwargs)
A utility that produces a sequence of trigrams from a sequence of items.
Returns: list of tuples

ingrams(sequence, n, pad_left=False, pad_right=False, pad_symbol=None)
A utility that produces an iterator over ngrams generated from a sequence of items.
Returns: iterator of tuples

ibigrams(sequence, **kwargs)
A utility that produces an iterator over bigrams generated from a sequence of items.
Returns: iterator of tuples

itrigrams(sequence, **kwargs)
A utility that produces an iterator over trigrams generated from a sequence of items.
Returns: iterator of tuples

binary_search_file(file, key, cache={}, cacheDepth=-1)
Searches through a sorted file using the binary search algorithm.

set_proxy(proxy, (user, password)=(None, ''))
Set the HTTP proxy for Python to download through.

Function Details

in_idle()

Returns: boolean
True if this function is run within IDLE. Tkinter programs that are run in IDLE should never call Tk.mainloop, so this function should be used to gate all calls to Tk.mainloop.

Warning: This function works by checking sys.stdin. If the user has modified sys.stdin, then it may return incorrect results.
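
The gating pattern described above can be sketched as follows. This is only an illustration, not the module's implementation: looks_like_idle and run_mainloop_unless_idle are hypothetical names, and the check on the module name of the sys.stdin class is an assumed heuristic for detecting IDLE.

```python
import sys


def looks_like_idle(stream=None):
    """Heuristic (an assumption): IDLE replaces sys.stdin with an object
    whose class is defined inside the idlelib package."""
    stream = sys.stdin if stream is None else stream
    module_name = getattr(type(stream), "__module__", "") or ""
    return module_name.startswith("idlelib")


def run_mainloop_unless_idle(mainloop, stream=None):
    """Call mainloop() only when we do not appear to be running in IDLE."""
    if not looks_like_idle(stream):
        mainloop()
```

In a Tkinter program the callable passed in would typically be the mainloop method of the root window.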

pr(data, start=0, end=None)

Pretty print a sequence of data items

Parameters:
  • data (sequence or iterator) - the data stream to print
  • start (int) - the start position
  • end (int) - the end position
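
A minimal sketch of how start and end can select a window from a sequence or iterator before printing. The name pr_sketch is hypothetical, and returning the window is an addition made only so the result is easy to inspect; it is not claimed to match the module's behaviour.

```python
from itertools import islice
from pprint import pprint


def pr_sketch(data, start=0, end=None):
    """Pretty print the items of data from position start up to end."""
    # islice works on iterators as well as sequences, so the data
    # stream never has to be materialised beyond the requested window.
    window = list(islice(data, start, end))
    pprint(window)
    return window
```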

print_string(s, width=70)

Pretty print a string, breaking lines on whitespace

Parameters:
  • s (string) - the string to print, consisting of words and spaces
  • width (int) - the display width

tokenwrap(tokens, separator=' ', width=70)

Pretty print a list of text tokens, breaking lines on whitespace

Parameters:
  • tokens (list) - the tokens to print
  • separator (str) - the string to use to separate tokens
  • width (int) - the display width (default=70)
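
One way to get this kind of wrapping is the standard-library textwrap module. The sketch below uses the hypothetical name tokenwrap_sketch and returns the wrapped text rather than printing it; whether the module itself works this way is an assumption.

```python
import textwrap


def tokenwrap_sketch(tokens, separator=" ", width=70):
    """Join tokens with separator, then break the result on whitespace
    so that no line is longer than width characters."""
    return "\n".join(textwrap.wrap(separator.join(tokens), width=width))
```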

re_show(regexp, string, left='{', right='}')

Search string for substrings matching regexp and wrap the matches with braces. This is convenient for learning about regular expressions.

Parameters:
  • regexp (string) - The regular expression.
  • string (string) - The string being matched.
  • left (string) - The left delimiter (printed before the matched substring)
  • right (string) - The right delimiter (printed after the matched substring)
Returns: string
A string with markers surrounding the matched substrings.
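
The marking step can be sketched with re.sub and a replacement function. re_show_sketch is a hypothetical name, and the use of re.MULTILINE and of rstrip on the input are assumptions of this sketch.

```python
import re


def re_show_sketch(regexp, string, left="{", right="}"):
    """Return string with every match of regexp wrapped in left/right."""
    pattern = re.compile(regexp, re.MULTILINE)
    # A function replacement avoids any escaping problems if the
    # delimiters themselves contain backslashes.
    return pattern.sub(lambda m: left + m.group(0) + right, string.rstrip())
```

For example, marking runs of vowels in 'beautiful' gives 'b{eau}t{i}f{u}l'.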

breadth_first(tree, children=<built-in function iter>, depth=-1, queue=None)

Traverse the nodes of a tree in breadth-first order. (No need to check for cycles.) The first argument should be the tree root; children should be a function taking as argument a tree node and returning an iterator of the node's children.
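
A breadth-first traversal with a caller-supplied children function can be sketched with a queue. The name breadth_first_sketch and the max_depth parameter (a negative value meaning no limit) are assumptions of this sketch, which does not reproduce the queue argument of the real signature.

```python
from collections import deque


def breadth_first_sketch(root, children, max_depth=-1):
    """Yield the nodes of a tree level by level, starting at root."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        yield node
        # Only expand a node if we are still above the depth limit.
        if max_depth < 0 or depth < max_depth:
            for child in children(node):
                queue.append((child, depth + 1))
```

With a tree stored as a dictionary from node to child list, children can simply be a lookup such as lambda n: tree[n].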

guess_encoding(data)

Given a byte string, attempt to decode it. Tries the standard 'UTF8' and 'latin-1' encodings, plus several gathered from locale information.

The calling program *must* first call:

   locale.setlocale(locale.LC_ALL, '')

If successful it returns (decoded_unicode, successful_encoding). If unsuccessful it raises a UnicodeError.
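
The try-each-encoding idea can be sketched as below. guess_encoding_sketch and its encodings parameter are hypothetical, and the default candidate list (UTF-8, the locale's preferred encoding, then latin-1) is an assumption rather than the exact list the module uses.

```python
import locale


def guess_encoding_sketch(data, encodings=None):
    """Return (decoded_text, encoding) for the first encoding that works."""
    if encodings is None:
        # latin-1 goes last because it can decode any byte string.
        encodings = ["utf-8", locale.getpreferredencoding(False), "latin-1"]
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except (UnicodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
    raise UnicodeError("could not decode data with any of: %r" % (list(encodings),))
```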

transitive_closure(graph, reflexive=False)

Calculate the transitive closure of a directed graph, optionally the reflexive transitive closure.

The algorithm is a slight modification of the "Marking Algorithm" of Ioannidis & Ramakrishnan (1988) "Efficient Transitive Closure Algorithms".

Parameters:
  • graph (dict of sets) - the initial graph, represented as a dictionary of sets
  • reflexive (bool) - if set, also make the closure reflexive
Returns: dict of sets
the (reflexive) transitive closure of the graph
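
To illustrate the input and output shapes, here is a plain reachability-based sketch. It is not the Marking Algorithm cited above, and transitive_closure_sketch is a hypothetical name.

```python
def transitive_closure_sketch(graph, reflexive=False):
    """Map every node of graph to the set of nodes reachable from it."""
    closure = {}
    for start in graph:
        reached = set()
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if node in reached:
                continue  # already visited, which also handles cycles
            reached.add(node)
            stack.extend(graph.get(node, ()))
        if reflexive:
            reached.add(start)
        closure[start] = reached
    return closure
```

For the graph {1: {2}, 2: {3}, 3: set()} this yields {1: {2, 3}, 2: {3}, 3: set()}.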

invert_graph(graph)

Inverts a directed graph.

Parameters:
  • graph (dict of sets) - the graph, represented as a dictionary of sets
Returns: dict of sets
the inverted graph
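
Inverting a dict-of-sets graph means reversing every edge. A minimal sketch, with invert_graph_sketch as a hypothetical name:

```python
def invert_graph_sketch(graph):
    """Return a graph with an edge b -> a for every edge a -> b in graph."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, set()).add(source)
    return inverted
```

Note that in this sketch a node with no incoming edges in the inverted graph does not appear as a key.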

clean_html(html)

Remove HTML markup from the given string.

Parameters:
  • html (string) - the HTML string to be cleaned
Returns: string

flatten(*args)

Flatten a list.

>>> flatten(1, 2, ['b', 'a', ['c', 'd']], 3)
[1, 2, 'b', 'a', 'c', 'd', 3]
Parameters:
  • *args - items and lists to be combined into a single list
Returns: list
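
A recursive sketch of the flattening, assuming that lists and tuples are the only containers that get expanded and that the original item order is kept. flatten_sketch is a hypothetical name.

```python
def flatten_sketch(*args):
    """Combine items and arbitrarily nested lists into one flat list."""
    result = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            # Recurse into nested containers, keeping their order.
            result.extend(flatten_sketch(*arg))
        else:
            result.append(arg)
    return result
```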

ngrams(sequence, n, pad_left=False, pad_right=False, pad_symbol=None)

A utility that produces a sequence of ngrams from a sequence of items. For example:

>>> ngrams([1,2,3,4,5], 3)
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Use ingrams for an iterator version of this function. Set pad_left or pad_right to true in order to get additional ngrams:

>>> ngrams([1,2,3,4,5], 2, pad_right=True)
[(1, 2), (2, 3), (3, 4), (4, 5), (5, None)]
Parameters:
  • sequence (sequence or iterator) - the source data to be converted into ngrams
  • n (int) - the degree of the ngrams
  • pad_left (boolean) - whether the ngrams should be left-padded
  • pad_right (boolean) - whether the ngrams should be right-padded
  • pad_symbol (any) - the symbol to use for padding (default is None)
Returns: list of tuples
The ngrams
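
The padding behaviour can be sketched as follows, assuming that n - 1 copies of pad_symbol are added on each padded side. ngrams_sketch is a hypothetical name, not the module's code.

```python
def ngrams_sketch(sequence, n, pad_left=False, pad_right=False, pad_symbol=None):
    """Return the list of n-item tuples found in sequence."""
    items = list(sequence)
    padding = [pad_symbol] * (n - 1)
    if pad_left:
        items = padding + items
    if pad_right:
        items = items + padding
    # Slide a window of width n across the (possibly padded) items.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
```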

bigrams(sequence, **kwargs)

A utility that produces a sequence of bigrams from a sequence of items. For example:

>>> bigrams([1,2,3,4,5])
[(1, 2), (2, 3), (3, 4), (4, 5)]

Use ibigrams for an iterator version of this function.

Parameters:
  • sequence (sequence or iterator) - the source data to be converted into bigrams
Returns: list of tuples
The bigrams

trigrams(sequence, **kwargs)

A utility that produces a sequence of trigrams from a sequence of items. For example:

>>> trigrams([1,2,3,4,5])
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Use itrigrams for an iterator version of this function.

Parameters:
  • sequence (sequence or iterator) - the source data to be converted into trigrams
Returns: list of tuples
The trigrams

ingrams(sequence, n, pad_left=False, pad_right=False, pad_symbol=None)

A utility that produces an iterator over ngrams generated from a sequence of items.

For example:

>>> list(ingrams([1,2,3,4,5], 3))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Use ngrams for a list version of this function. Set pad_left or pad_right to true in order to get additional ngrams:

>>> list(ingrams([1,2,3,4,5], 2, pad_right=True))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, None)]
Parameters:
  • sequence (sequence or iterator) - the source data to be converted into ngrams
  • n (int) - the degree of the ngrams
  • pad_left (boolean) - whether the ngrams should be left-padded
  • pad_right (boolean) - whether the ngrams should be right-padded
  • pad_symbol (any) - the symbol to use for padding (default is None)
Returns: iterator of tuples
The ngrams
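
An iterator version never needs the whole input in memory. The sketch below keeps a sliding window in a bounded deque; ingrams_sketch is a hypothetical name and the padding width of n - 1 is an assumption.

```python
from collections import deque
from itertools import chain, repeat


def ingrams_sketch(sequence, n, pad_left=False, pad_right=False, pad_symbol=None):
    """Lazily yield n-item tuples from sequence, one per input item."""
    items = iter(sequence)
    if pad_left:
        items = chain(repeat(pad_symbol, n - 1), items)
    if pad_right:
        items = chain(items, repeat(pad_symbol, n - 1))
    window = deque(maxlen=n)  # old items fall off the left automatically
    for item in items:
        window.append(item)
        if len(window) == n:
            yield tuple(window)
```

Because nothing is materialised up front, this also works on unbounded inputs such as itertools.count().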

ibigrams(sequence, **kwargs)

A utility that produces an iterator over bigrams generated from a sequence of items.

For example:

>>> list(ibigrams([1,2,3,4,5]))
[(1, 2), (2, 3), (3, 4), (4, 5)]

Use bigrams for a list version of this function.

Parameters:
  • sequence (sequence or iterator) - the source data to be converted into bigrams
Returns: iterator of tuples
The bigrams

itrigrams(sequence, **kwargs)

A utility that produces an iterator over trigrams generated from a sequence of items.

For example:

>>> list(itrigrams([1,2,3,4,5]))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Use trigrams for a list version of this function.

Parameters:
  • sequence (sequence or iterator) - the source data to be converted into trigrams
Returns: iterator of tuples
The trigrams

binary_search_file(file, key, cache={}, cacheDepth=-1)

Searches through a sorted file using the binary search algorithm.

Parameters:
  • file (file) - the file to be searched through.
  • key (string) - the identifier we are searching for.
Returns:
The line from the file whose first word is key.
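
Binary search over a file works on byte offsets rather than line numbers. The sketch below assumes a file opened in binary mode whose lines are sorted by their first word in byte order. binary_search_lines is a hypothetical name, and the cache and cacheDepth arguments of the real function are not modelled.

```python
import io


def binary_search_lines(f, key):
    """Return the line of f whose first word equals key, or None."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()  # the wanted line starts somewhere in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if mid > 0:
            f.seek(mid - 1)
            f.readline()  # move to the first line starting at or after mid
        else:
            f.seek(0)
        start = f.tell()
        line = f.readline()
        if start >= hi or not line:
            hi = mid  # no line starts in [mid, hi)
            continue
        words = line.split()
        first = words[0] if words else b""
        if first == key:
            return line
        if first < key:
            lo = start + len(line)  # continue after this line
        else:
            hi = mid  # continue before this line
    return None
```

Each step halves the byte range, so only a logarithmic number of lines is read even for large files.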

set_proxy(proxy, (user, password)=(None, ''))

Set the HTTP proxy for Python to download through.

If proxy is None, the function tries to set the proxy from the environment or system settings.

Parameters:
  • proxy - The HTTP proxy server to use. For example: 'http://proxy.example.com:3128/'
  • user - The username to authenticate with. Use None to disable authentication.
  • password - The password to authenticate with.
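
The same idea can be sketched with the standard-library urllib.request module. build_proxy_opener is a hypothetical helper, and unlike the function documented here it returns an opener instead of installing one globally; treat it as an illustration under those assumptions.

```python
from urllib import request


def build_proxy_opener(proxy=None, user=None, password=""):
    """Build a URL opener that downloads through the given HTTP proxy."""
    if proxy is None:
        # Fall back to proxies found in the environment or system settings.
        proxies = request.getproxies()
    else:
        proxies = {"http": proxy}
    handlers = [request.ProxyHandler(proxies)]
    if user is not None and proxy is not None:
        manager = request.HTTPPasswordMgrWithDefaultRealm()
        manager.add_password(realm=None, uri=proxy, user=user, passwd=password)
        handlers.append(request.ProxyBasicAuthHandler(manager))
    return request.build_opener(*handlers)
```

Passing the result to request.install_opener would make it the default for later urlopen calls.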