Package nltk :: Package parse :: Module util

Module util

Utility functions for parsers.

Classes
    Test Suites
TestGrammar
Unit tests for CFG.
Functions
load_parser(grammar_url, trace=0, parser=None, chart_class=None, beam_size=0, **load_args)
Load a grammar from a file, and build a parser based on that grammar.
    Test Suites
 
extract_test_sentences(string, comment_chars='#%;')
Parses a string with one test sentence per line.
Function Details

load_parser(grammar_url, trace=0, parser=None, chart_class=None, beam_size=0, **load_args)

Load a grammar from a file, and build a parser based on that grammar. The parser depends on the grammar format, and might also depend on properties of the grammar itself.

The following grammar formats are currently supported:

  • 'cfg' (CFGs)
  • 'pcfg' (probabilistic CFGs)
  • 'fcfg' (feature-based CFGs)

Parameters:
  • grammar_url (str) - A URL specifying where the grammar is located. The default protocol is "nltk:", which searches for the file in the NLTK data package.
  • trace (int) - The level of tracing that should be used when parsing a text. 0 generates no tracing output; higher numbers produce more verbose tracing output.
  • parser - The class used for parsing; should be ChartParser or a subclass. If None, the class depends on the grammar format.
  • chart_class - The class used for storing the chart; should be Chart or a subclass. Only used for CFGs and feature CFGs. If None, the chart class depends on the grammar format.
  • beam_size (int) - The maximum length for the parser's edge queue. Only used for probabilistic CFGs.
  • load_args - Keyword parameters used when loading the grammar. See data.load for more information.
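
As an illustration of the parameters above, the following sketch writes a toy grammar to a temporary file and loads it through a "file:" URL. It assumes an NLTK 3.x installation, where parsers expose a parse() method that yields trees; older releases may differ. The grammar, the file name toy.cfg, and the test sentence are invented for this example and are not part of the NLTK data package.

```python
import os
import tempfile

from nltk.parse.util import load_parser  # assumes NLTK 3.x is installed

# Toy grammar invented for this example; it is not shipped with NLTK.
GRAMMAR = """\
S -> NP VP
NP -> Det N
VP -> V
Det -> 'the'
N -> 'dog'
V -> 'barks'
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    grammar_path = os.path.join(tmp_dir, "toy.cfg")
    with open(grammar_path, "w", encoding="utf-8") as handle:
        handle.write(GRAMMAR)

    # The ".cfg" extension selects the plain CFG format; the "file:" protocol
    # is used instead of the default "nltk:" lookup in the NLTK data package.
    parser = load_parser("file:" + grammar_path, trace=0)

    # parse() is assumed to return an iterator of trees (NLTK 3.x behaviour).
    trees = list(parser.parse("the dog barks".split()))

for tree in trees:
    print(tree)
```

Leaving parser and chart_class as None lets load_parser pick defaults for the grammar format; beam_size is left at 0 here because it only applies to probabilistic CFGs.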

extract_test_sentences(string, comment_chars='#%;')

Parses a string with one test sentence per line. Lines can optionally begin with:

  • a bool, saying whether the sentence is grammatical or not, or
  • an int, giving the number of parse trees it should have.

The result information is followed by a colon, and then the sentence. Empty lines and lines beginning with a comment char are ignored.

Parameters:
  • comment_chars (str) - The characters that mark a line as a comment when they appear at the start of the line.
Returns:
a list of (sentence, expected result) tuples, where a sentence is a list of str and an expected result is None, a bool, or an int
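
The line format described above can be sketched in plain Python. This is a minimal re-implementation for illustration only, not the NLTK source: the function name parse_test_sentences is hypothetical, and treating a colon that follows anything other than a bool or an int as part of the sentence is an assumption of this sketch.

```python
def parse_test_sentences(text, comment_chars="#%;"):
    """Hypothetical sketch of the test-sentence format described above."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and lines that start with a comment character.
        if not line or line[0] in comment_chars:
            continue
        expected = None
        prefix, sep, rest = line.partition(":")
        if sep:
            flag = prefix.strip().lower()
            if flag in ("true", "false"):
                # A bool says whether the sentence is grammatical.
                expected = flag == "true"
                line = rest
            elif flag.isdigit():
                # An int gives the expected number of parse trees.
                expected = int(flag)
                line = rest
            # Otherwise the colon is kept as part of the sentence
            # (an assumption of this sketch).
        tokens = line.split()
        if tokens:
            results.append((tokens, expected))
    return results


sample = """\
# lines starting with a comment character are skipped
True: the dog barks
false: dog the barks
2: I saw the man with the telescope

the cat sleeps
"""

parsed = parse_test_sentences(sample)
for tokens, expected in parsed:
    print(expected, tokens)
```

Each entry pairs the tokenized sentence with its expected result, so a test harness can compare the result against the number of trees a parser actually produces.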