Package nltk :: Package chunk :: Module regexp :: Class RegexpParser

type RegexpParser

       object --+        
                |        
parse.api.ParserI --+    
                    |    
     api.ChunkParserI --+
                        |
                       RegexpParser

A grammar-based chunk parser. chunk.RegexpParser uses a set of regular expression patterns to specify the behavior of the parser. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ChunkString. All rules are implemented using regular expression matching and substitution.
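
As a rough illustration of the substitution idea, the following standard-library sketch encodes a tag sequence as a string and applies one chunk rule and one chink rule with re.sub. The encoding (tags in angle brackets, chunks in curly braces) mirrors the notation used in grammar clauses, but the variable names and regular expressions are illustrative assumptions, not the internals of ChunkString.

```python
import re

# Assumed encoding for illustration: each tag wrapped in angle brackets.
tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
encoded = "".join("<%s>" % t for t in tags)

# Chunk rule: optional determiner, any adjectives, one or more nouns.
# [^>]* keeps the wildcard from running past the end of a tag.
chunk_np = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[^>]*>)+")
chunked = chunk_np.sub(lambda m: "{" + m.group(0) + "}", encoded)
print(chunked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}

# Chink rule: starting from one big chunk, cut runs of verbs out of it.
one_chunk = "{" + encoded + "}"
chinked = re.sub(r"((?:<VB[^>]*>)+)", r"}\1{", one_chunk)
print(chinked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
```

Both routes arrive at the same chunking here; a real rule additionally checks that the resulting brace structure stays valid, which this sketch does not attempt.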

A grammar contains one or more clauses in the following form:

NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # chink any tag beginning with V, I, or .
  <.*>}{<DT>         # split a chunk at a determiner
  <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj
                     # with one starting with a noun

The patterns of a clause are executed in order. An earlier pattern may introduce a chunk boundary that prevents a later pattern from executing. Sometimes an individual pattern will match on multiple, overlapping extents of the input. As with regular expression substitution more generally, the chunker will identify the first match possible, then continue looking for matches after this one has ended.
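
The two effects described above can be reproduced with plain regular expression substitution. This is a minimal sketch using only Python's re module on an assumed angle-bracket tag encoding; it is meant to show the matching behavior, not the parser's actual implementation.

```python
import re

encoded = "<NN><NN><NN>"

# Pattern 1: chunk pairs of nouns. Tags 1-2 and tags 2-3 both qualify,
# but only the leftmost pair is taken; scanning resumes after it.
pairs = re.compile(r"<NN><NN>")
after_first = pairs.sub(lambda m: "{" + m.group(0) + "}", encoded)
print(after_first)  # {<NN><NN>}<NN>

# Pattern 2: chunk three nouns in a row. The boundary introduced by
# pattern 1 now sits between the tags, so this pattern cannot match.
triple = re.compile(r"<NN><NN><NN>")
print(triple.search(after_first))  # None
```

Swapping the order of the two patterns would let the three-noun pattern fire first, which is why the order of patterns within a clause matters.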

The clauses of a grammar are also executed in order. A cascaded chunk parser is one having more than one clause. The maximum depth of a parse tree created by this chunk parser is the same as the number of clauses in the grammar.

When tracing is turned on, the comment portion of a line is displayed each time the corresponding pattern is applied.

Instance Methods

__init__(self, grammar, top_node='S', loop=1, trace=0)
Create a new chunk parser, from the given start state and set of chunk patterns.

_parse_grammar(self, grammar, top_node, trace)
Helper function for __init__: parse the grammar if it is a string.

_add_stage(self, rules, lhs, top_node, trace)
Helper function for __init__: add a new stage to the parser.

Tree parse(self, chunk_struct, trace=None)
Apply the chunk parser to this input.

string __repr__(self)
Returns: a concise string representation of this chunk.RegexpParser.

string __str__(self)
Returns: a verbose string representation of this chunk.RegexpParser.

Inherited from api.ChunkParserI: evaluate

Inherited from parse.api.ParserI: batch_iter_parse, batch_nbest_parse, batch_parse, batch_prob_parse, grammar, iter_parse, nbest_parse, prob_parse

    Deprecated

Inherited from parse.api.ParserI: batch_test, get_parse, get_parse_dict, get_parse_list, get_parse_prob

Instance Variables

list _stages
The list of parsing stages corresponding to the grammar.

string _start
The start symbol of the grammar (the root node of resulting trees).

Method Details

__init__(self, grammar, top_node='S', loop=1, trace=0)
(Constructor)

Create a new chunk parser, from the given start state and set of chunk patterns.

Parameters:
  • grammar (string or list of RegexpChunkParser) - The grammar, or a list of RegexpChunkParser objects
  • top_node (string or Nonterminal) - The top node of the tree being created
  • loop (int) - The number of times to run through the patterns
  • trace (int) - The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output.
Overrides: object.__init__

parse(self, chunk_struct, trace=None)

Apply the chunk parser to this input.

Parameters:
  • chunk_struct (Tree) - the chunk structure to be (further) chunked (this tree is modified, and is also returned)
  • trace (int) - The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output. This value overrides the trace level that was given to the constructor.
Returns: Tree
the chunked output.
Overrides: parse.api.ParserI.parse
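
A minimal usage sketch, assuming NLTK is installed and that nltk.chunk.RegexpParser and nltk.tree.Tree are importable as shown. Only the grammar is passed to the constructor so that the remaining parameters keep their defaults; the example sentence and its tags are made up for illustration.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

# One clause with a single chunk pattern.
grammar = r"""
NP:
  {<DT>?<JJ>*<NN.*>+}   # chunk determiner, adjectives and nouns
"""

parser = RegexpParser(grammar)

# A flat, POS-tagged sentence wrapped in a Tree, the documented input type.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD")]
sentence = Tree("S", tagged)

# trace=0 overrides the constructor's trace level for this call only.
tree = parser.parse(sentence, trace=0)
print(tree)
```

The returned tree keeps every token in its original order; the noun phrase appears as a subtree under the top node and the verb stays at the top level.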

__repr__(self)
(Representation operator)

Returns: string
a concise string representation of this chunk.RegexpParser.
Overrides: object.__repr__

__str__(self)
(Informal representation operator)

Returns: string
a verbose string representation of this chunk.RegexpParser.
Overrides: object.__str__