Package nltk :: Package parse :: Module nonprojectivedependencyparser :: Class ProbabilisticNonprojectiveParser
[hide private]
[frames] | no frames]

type ProbabilisticNonprojectiveParser

source code

object --+
         |
        ProbabilisticNonprojectiveParser

A probabilistic non-projective dependency parser. Nonprojective dependencies allows for "crossing branches" in the parse tree which is necessary for representing particular linguistic phenomena, or even typical parses in some languages. This parser follows the MST parsing algorithm, outlined in McDonald(2005), which likens the search for the best non-projective parse to finding the maximum spanning tree in a weighted directed graph.

Instance Methods [hide private]
 
__init__(self)
Creates a new non-projective parser.
source code
 
train(self, graphs, dependency_scorer)
Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser's scorer.
source code
 
initialize_edge_scores(self, graph)
Assigns a score to every edge in the DependencyGraph graph.
source code
 
collapse_nodes(self, new_node, cycle_path, g_graph, b_graph, c_graph)
Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node.
source code
 
update_edge_scores(self, new_node, cycle_path)
Updates the edge scores to reflect a collapse operation into new_node.
source code
 
compute_original_indexes(self, new_indexes)
As nodes are collapsed into others, they are replaced by the new node in the graph, but it's still necessary to keep track of what these original nodes were.
source code
 
compute_max_subtract_score(self, column_index, cycle_indexes)
When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse.
source code
 
best_incoming_arc(self, node_index)
Returns the source of the best incoming arc to the node with address: node_index
source code
 
original_best_arc(self, node_index)
???
source code
 
parse(self, tokens, tags)
Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses.
source code
Method Details [hide private]

__init__(self)
(Constructor)

source code 

Creates a new non-projective parser.

Overrides: object.__init__

train(self, graphs, dependency_scorer)

source code 

Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser's scorer. This is used to initialize the scores on a DependencyGraph during the parsing procedure.

Parameters:
  • graphs (A list of DependencyGraph) - A list of dependency graphs to train the scorer.
  • dependency_scorer (DependencyScorerI) - A scorer which implements the DependencyScorerI interface.

initialize_edge_scores(self, graph)

source code 

Assigns a score to every edge in the DependencyGraph graph. These scores are generated via the parser's scorer which was assigned during the training process.

Parameters:
  • graph (DependencyGraph) - A dependency graph to assign scores to.

collapse_nodes(self, new_node, cycle_path, g_graph, b_graph, c_graph)

source code 

Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.

Parameters:
  • new_node (Node.) - A Node (Dictionary) to collapse the cycle nodes into.
  • cycle_path (A list of integers.) - A list of node addresses, each of which is in the cycle.
  • g_graph, b_graph, c_graph - Graphs which need to be updated.

update_edge_scores(self, new_node, cycle_path)

source code 

Updates the edge scores to reflect a collapse operation into new_node.

Parameters:
  • new_node (A Node.) - The node which cycle nodes are collapsed into.
  • cycle_path (A list of integers.) - A list of node addresses that belong to the cycle.

compute_original_indexes(self, new_indexes)

source code 

As nodes are collapsed into others, they are replaced by the new node in the graph, but it's still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.

Parameters:
  • new_addresses - A list of node addresses to check for subsumed nodes.
  • new_address (A list of integers.)

compute_max_subtract_score(self, column_index, cycle_indexes)

source code 

When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.

Parameters:
  • column_index (integer.) - A index representing the column of incoming arcs to a particular node being updated
  • cycle_indexes (A list of integers.) - Only arcs from cycle nodes are considered. This is a list of such nodes addresses.

best_incoming_arc(self, node_index)

source code 

Returns the source of the best incoming arc to the node with address: node_index

Parameters:
  • node_index (integer.) - The address of the 'destination' node, the node that is arced to.

parse(self, tokens, tags)

source code 

Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the DependencyScorerI interface and passing it to the training algorithm.

Parameters:
  • tokens (A list of String.) - A list of words or punctuation to be parsed.
  • tags (A List of String.) - A list of tags corresponding by index to the words in the tokens list.