TokenSearcher

A class that makes it easier to use regular expressions to search over tokenized strings. The tokenized string is converted to a string where tokens are marked with angle brackets -- e.g., '<the><window><is><still><open>'. The regular expression passed to the findall() method is modified in two ways: angle brackets are treated as non-capturing (nongrouping) parentheses that also match the token boundaries, and '.' is prevented from matching the angle brackets.
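The rewriting described above can be illustrated with a minimal sketch built only on Python's standard re module. The names mark_tokens, rewrite_pattern and search are hypothetical helpers introduced for illustration, and the substitutions are one plausible way to get the described behaviour, not necessarily the ones this class uses internally.

```python
import re

def mark_tokens(tokens):
    """Join tokens into one string, wrapping each token in angle brackets."""
    return "".join("<" + tok + ">" for tok in tokens)

def rewrite_pattern(regexp):
    """Turn a token-level pattern into an ordinary regular expression (assumed scheme)."""
    regexp = re.sub(r"\s", "", regexp)               # whitespace in the pattern is ignored
    regexp = re.sub(r"<", "(?:<(?:", regexp)         # '<' opens a non-capturing group
    regexp = re.sub(r">", ")>)", regexp)             # '>' closes it at the token boundary
    regexp = re.sub(r"(?<!\\)\.", "[^<>]", regexp)   # unescaped '.' never matches a bracket
    return regexp

def search(tokens, regexp):
    """Return each hit as a list of tokens (hypothetical helper)."""
    hits = re.findall(rewrite_pattern(regexp), mark_tokens(tokens))
    # Drop the outer brackets, then split on the boundaries between tokens.
    return [hit[1:-1].split("><") for hit in hits]

tokens = ["the", "window", "is", "still", "open"]
print(mark_tokens(tokens))                 # <the><window><is><still><open>
print(search(tokens, "<.*><is>"))          # [['window', 'is']]
print(search(tokens, "<the>(<.*>)<is>"))   # [['window']]
print(search(["that", "the", "thing", "was"], "<th.*>{3,}"))  # [['that', 'the', 'thing']]
```

Because the inserted groups are non-capturing, a quantifier such as {3,} applies to a whole token, and an explicit pair of parentheses in the caller's pattern narrows the result to the tokens inside it.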

findall(self, regexp)
Find instances of the regular expression in the text. The text is a list of tokens, and a regexp pattern to match a single token must be surrounded by angle brackets. For example:

>>> ts.findall("<.*><.*><bro>")
you rule bro; telling you bro; u twizted bro
>>> ts.findall("<a>(<.*>)<man>")
monied; nervous; dangerous; white; white; white; pious; queer; good;
mature; white; Cape; great; wise; wise; butterless; white; fiendish;
pale; furious; better; certain; complete; dismasted; younger; brave;
brave; brave; brave
>>> text9.findall("<th.*>{3,}")
thread through those; the thought that; that the thing; the thing
that; that that thing; through these than through; them that the;
through the thick; them that they; thought that the
Parameters:
  • regexp (str) - A regular expression