Package nltk :: Module sourcedstring

Module sourcedstring


Sourced strings are strings that are annotated with information
about the location in a document where they were originally found.
Sourced strings are subclassed from Python strings.  As a result, they
can usually be used anywhere a normal Python string can be used.

  >>> newt_contents = '''\
  ... She turned me into a newt!
  ... I got better.'''
  >>> newt_doc = SourcedString(newt_contents, 'newt.txt')
  >>> print repr(newt_doc)
  'She turned me into a newt!\nI got better.'@[0:40]
  >>> newt = newt_doc.split()[5] # Find the sixth word.
  >>> print repr(newt)
  'newt!'@[21:26]
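
The idea behind the doctest above can be illustrated with a minimal sketch of a string subclass that carries its source offsets along through slicing. This is not NLTK's implementation: the class name `OffsetStr`, its `docid`, `begin` and `end` attributes, and the `split_words` helper are hypothetical names chosen for illustration, and only simple forward slices are handled.

```python
import re


class OffsetStr(str):
    """Hypothetical stand-in for a sourced string: a str that remembers
    the [begin:end] offsets it occupied in its source document."""

    def __new__(cls, text, docid, begin=0):
        obj = super().__new__(cls, text)
        obj.docid = docid              # identifier of the source document
        obj.begin = begin              # offset of the first character
        obj.end = begin + len(text)    # offset just past the last character
        return obj

    def __repr__(self):
        # Mimic the 'text'@[begin:end] display used in the doctest.
        return "%s@[%d:%d]" % (str.__repr__(self), self.begin, self.end)

    def __getitem__(self, index):
        # Simple forward slices keep their offsets; everything else
        # (single indices, stepped slices) falls back to a plain str.
        if isinstance(index, slice) and index.step in (None, 1):
            start, stop, _ = index.indices(len(self))
            stop = max(stop, start)
            text = str.__getitem__(self, slice(start, stop))
            return OffsetStr(text, self.docid, self.begin + start)
        return str.__getitem__(self, index)

    def split_words(self):
        """Split on whitespace, keeping each word's source offsets."""
        return [self[m.start():m.end()] for m in re.finditer(r"\S+", self)]


doc = OffsetStr("She turned me into a newt!\nI got better.", "newt.txt")
newt = doc.split_words()[5]   # the sixth word
print(repr(doc))
print(repr(newt))
```

Because `OffsetStr` inherits from `str`, it still compares equal to ordinary strings and can be passed to code that expects one, which is the property the module description relies on.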

Classes
StringSource
A description of the location of a string in a document.
ConsecutiveCharStringSource
A StringSource that specifies the source of strings whose characters have consecutive offsets.
ContiguousCharStringSource
A StringSource that specifies the source of strings whose characters are contiguous, but do not necessarily have consecutive offsets.
SourcedString
A string that is annotated with information about the location in a document where it was originally found.
SimpleSourcedString
A single substring of a document, annotated with information about the location in the document where it was originally found.
CompoundSourcedString
A string constructed by concatenating substrings from multiple sources, and annotated with information about the locations where those substrings were originally found.
SimpleSourcedByteString
SimpleSourcedUnicodeString
CompoundSourcedByteString
CompoundSourcedUnicodeString
SourcedStringRegexp
Wrapper for regexp pattern objects that causes the sub and subn methods to return sourced strings.
SourcedStringStream
Wrapper for a read-only stream that causes read() (and related methods) to return sourced strings.
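
The distinction between a simple and a compound sourced string can be sketched as follows. This is a rough illustration of the concept only, assuming a compound string is stored as an ordered list of pieces, each with its own document and starting offset. The names `Piece`, `CompoundPieces` and `source_of` are hypothetical and do not appear in the module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """Hypothetical record for one substring and where it came from."""
    text: str
    docid: str
    begin: int


class CompoundPieces:
    """Hypothetical compound string: a concatenation of sourced pieces."""

    def __init__(self, pieces):
        self.pieces = list(pieces)

    def __str__(self):
        return "".join(p.text for p in self.pieces)

    def __add__(self, other):
        # Concatenation keeps every piece, so no source information is lost.
        return CompoundPieces(self.pieces + other.pieces)

    def source_of(self, i):
        """Return (docid, offset) for character i of the joined string."""
        if i < 0:
            raise IndexError(i)
        for p in self.pieces:
            if i < len(p.text):
                return (p.docid, p.begin + i)
            i -= len(p.text)
        raise IndexError("index past end of string")


a = CompoundPieces([Piece("newt!", "newt.txt", 21)])
b = CompoundPieces([Piece(" ok", "other.txt", 5)])
joined = a + b
print(str(joined), joined.source_of(0), joined.source_of(6))
```

Keeping the pieces separate, rather than merging them into one offset range, is what lets a concatenation of text from different documents still report the origin of each character.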