object --+
         |
basestring --+
             |
            SourcedString
A string that is annotated with information about the location in a document where it was originally found. Sourced strings are subclassed from Python strings. As a result, they can usually be used anywhere a normal Python string can be used.
There are two types of sourced strings: SimpleSourcedStrings, which correspond to a single substring of a document; and CompoundSourcedStrings, which are constructed by concatenating strings from multiple sources. Each of these types has two concrete subclasses: one for unicode strings (subclassed from ``unicode``), and one for byte strings (subclassed from ``str``).
Two sourced strings are considered equal if their contents are equal, even if their sources differ. This fact is important in ensuring that sourced strings act like normal strings. In particular, it allows sourced strings to be used with code that was originally intended to process plain Python strings.
If you wish to determine whether two sourced strings came from the same location in the same document, simply compare their sources attributes. If you know that both sourced strings are SimpleSourcedStrings, then you can compare their source attribute instead.
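The distinction between comparing contents and comparing sources can be illustrated with a minimal sketch. The class `SketchSourcedStr` and the shape of its `source` tuple are hypothetical stand-ins written for Python 3; they are not the classes documented on this page, only an illustration of the behaviour described above.

```python
class SketchSourcedStr(str):
    """Hypothetical stand-in for a simple sourced string (illustration only)."""

    def __new__(cls, contents, source):
        # str is immutable, so the contents must be set in __new__.
        self = super().__new__(cls, contents)
        # Assumed source shape: (document name, begin offset, end offset).
        self.source = source
        return self


a = SketchSourcedStr("cat", ("doc1.txt", 0, 3))
b = SketchSourcedStr("cat", ("doc2.txt", 10, 13))

# Equality looks only at the contents, so the differing sources are ignored.
same_contents = (a == b) and (a == "cat")

# To ask whether two strings came from the same place, compare the sources.
same_location = (a.source == b.source)
```

Because equality and hashing are inherited from `str`, a sourced string in this sketch can also serve as a dictionary key interchangeably with the plain string it contains.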
String operations that act on sourced strings will preserve location information whenever possible. However, there are a few types of string manipulation that can cause source information to be discarded. The most common examples of operations that will lose source information are:
| Splitting & Stripping Methods |
|---|
| String Concatenation Methods |
| Justification Methods |
| Replacement Methods |
| Unicode |
| Display |
| a new object with type S, a subtype of T |
| String Concatenation Methods |
_stringtype = None
A class variable, defined by subclasses of SourcedString, determining what type of string this class contains.
| Splitting & Stripping Methods |
|---|
_WHITESPACE_RE = re.compile(r'\s
_NEWLINE_RE = re.compile(r'\n')
_LINE_RE = re.compile(r'.
| Display |
_PPRINT_CHAR_REPRS =

sources A sorted tuple of (index, source) pairs.
Return a sourced string formed by concatenating the given list of substrings. Adjacent substrings will be merged when possible. Depending on the types and values of the supplied substrings, the concatenated string's value may be a Python string ( |
Helper for concat(): add |
Helper for self.decode(). Returns a unicode-decoded version of this SourcedString. This is used in the special case where the decoded string has the same length as the source string. In that case, we can safely assume that each character is encoded with one byte, so we can simply reuse our source. For example, this happens when decoding an ASCII string with utf-8.
Note: This method is abstract.
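The length check behind this special case can be sketched as follows. The function `decode_reusing_span` and its `(begin, end)` span representation are assumptions made for illustration in Python 3; they do not reproduce the abstract method described above.

```python
def decode_reusing_span(raw, begin, encoding="utf-8"):
    """Decode `raw` and, when the decoding is one-to-one, reuse its offsets.

    `begin` is the (assumed) offset of `raw` in its source document.
    Returns (text, span), where span is (begin, end) if every character
    came from exactly one byte, and None otherwise.
    """
    text = raw.decode(encoding)
    if len(text) == len(raw):
        # Same length: each character occupies one byte, so character
        # offsets and byte offsets coincide and the span can be reused.
        return text, (begin, begin + len(raw))
    # Some character needed several bytes; offsets no longer line up.
    return text, None


ascii_result = decode_reusing_span(b"abc", 5)
multibyte_result = decode_reusing_span("\u00e9".encode("utf-8"), 0)
```

In this sketch the multi-byte case simply gives up on the span; a fuller treatment would need to map each character back to the bytes it was decoded from.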
Return true if the list (self,)+args contains at least one unicode string and at least one byte string. (If this is the case, then all byte strings should be converted to unicode by calling decode() before the operation is performed. You can do this automatically using _decode_and_call().)
If self or any of the values in args is a byte string, then convert it to unicode by calling its decode() method. Then return the result of calling self.op(*args).
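The two helpers described above can be approximated in Python 3, where `bytes` plays the role of the byte string and `str` the role of the unicode string. The standalone functions `mixed_string_types` and `decode_and_call`, and the `encoding` parameter, are hypothetical names chosen for this sketch rather than the documented methods.

```python
def mixed_string_types(*strings):
    """Return True if the arguments include both text and byte strings."""
    has_text = any(isinstance(s, str) for s in strings)
    has_bytes = any(isinstance(s, bytes) for s in strings)
    return has_text and has_bytes


def decode_and_call(op, first, *args, encoding="ascii"):
    """Decode any byte strings, then call the method named `op` on `first`."""

    def to_text(value):
        # Only byte strings are decoded; everything else passes through.
        return value.decode(encoding) if isinstance(value, bytes) else value

    first = to_text(first)
    return getattr(first, op)(*[to_text(arg) for arg in args])


is_mixed = mixed_string_types("abc", b"abc")
replaced = decode_and_call("replace", b"hello", "l", b"L")
```

Normalising everything to text before dispatching means the operation itself never has to handle a mixture of the two string types.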
Return a string containing a pretty-printed display of this sourced string.
_stringtype
A class variable, defined by subclasses of SourcedString, determining what type of string this class contains. Its value must be either
_PPRINT_CHAR_REPRS
sources
A sorted tuple of (index, source) pairs. Each such pair specifies that the source of self[index:index+len(source)] is source. Any characters for which no source is specified are sourceless (e.g., plain Python characters that were concatenated to a sourced string).
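A lookup over such a tuple might be sketched as below. The `Span` class, its fields, and the `source_at` helper are hypothetical; the only property taken from the description above is that each source has a length and covers `self[index:index+len(source)]`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical source object: a region of a named document."""
    docid: str
    begin: int
    end: int

    def __len__(self):
        return self.end - self.begin


def source_at(sources, char_index):
    """Return the source covering `char_index`, or None if it is sourceless."""
    for index, source in sources:
        if index <= char_index < index + len(source):
            return source
    return None


# "cat and dog": "cat" and "dog" are sourced, " and " is plain text.
sources = (
    (0, Span("doc1.txt", 0, 3)),
    (8, Span("doc2.txt", 20, 23)),
)

from_cat = source_at(sources, 1)
from_gap = source_at(sources, 4)
from_dog = source_at(sources, 9)
```

Since the tuple is sorted by index, a binary search (for example with the standard `bisect` module) could replace the linear scan for strings with many sources.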
When working with simple sourced strings, it's usually easier to use
the |
| Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:51 2011 | http://epydoc.sourceforge.net |