Package nltk :: Module sourcedstring :: Class CompoundSourcedString

type CompoundSourcedString

object --+        
         |        
basestring --+    
             |    
 SourcedString --+
                 |
                CompoundSourcedString

A string constructed by concatenating substrings from multiple sources, and annotated with information about the locations where those substrings were originally found. See SourcedString for more information.

Instance Methods

__init__(self, substrings)
Construct a new compound sourced string that combines the given list of substrings.

__repr__(self)

_source_repr(self, substring)

__getitem__(self, index)

__getslice__(self, start, stop)

capitalize(self)

lower(self)

upper(self)

swapcase(self)

title(self)
    Splitting & Stripping Methods

Inherited from SourcedString: lstrip, partition, rpartition, rsplit, rstrip, split, splitlines, strip

    String Concatenation Methods

Inherited from SourcedString: __add__, __mul__, __radd__, __rmul__, join

    Justification Methods

Inherited from SourcedString: center, ljust, rjust, zfill

    Replacement Methods

Inherited from SourcedString: __mod__, expandtabs, replace, translate

    Unicode
 
encode(self, encoding=None, errors='strict')

_decode_one_to_one(self, unicode_chars)
Helper for self.decode().

Inherited from SourcedString: decode

    Display

Inherited from SourcedString: pprint

Static Methods

__new__(cls, substrings)
Returns a new object with type S, a subtype of T.
    String Concatenation Methods

Inherited from SourcedString: concat

Class Variables

Inherited from SourcedString (private): _stringtype

    Splitting & Stripping Methods

Inherited from SourcedString (private): _LINE_RE, _NEWLINE_RE, _WHITESPACE_RE

    Display

Inherited from SourcedString (private): _PPRINT_CHAR_REPRS

Instance Variables
  substrings
The tuple of substrings that compose this compound sourced string.
Properties
  sources
A sorted tuple of (index, source) pairs.
Method Details

__new__(cls, substrings)
Static Method

Returns: a new object with type S, a subtype of T
Overrides: basestring.__new__
(inherited documentation)

__init__(self, substrings)
(Constructor)

Construct a new compound sourced string that combines the given list of substrings.

Typically, compound sourced strings should not be constructed directly. Instead, use SourcedString.concat(), which flattens nested compound sourced strings and merges adjacent substrings when possible.

Raises:
  • ValueError - If len(substrings) < 2
  • ValueError - If substrings contains any CompoundSourcedStrings.
Overrides: object.__init__
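
The constructor constraints above can be illustrated with a minimal, self-contained sketch. The classes Simple and Compound and the helper flatten are hypothetical stand-ins written for this example; they are not the nltk.sourcedstring implementation. The sketch covers only the two documented ValueError conditions and the flattening step attributed to SourcedString.concat(); the merging of adjacent substrings is left out because its rules are not described here.

```python
class Simple:
    """Hypothetical stand-in for a simple (single-source) sourced string."""

    def __init__(self, text, source=None):
        self.text = text
        self.source = source


class Compound:
    """Hypothetical stand-in for a compound sourced string."""

    def __init__(self, substrings):
        # Documented constraint 1: at least two substrings are required.
        if len(substrings) < 2:
            raise ValueError("a compound string needs at least two substrings")
        # Documented constraint 2: no substring may itself be compound.
        if any(isinstance(s, Compound) for s in substrings):
            raise ValueError("substrings may not themselves be compound")
        self.substrings = tuple(substrings)


def flatten(pieces):
    """Expand nested Compound objects into their substrings.

    This models only the flattening step; merging adjacent substrings
    is intentionally omitted from this sketch.
    """
    flat = []
    for piece in pieces:
        if isinstance(piece, Compound):
            flat.extend(piece.substrings)
        else:
            flat.append(piece)
    return flat


first, second, third = Simple("New "), Simple("York "), Simple("City")
pair = Compound([first, second])

# Passing `pair` directly would raise ValueError, so flatten it first.
whole = Compound(flatten([pair, third]))
```

Flattening before construction is what lets the substrings invariant hold without the caller having to check for nesting.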

__repr__(self)
(Representation operator)

Overrides: object.__repr__
(inherited documentation)

encode(self, encoding=None, errors='strict')

Overrides: SourcedString.encode

_decode_one_to_one(self, unicode_chars)

Helper for self.decode(). Returns a unicode-decoded version of this SourcedString. unicode_chars is the unicode-decoded contents of this SourcedString.

This is used in the special case where the decoded string has the same length as the source string. In that case, we can safely assume that each character is encoded with one byte, so we can simply reuse the existing source. For example, this happens when decoding an ASCII string with UTF-8.

Overrides: SourcedString._decode_one_to_one
(inherited documentation)
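
The length check behind this special case can be shown with plain byte strings. This is a sketch under assumptions: it uses Python 3 bytes and str rather than the Python 2 str and unicode types this module was written for, and the helper name is_one_to_one is hypothetical, not part of the module.

```python
def is_one_to_one(raw, encoding="utf-8"):
    """Return True if decoding `raw` yields one character per byte.

    When the lengths match, byte offsets and character offsets coincide,
    so offset-based source information stays valid after decoding.
    """
    return len(raw.decode(encoding)) == len(raw)


# Pure ASCII: every character occupies exactly one byte in UTF-8.
ascii_case = is_one_to_one(b"source")

# Non-ASCII: the accented character occupies two bytes in UTF-8,
# so the decoded string is shorter than the byte string.
accented_case = is_one_to_one("caf\u00e9".encode("utf-8"))
```

Here ascii_case is True and accented_case is False; only the first situation qualifies for the one-to-one shortcut.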

Instance Variable Details

substrings

The tuple of substrings that compose this compound sourced string. Every compound sourced string is required to have at least two substrings, and the substrings themselves may never be CompoundSourcedStrings.

Property Details

sources

A sorted tuple of (index, source) pairs. Each such pair specifies that the source of self[index:index+len(source)] is source. Any characters for which no source is specified are sourceless (e.g., plain Python characters that were concatenated to a sourced string).

When working with simple sourced strings, it's usually easier to use the source attribute instead; however, the sources attribute is defined for both simple and compound sourced strings.
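
The shape of the sources value can be sketched as follows. Everything named here is an assumption made for illustration: Span is a hypothetical source record whose length is the number of characters it covers, and compute_sources is a hypothetical helper, not the property's actual implementation. Sourceless text is represented by None.

```python
class Span:
    """Hypothetical source record: a document id plus character offsets."""

    def __init__(self, docid, begin, end):
        self.docid = docid
        self.begin = begin
        self.end = end

    def __len__(self):
        # Number of characters this source covers.
        return self.end - self.begin


def compute_sources(pieces):
    """Return a tuple of (index, source) pairs in ascending index order.

    `pieces` is a list of (text, source_or_None) pairs; None marks
    sourceless text, which contributes characters but no pair.
    """
    pairs = []
    index = 0
    for text, source in pieces:
        if source is not None:
            pairs.append((index, source))
        index += len(text)
    return tuple(pairs)


pieces = [
    ("Hello", Span("a.txt", 0, 5)),
    (", ", None),                    # plain text with no source
    ("world", Span("b.txt", 10, 15)),
]
text = "".join(piece_text for piece_text, _ in pieces)
sources = compute_sources(pieces)

# Each pair locates its substring: text[index:index + len(source)].
index, source = sources[1]
located = text[index:index + len(source)]
```

In this example the pairs start at indices 0 and 7, and located is "world"; the two characters at indices 5 and 6 have no pair and are therefore sourceless.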