Package nltk :: Module sourcedstring :: Class SimpleSourcedString

type SimpleSourcedString


object --+        
         |        
basestring --+    
             |    
 SourcedString --+
                 |
                SimpleSourcedString
Known Subclasses:

A single substring of a document, annotated with information about the location in the document where it was originally found. See SourcedString for more information.

Instance Methods
 
__init__(self, contents, source)
Construct a new sourced string.
 
__repr__(self)
 
__getitem__(self, index)
 
__getslice__(self, start, stop)
 
capitalize(self)
 
lower(self)
 
upper(self)
 
swapcase(self)
 
title(self)
    Splitting & Stripping Methods

Inherited from SourcedString: lstrip, partition, rpartition, rsplit, rstrip, split, splitlines, strip

    String Concatenation Methods

Inherited from SourcedString: __add__, __mul__, __radd__, __rmul__, join

    Justification Methods

Inherited from SourcedString: center, ljust, rjust, zfill

    Replacement Methods

Inherited from SourcedString: __mod__, expandtabs, replace, translate

    Unicode
 
_decode_one_to_one(self, unicode_chars)
Helper for self.decode().

Inherited from SourcedString: decode, encode

    Display

Inherited from SourcedString: pprint

Static Methods
 
__new__(cls, contents, source)
Returns a new object with type S, a subtype of T.
    String Concatenation Methods

Inherited from SourcedString: concat

Class Variables

Inherited from SourcedString (private): _stringtype

    Splitting & Stripping Methods

Inherited from SourcedString (private): _LINE_RE, _NEWLINE_RE, _WHITESPACE_RE

    Display

Inherited from SourcedString (private): _PPRINT_CHAR_REPRS

Instance Variables
  source
A StringLocation specifying the location where this string occurred in the source document.
Properties
  begin
The document offset where the string begins.
  end
The document offset where the string ends.
  docid
An identifier (such as a filename) that specifies the document where the string was found.
  sources
A sorted tuple of (index, source) pairs.
Method Details

__new__(cls, contents, source)
Static Method

Returns: a new object with type S, a subtype of T
Overrides: basestring.__new__
(inherited documentation)

__init__(self, contents, source)
(Constructor)


Construct a new sourced string.

Parameters:
  • contents (str or unicode) - The string contents of the new sourced string.
  • source - The source for the new string. If source is a string, then it is used to automatically construct a new ConsecutiveCharStringSource with a begin offset of 0 and an end offset of len(contents). Otherwise, source should be a StringSource whose length matches the length of contents.
Overrides: object.__init__
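
The rule for the source parameter can be sketched in plain Python. This is a minimal illustration and not the nltk implementation: CharSpan and resolve_source are hypothetical names, and the sketch assumes that a string passed as source serves as the document identifier.

```python
from collections import namedtuple

# Hypothetical stand-in for a consecutive-character source: a document
# identifier plus half-open [begin, end) offsets. Not the nltk class.
CharSpan = namedtuple("CharSpan", ["docid", "begin", "end"])


def resolve_source(contents, source):
    """Mimic the documented rule for the `source` argument (a sketch)."""
    if isinstance(source, str):
        # Assumption: a plain string is taken as the document id, and the
        # span covers all of `contents`, from offset 0 to len(contents).
        return CharSpan(source, 0, len(contents))
    # Otherwise the caller supplies a span whose length must match.
    if source.end - source.begin != len(contents):
        raise ValueError("source length does not match contents length")
    return source


whole = resolve_source("The quick fox", "doc1.txt")
part = resolve_source("quick", CharSpan("doc1.txt", 4, 9))
```

Here whole spans offsets 0 to 13 of doc1.txt, while part keeps the explicit span it was given. A span of the wrong length is rejected.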

__repr__(self)
(Representation operator)

Overrides: object.__repr__
(inherited documentation)

_decode_one_to_one(self, unicode_chars)


Helper for self.decode(). Returns a unicode-decoded version of this SourcedString. unicode_chars is the unicode-decoded contents of this SourcedString.

This is used in the special case where the decoded string has the same length as the source string. In that case, we can safely assume that each character is encoded with one byte, so we can simply reuse our source. For example, this happens when decoding an ASCII string with utf-8.

Overrides: SourcedString._decode_one_to_one
(inherited documentation)
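
The length check behind this special case can be illustrated with standard Python alone. The sketch below uses Python 3 bytes and str in place of the Python 2 str and unicode types this module was written for; same_length_after_decode is a hypothetical helper, not part of nltk.

```python
def same_length_after_decode(raw, encoding="utf-8"):
    """Return True when decoding yields exactly one character per byte."""
    return len(raw.decode(encoding)) == len(raw)


ascii_bytes = b"plain text"
# U+00E9 (e with acute accent) is encoded as two bytes in UTF-8.
accented_bytes = "caf\u00e9".encode("utf-8")

ascii_ok = same_length_after_decode(ascii_bytes)        # one byte per character
accented_ok = same_length_after_decode(accented_bytes)  # lengths differ
```

When the check holds, as for ascii_bytes, byte offsets and character offsets coincide, which is why the existing source can be reused. For accented_bytes the lengths differ, so the offsets would have to be remapped.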

Property Details

begin

The document offset where the string begins. (I.e., the offset of the first character in the string.)

end

The document offset where the string ends. (For character offsets, one plus the offset of the last character; for byte offsets, one plus the offset of the last byte that encodes the last character.)
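
The begin and end conventions match Python's half-open slices. The following sketch uses ordinary strings and made-up example documents, not the nltk classes, to show both the character-offset and the byte-offset reading.

```python
# Character offsets: [begin, end) slices the substring out of the document.
document = "The quick brown fox"
begin, end = 4, 9
substring = document[begin:end]
last_char_offset = end - 1  # end is one plus the offset of the last character

# Byte offsets: the same idea applied to the encoded document.
# U+00EF and U+00E9 each take two bytes in UTF-8.
doc_text = "na\u00efve caf\u00e9"
doc_bytes = doc_text.encode("utf-8")
char_begin, char_end = 6, 10   # the last word, counted in characters
byte_begin, byte_end = 7, 12   # the same word, counted in bytes
word_from_chars = doc_text[char_begin:char_end]
word_from_bytes = doc_bytes[byte_begin:byte_end].decode("utf-8")
```

Both slices recover the same word, but the byte span is longer because the final character occupies two bytes. In both cases end points one past the last unit of the string.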

docid

An identifier (such as a filename) that specifies the document where the string was found.

sources

A sorted tuple of (index, source) pairs. Each such pair specifies that the source of self[index:index+len(source)] is source. Any characters for which no source is specified are sourceless (e.g., plain Python characters that were concatenated to a sourced string).

When working with simple sourced strings, it's usually easier to use the source attribute instead; however, the sources attribute is defined for both simple and compound sourced strings.
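
As a rough illustration of the (index, source) pairs, the sketch below models each source as a hypothetical (begin, end) tuple of document offsets, so that len(source) becomes end - begin. The names and the data layout are assumptions made for the example, not the nltk representation.

```python
document = "The quick brown fox"

# A string built from two sourced pieces joined by a sourceless " -> ".
text = document[4:9] + " -> " + document[16:19]

# Sorted (index, source) pairs; each source is a (begin, end) span here.
sources = (
    (0, (4, 9)),    # text[0:5] came from document[4:9]
    (9, (16, 19)),  # text[9:12] came from document[16:19]
)


def sourceless_indices(text, sources):
    """List the positions in `text` that no source pair covers."""
    covered = set()
    for index, (begin, end) in sources:
        covered.update(range(index, index + (end - begin)))
    return [i for i in range(len(text)) if i not in covered]


# Check the documented invariant for every pair.
pairs_match = all(
    text[index:index + (end - begin)] == document[begin:end]
    for index, (begin, end) in sources
)
gaps = sourceless_indices(text, sources)
```

The positions in gaps correspond to the plain " -> " separator, which has no source. A simple sourced string would have a single pair starting at index 0.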