Package nltk :: Module sourcedstring :: Class StringSource

type StringSource

object --+
         |
        StringSource
Known Subclasses:

A description of the location of a string in a document. Each StringSource consists of a document identifier, along with information about the begin and end offsets of each character in the string. These offsets are typically either byte offsets or character offsets. (Note that for unicode strings, byte offsets and character offsets are not the same thing.)
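The difference between byte offsets and character offsets can be illustrated with a short sketch. It does not use NLTK; the helper name `char_and_byte_offsets` and the choice of UTF-8 are assumptions made for illustration only.

```python
def char_and_byte_offsets(text, encoding="utf-8"):
    """Return (char_offsets, byte_offsets) for `text`.

    Each list has len(text) + 1 entries: one boundary before every
    character, plus a final boundary after the last character.
    """
    char_offsets = list(range(len(text) + 1))
    byte_offsets = [0]
    for ch in text:
        # Each character advances the byte offset by its encoded width.
        byte_offsets.append(byte_offsets[-1] + len(ch.encode(encoding)))
    return char_offsets, byte_offsets


# "\u00ef" (i with diaeresis) occupies two bytes in UTF-8.
char_offsets, byte_offsets = char_and_byte_offsets("na\u00efve")
print(char_offsets)  # [0, 1, 2, 3, 4, 5]
print(byte_offsets)  # [0, 1, 2, 4, 5, 6]
```

For a pure-ASCII string the two lists are identical, which is why the distinction only matters for non-ASCII text.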

StringSource is an abstract base class. Two concrete subclasses are used depending on the properties of the string whose source is being described:

Instance Methods
 
__init__(self, docid, **kwargs)
Create a new StringSource.
 
__getitem__(self, index)
Return a StringSource describing the location where the specified character was found.
 
__getslice__(self, start, stop)
Return a StringSource describing the location where the specified substring was found.
 
__len__(self)
Return the length of the string described by this StringSource.
 
__str__(self)
 
__cmp__(self, other)
 
__hash__(self)
Static Methods
 
__new__(cls, docid, *args, **kwargs)
Instance Variables
  begin
The document offset where the string begins.
  docid
An identifier (such as a filename) that specifies which document contains the string.
  end
The document offset where the string ends.
  offsets
A list of offsets specifying the location of each character in the document.
Method Details

__new__(cls, docid, *args, **kwargs)
Static Method

Overrides: object.__new__
(inherited documentation)

__init__(self, docid, **kwargs)
(Constructor)

Create a new StringSource. When the StringSource constructor is called directly, it automatically delegates to one of its two subclasses:

In both cases, the arguments must be specified as keyword arguments (not positional arguments).
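The delegation pattern described here, in which calling the base class returns an instance of a subclass, is usually implemented by overriding `__new__`. The following is a minimal sketch of that pattern only. The class names `Source`, `RangeSource` and `ListSource`, the keyword `offsets`, and the rule used to pick a subclass are all hypothetical and are not taken from NLTK.

```python
class Source:
    """Hypothetical base class that dispatches to a subclass in __new__."""

    def __new__(cls, docid, **kwargs):
        if cls is Source:
            # Assumed selection rule, for illustration only.
            target = ListSource if "offsets" in kwargs else RangeSource
            return super().__new__(target)
        return super().__new__(cls)

    def __init__(self, docid, **kwargs):
        # Everything except docid must be passed by keyword.
        self.docid = docid
        self.kwargs = kwargs


class RangeSource(Source):
    """Hypothetical subclass chosen when no explicit offsets are given."""


class ListSource(Source):
    """Hypothetical subclass chosen when an offsets list is given."""


a = Source("doc.txt", begin=0, end=5)
b = Source("doc.txt", offsets=[0, 1, 2, 4, 5, 6])
print(type(a).__name__)  # RangeSource
print(type(b).__name__)  # ListSource
```

Because the object returned by `__new__` is still an instance of the base class, Python goes on to call `__init__` on it with the original arguments.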

Overrides: object.__init__

__getitem__(self, index)
(Indexing operator)

Return a StringSource describing the location where the specified character was found. In particular, if s is the string that this source describes, then return a StringSource describing the location of s[index].

Raises:
  • IndexError - If index is out of range.
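The indexing behavior above can be sketched without NLTK. The sketch assumes a source is modeled as a document identifier plus an offsets list with one more entry than the string has characters; the helper name `char_location` and its tuple return value are hypothetical.

```python
def char_location(docid, offsets, index):
    """Return (docid, begin, end) for the character at `index`.

    `offsets` has one more entry than the described string has characters.
    """
    length = len(offsets) - 1
    if index < 0:
        index += length  # accept negative indices, as s[index] does
    if not 0 <= index < length:
        raise IndexError("index out of range")
    return (docid, offsets[index], offsets[index + 1])


# Byte offsets for a 5-character string whose third character takes 2 bytes.
print(char_location("doc.txt", [0, 1, 2, 4, 5, 6], 2))  # ('doc.txt', 2, 4)
```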

__getslice__(self, start, stop)
(Slicing operator)

Return a StringSource describing the location where the specified substring was found. In particular, if s is the string that this source describes, then return a StringSource describing the location of s[start:stop].

Decorators:
  • @abstract

Note: This method is abstract.
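As a rough illustration of what a concrete subclass has to compute, the sketch below derives the offsets that describe `s[start:stop]` from the offsets that describe `s`. The helper name `slice_offsets` is hypothetical, and this is not NLTK's implementation. Note also that `__getslice__` exists only in Python 2; in Python 3, slicing is routed through `__getitem__` with a `slice` object.

```python
def slice_offsets(offsets, start, stop):
    """Return the offsets list describing s[start:stop].

    `offsets` has one more entry than the described string has characters.
    """
    length = len(offsets) - 1
    # Clamp and normalize start/stop the same way string slicing does.
    start, stop, _ = slice(start, stop).indices(length)
    stop = max(start, stop)  # an empty slice still keeps one boundary offset
    return offsets[start:stop + 1]


offsets = [0, 1, 2, 4, 5, 6]
print(slice_offsets(offsets, 1, 3))  # [1, 2, 4]
```

The result keeps the "one more offset than characters" shape, so a two-character substring is described by three offsets.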

__len__(self)
(Length operator)

Return the length of the string described by this StringSource. Note that this may not be equal to self.end-self.begin for unicode strings described using byte offsets.

Decorators:
  • @abstract

Note: This method is abstract.

__str__(self)
(Informal representation operator)

Overrides: object.__str__
(inherited documentation)

__hash__(self)
(Hashing function)

Overrides: object.__hash__
(inherited documentation)

Instance Variable Details

begin

The document offset where the string begins. (I.e., the offset of the first character in the string.) source.begin is always equal to source.offsets[0].

end

The document offset where the string ends. (For character offsets, one plus the offset of the last character; for byte offsets, one plus the offset of the last byte that encodes the last character). source.end is always equal to source.offsets[-1].

offsets

A list of offsets specifying the location of each character in the document. The ith character of the string begins at offset offsets[i] and ends at offset offsets[i+1]. The length of the offsets list is one greater than the length of the string described by this StringSource.
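The relationships between begin, end, offsets and the string length can be summarized in a small stand-in class. This is a sketch under stated assumptions, not an NLTK class: the name `OffsetSource` is hypothetical, and the non-decreasing check is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OffsetSource:
    """Hypothetical stand-in for a string source; not part of NLTK."""
    docid: str
    offsets: List[int]

    def __post_init__(self):
        if not self.offsets:
            raise ValueError("offsets needs at least one entry")
        # Assumption: offsets never move backwards through the document.
        if any(a > b for a, b in zip(self.offsets, self.offsets[1:])):
            raise ValueError("offsets must be non-decreasing")

    @property
    def begin(self):
        return self.offsets[0]

    @property
    def end(self):
        return self.offsets[-1]

    def __len__(self):
        # One more offset than characters.
        return len(self.offsets) - 1


# Byte offsets of a 5-character string that starts at byte 10 of the
# document and contains one 2-byte character.
src = OffsetSource("doc.txt", [10, 11, 12, 14, 15, 16])
print(len(src), src.begin, src.end, src.end - src.begin)  # 5 10 16 6
```

Here `len(src)` is 5 while `src.end - src.begin` is 6, which is the byte-offset case in which the two values differ.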