Package nltk :: Module sourcedstring :: Class ContiguousCharStringSource

Type ContiguousCharStringSource

  object --+    
           |    
StringSource --+
               |
              ContiguousCharStringSource

A StringSource that specifies the source of strings whose characters are contiguous but do not necessarily have consecutive offsets. In particular, each character's end offset must be equal to the next character's start offset; that is, source[i].end == source[i+1].begin for every pair of adjacent characters.

This property allows the source to be stored using a list of len(source)+1 offsets (along with a docid): character i spans from offsets[i] to offsets[i+1].
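As a rough illustration of why contiguity permits this compact storage, the sketch below collapses per-character (start, end) spans into len(spans)+1 boundary offsets and expands them back. The functions spans_to_offsets and offsets_to_spans are hypothetical helpers written for this example; they are not part of nltk.sourcedstring.

```python
def spans_to_offsets(spans, begin):
    """Collapse contiguous (start, end) character spans into boundary offsets.

    Returns len(spans) + 1 offsets.  Raises ValueError if some character's
    start offset differs from the previous character's end offset.
    """
    offsets = [begin]
    for start, end in spans:
        if start != offsets[-1]:
            raise ValueError("spans are not contiguous at offset %d" % start)
        offsets.append(end)
    return offsets


def offsets_to_spans(offsets):
    """Expand boundary offsets back into (start, end) spans: character i
    covers offsets[i] up to offsets[i + 1]."""
    return list(zip(offsets, offsets[1:]))


# Three contiguous characters; the middle one is two offsets wide.
spans = [(10, 11), (11, 13), (13, 14)]
offsets = spans_to_offsets(spans, begin=10)
print(offsets)                    # [10, 11, 13, 14]
print(offsets_to_spans(offsets))  # [(10, 11), (11, 13), (13, 14)]
```

Because every interior boundary is shared by two adjacent characters, storing it once is enough. A gap or overlap between characters would make this representation lossy, which is why the sketch rejects such input.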

This StringSource can be used to describe unicode strings that are indexed using byte offsets, where a single character may occupy several bytes.
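For instance, assuming the document is stored UTF-8 encoded, the per-character byte offsets of a unicode substring could be computed as below. utf8_char_offsets is a hypothetical helper for illustration; it shows a character such as "é" spanning two bytes while the offsets remain contiguous.

```python
def utf8_char_offsets(text, begin=0):
    """Return len(text) + 1 byte offsets for text, assuming UTF-8 encoding.

    Character i of text is encoded by bytes offsets[i]:offsets[i + 1]
    of the document; begin is the byte offset where text starts.
    """
    offsets = [begin]
    for ch in text:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    return offsets


doc_text = "naïve café"
doc_bytes = doc_text.encode("utf-8")
word = "café"
# Byte offset where the word starts: the length of the encoded prefix.
begin = len(doc_text[:doc_text.index(word)].encode("utf-8"))
offsets = utf8_char_offsets(word, begin)
print(offsets)  # [7, 8, 9, 10, 12]

# Each character can be recovered from its byte range in the document.
for i, ch in enumerate(word):
    assert doc_bytes[offsets[i]:offsets[i + 1]].decode("utf-8") == ch
```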

Instance Methods

__init__(self, docid, offsets)
Create a new StringSource.

__len__(self)
Return the length of the string described by this StringSource.

__getslice__(self, start, stop)
Return a StringSource describing the location where the specified substring was found.

__cmp__(self, other)

__repr__(self)

Inherited from StringSource: __getitem__, __hash__, __str__

Static Methods

Inherited from StringSource: __new__

Class Variables
  CONSTRUCTOR_CHECKS_OFFSETS = False

Instance Variables

Inherited from StringSource: docid, offsets

Properties
  begin
The document offset where the string begins.
  end
The document offset where the string ends.

Method Details

__init__(self, docid, offsets)
(Constructor)

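Create a new StringSource. When the StringSource constructor is called directly, it automatically delegates to one of its two subclasses, chosen according to the keyword arguments supplied; ContiguousCharStringSource is the subclass that takes offsets.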

In both cases, the arguments must be specified as keyword arguments (not positional arguments).

Overrides: StringSource.__init__
(inherited documentation)
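Delegation of this kind can be implemented in Python by overriding __new__ so that it inspects the keyword arguments and returns an instance of the appropriate subclass; Python then runs that subclass's __init__ with the original arguments. The sketch below is a generic illustration of the pattern under assumed names (Source, SpanSource, OffsetSource, and the begin/end keywords are all hypothetical), not NLTK's actual code.

```python
class Source:
    """Hypothetical base class whose constructor delegates to a subclass."""

    def __new__(cls, docid, *args, **kwargs):
        if cls is Source:
            # Called directly: pick a subclass from the keyword arguments.
            if args:
                raise TypeError("use keyword arguments: begin/end or offsets")
            if set(kwargs) == {"begin", "end"}:
                cls = SpanSource
            elif set(kwargs) == {"offsets"}:
                cls = OffsetSource
            else:
                raise TypeError("specify either begin and end, or offsets")
        # Only cls is passed on; because the result is a Source instance,
        # Python then calls type(result).__init__ with the original arguments.
        return object.__new__(cls)


class SpanSource(Source):
    def __init__(self, docid, begin, end):
        self.docid, self.begin, self.end = docid, begin, end


class OffsetSource(Source):
    def __init__(self, docid, offsets):
        self.docid, self.offsets = docid, list(offsets)


print(type(Source("doc1", offsets=[0, 1, 3])).__name__)  # OffsetSource
print(type(Source("doc1", begin=2, end=5)).__name__)     # SpanSource
```

Requiring keyword arguments on the direct call is what makes the choice unambiguous: given only positional arguments, the base class could not tell a list of offsets apart from a begin value. Calling a subclass directly, as in OffsetSource("doc1", [4, 5]), still accepts positional arguments in this sketch.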

__len__(self)
(Length operator)

Return the length of the string described by this StringSource. Note that this may not be equal to self.end-self.begin for unicode strings described using byte offsets, since a single character may be encoded by more than one byte.

Overrides: StringSource.__len__
(inherited documentation)
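A minimal sketch of how a list of len(source)+1 offsets yields both the length and the begin/end values, and why the two can disagree under byte offsets. OffsetsSketch is a hypothetical stand-in written for this example, not the NLTK class.

```python
class OffsetsSketch:
    """Hypothetical stand-in that stores len(string) + 1 boundary offsets."""

    def __init__(self, docid, offsets):
        self.docid = docid
        self.offsets = list(offsets)

    def __len__(self):
        # One boundary per character, plus the final end boundary.
        return len(self.offsets) - 1

    @property
    def begin(self):
        return self.offsets[0]

    @property
    def end(self):
        return self.offsets[-1]


# "café" at the start of a UTF-8 document: "é" is encoded in two bytes.
src = OffsetsSketch("doc1", [0, 1, 2, 3, 5])
print(len(src))             # 4 characters
print(src.end - src.begin)  # 5 bytes
```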

__getslice__(self, start, stop)
(Slicing operator)

Return a StringSource describing the location where the specified substring was found. In particular, if s is the string that this source describes, then return a StringSource describing the location of s[start:stop].

Overrides: StringSource.__getslice__
(inherited documentation)
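Under the contiguous representation, slicing can reduce to slicing the offset list: the substring s[start:stop] is bounded by offsets[start] and offsets[stop], so it needs offsets[start:stop+1]. The sketch below uses a hypothetical slice_offsets function and normalizes indices with the built-in slice.indices; it is written for Python 3, where slicing goes through __getitem__ with a slice object rather than __getslice__, and it only handles a step of 1.

```python
def slice_offsets(offsets, start, stop):
    """Return the boundary offsets describing s[start:stop], where s is the
    string whose len(s) + 1 character boundaries are given by offsets."""
    n = len(offsets) - 1  # number of characters in s
    # Normalize None and negative indices exactly as s[start:stop] would.
    i, j, _ = slice(start, stop).indices(n)
    j = max(i, j)  # an empty slice still has a position: offsets[i]
    return offsets[i:j + 1]


# Byte offsets for "naïve café" in UTF-8 ("ï" and "é" take two bytes each).
text = "naïve café"
offsets = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 12]
print(text[6:10], slice_offsets(offsets, 6, 10))  # café [7, 8, 9, 10, 12]
print(slice_offsets(offsets, -4, None))           # [7, 8, 9, 10, 12]
print(slice_offsets(offsets, 3, 1))               # [4]
```

The result again has one more offset than the substring has characters, so it is itself a valid contiguous description of s[start:stop].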

__cmp__(self, other)
(Comparison operator)

Overrides: StringSource.__cmp__

__repr__(self)
(Representation operator)

Overrides: object.__repr__
(inherited documentation)

Property Details

begin

The document offset where the string begins. (I.e., the offset of the first character in the string.) source.begin is always equal to source.offsets[0].

end

The document offset where the string ends. (For character offsets, one plus the offset of the last character; for byte offsets, one plus the offset of the last byte that encodes the last character.) source.end is always equal to source.offsets[-1].
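To make the two definitions of end concrete, the sketch below locates "café" inside the document "naïve café" using both character offsets and byte offsets, assuming a UTF-8 encoded document for illustration. In each case begin is the offset of the first unit of the string and end is one past the last unit (character or byte) belonging to it.

```python
doc_text = "naïve café"
doc_bytes = doc_text.encode("utf-8")
word = "café"

# Character offsets: begin is the index of "c"; end is one plus the index of "é".
char_begin = doc_text.index(word)
char_end = char_begin + len(word)

# Byte offsets: "é" is encoded by two bytes, and end is one plus the
# offset of the second of them.
byte_begin = len(doc_text[:char_begin].encode("utf-8"))
byte_end = byte_begin + len(word.encode("utf-8"))

print(char_begin, char_end)  # 6 10
print(byte_begin, byte_end)  # 7 12
assert doc_text[char_begin:char_end] == word
assert doc_bytes[byte_begin:byte_end].decode("utf-8") == word
```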