| Home | Trees | Indices | Help |
|
|---|
|
|
object --+
|
SeekableUnicodeStreamReader
A stream reader that automatically decodes the source byte stream into
unicode (like codecs.StreamReader), but still supports the
seek() and tell() operations correctly. This
is in contrast to codecs.StreamReader, which provides
*broken* seek() and tell() methods.
This class was motivated by StreamBackedCorpusView, which makes extensive use of
seek() and tell(), and needs to be able to
handle unicode-encoded files.
Note: this class requires stateless decoders. To my knowledge, this shouldn't cause a problem with any of Python's built-in unicode encodings.
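The reason stateless decoding matters can be illustrated without the class itself. The following is a minimal sketch, not this class's implementation: `read_line_at` is a hypothetical helper, and it assumes a UTF-8 stream, where the byte `b"\n"` always marks a line end. Because file positions are byte offsets into the underlying stream and the decoder carries no state between calls, decoding can restart at any offset that falls on a character boundary.

```python
import io


def read_line_at(stream, byte_offset, encoding="utf-8"):
    """Seek to a byte offset, read one line, and decode it.

    Hypothetical helper. Returns the decoded line and the byte offset
    just past it. Assumes an encoding (such as UTF-8) in which a line
    ends at the byte b"\n".
    """
    stream.seek(byte_offset)
    raw = stream.readline()  # bytes up to and including b"\n"
    return raw.decode(encoding), stream.tell()


stream = io.BytesIO("première\nsecond\n".encode("utf-8"))
first, pos = read_line_at(stream, 0)
second, _ = read_line_at(stream, pos)
```

Here `pos` is a byte offset rather than a character count: the first line has 9 characters but occupies 10 bytes, because "è" takes two bytes in UTF-8.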
DEBUG = True If true, then perform extra sanity checks. |
|||
_BOM_TABLE =
|
|||
|
|||
|
stream The underlying stream. |
|||
|
encoding The name of the encoding that should be used to decode the underlying stream. |
|||
|
errors The error mode that should be used when decoding data from the underlying stream. |
|||
|
decode The function that is used to decode byte strings into unicode strings. |
|||
|
bytebuffer A buffer to hold bytes that have been read but have not yet been decoded. |
|||
|
linebuffer A buffer used by readline() to hold characters that have been read, but have not yet been returned by read() or readline(). |
|||
|
_rewind_checkpoint The file position at which the most recent read on the underlying stream began. |
|||
|
_rewind_numchars The number of characters that have been returned since the read that started at _rewind_checkpoint. |
|||
|
_bom The length of the byte order marker at the beginning of the stream (or None for no byte order marker).
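As an illustration of how a byte order mark length might be determined, here is a minimal sketch using the BOM constants from the standard `codecs` module. The table `BOMS_BY_ENCODING` and the function `bom_length` are hypothetical names introduced for this example; the actual contents of `_BOM_TABLE` are not shown on this page and may differ.

```python
import codecs

# Hypothetical table for illustration only; not the class's _BOM_TABLE.
BOMS_BY_ENCODING = {
    "utf-8": [codecs.BOM_UTF8],
    "utf-16": [codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE],
    "utf-32": [codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE],
}


def bom_length(head, encoding):
    """Return the length of the BOM at the start of `head`, or None.

    `head` is the first few bytes of the stream. Only the BOMs listed
    for the given encoding are considered.
    """
    key = encoding.lower().replace("_", "-")
    for bom in BOMS_BY_ENCODING.get(key, []):
        if head.startswith(bom):
            return len(bom)
    return None
```

Looking up candidates by encoding name avoids confusing the UTF-16 little-endian BOM with the UTF-32 little-endian BOM, which begins with the same two bytes.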
|
|||
|
|||
|
closed True if the underlying stream is closed. |
|||
|
name The name of the underlying stream. |
|||
|
mode The mode of the underlying stream. |
|||
|
|||
|
Read up to
|
Read a line of text, decode it using this reader's encoding, and return the resulting unicode string.
|
Read this file's contents, decode them using this reader's encoding, and return it as a list of unicode lines.
|
Move the stream to a new file position. If the reader is maintaining any buffers, then they will be cleared.
|
Move the file position forward by
|
Return the current file position on the underlying byte stream. If this reader is maintaining any buffers, then the returned file position will be the position of the beginning of those buffers. |
Read up to |
Decode the given byte string into a unicode string, using this reader's encoding. If an exception is encountered that appears to be caused by a truncation error, then just decode the byte string without the bytes that cause the truncation error.
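The truncation handling described above can be sketched as follows. This is an assumption-laden illustration, not the class's own code: `decode_tolerating_truncation` is a hypothetical name, and the test "the error reaches the end of the input" is one plausible way to decide that an error appears to be caused by truncation.

```python
def decode_tolerating_truncation(data, encoding="utf-8"):
    """Decode `data`, leaving a trailing incomplete character undecoded.

    Hypothetical helper. Returns (text, leftover), where `leftover`
    holds the trailing bytes that did not form a complete character.
    """
    try:
        return data.decode(encoding), b""
    except UnicodeDecodeError as exc:
        # An error that extends to the last byte looks like a character
        # cut off by the end of the read, so decode only what precedes it.
        if exc.end == len(data):
            return data[:exc.start].decode(encoding), data[exc.start:]
        # An error in the middle of the data is a genuine decoding error.
        raise


# "é" is two bytes in UTF-8; dropping the last byte splits it in half.
text, leftover = decode_tolerating_truncation("café".encode("utf-8")[:-1])
```

The leftover bytes correspond to what a byte buffer would hold until the next read supplies the rest of the character.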
|
|
|||
_BOM_TABLE
|
|
|||
errors The error mode that should be used when decoding data from the underlying stream. Can be 'strict', 'ignore', or 'replace'. |
bytebuffer A buffer to hold bytes that have been read but have not yet been decoded. This is only used when the final bytes from a read do not form a complete encoding for a character. |
linebuffer A buffer used by readline() to hold characters that have been read, but have not yet been returned by read() or readline(). This buffer consists of a list of unicode strings, where each string corresponds to a single line. The final element of the list may or may not be a complete line. Note that the existence of a linebuffer makes the tell() operation more complex, because it must backtrack to the beginning of the buffer to determine the correct file position in the underlying byte stream. |
_rewind_checkpoint The file position at which the most recent read on the underlying stream began. This is used, together with _rewind_numchars, to backtrack to the beginning of linebuffer (which is required by tell()). |
_rewind_numchars The number of characters that have been returned since the read that started at _rewind_checkpoint. This is used, together with _rewind_checkpoint, to backtrack to the beginning of linebuffer (which is required by tell()). |
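To make the backtracking idea concrete, the sketch below converts a checkpoint (a byte offset) plus a count of characters already returned into the byte offset of the next unreturned character. It is a simplified illustration under stated assumptions, not the method this class uses: `position_after_chars` is a hypothetical helper, it re-reads everything after the checkpoint, and it assumes an encoding such as UTF-8 whose encoder adds no byte order mark.

```python
import io


def position_after_chars(stream, checkpoint, numchars, encoding="utf-8"):
    """Return the byte offset reached after `numchars` characters.

    Hypothetical helper. Decodes from `checkpoint`, then measures how
    many bytes the first `numchars` characters occupy. The stream's
    current position is restored before returning.
    """
    saved = stream.tell()
    try:
        stream.seek(checkpoint)
        text = stream.read().decode(encoding)
    finally:
        stream.seek(saved)
    return checkpoint + len(text[:numchars].encode(encoding))


stream = io.BytesIO("naïve café\nrest\n".encode("utf-8"))
pos_a = position_after_chars(stream, 0, 5)  # past "naïve"
pos_b = position_after_chars(stream, pos_a, 5)  # past " café"
```

Restarting the decoder at the checkpoint is only valid because the decoder is stateless, which is why a checkpoint and a character count are enough to recover a byte position.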
|
|||
closed True if the underlying stream is closed. |
name The name of the underlying stream. |
mode The mode of the underlying stream. |
| Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:46 2011 | http://epydoc.sourceforge.net |