Package nltk :: Module data :: Class BufferedGzipFile

classobj_type BufferedGzipFile

gzip.GzipFile --+
                |
               BufferedGzipFile

A GzipFile subclass that buffers calls to read() and write(). This allows faster reads and writes of data to and from gzip-compressed files at the cost of using more memory.

The default buffer size is 2 MB (2,097,152 bytes, the value of the SIZE class variable).

BufferedGzipFile is useful for loading large gzipped pickle objects as well as writing large encoded feature files for classifier training.
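
The gzipped-pickle use case can be sketched as follows. This is a minimal Python 3 example written against the standard-library gzip.GzipFile, the parent class, so that it runs without NLTK. Going by the constructor signature documented on this page, BufferedGzipFile should accept the same filename and mode arguments where it is available. The helper names and the file name features.pickle.gz are hypothetical.

```python
import gzip
import os
import pickle
import tempfile


def save_pickle_gz(obj, path):
    """Pickle obj into a gzip-compressed file at path."""
    # Assumption: BufferedGzipFile(path, "wb") could replace GzipFile here.
    with gzip.GzipFile(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Load a pickled object back from a gzip-compressed file."""
    with gzip.GzipFile(path, "rb") as f:
        return pickle.load(f)


# Hypothetical data standing in for a large feature set.
data = {"tokens": ["a", "b"] * 1000}
path = os.path.join(tempfile.mkdtemp(), "features.pickle.gz")

save_pickle_gz(data, path)
restored = load_pickle_gz(path)
```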

Instance Methods
__init__(self, filename=None, mode=None, compresslevel=9, fileobj=None, **kwargs)
    Returns: BufferedGzipFile, a buffered gzip file object
_reset_buffer(self)
_write_buffer(self, data)
_write_gzip(self, data)
close(self)
flush(self, lib_mode=2)
read(self, size=None)
write(self, data, size=-1)

Inherited from gzip.GzipFile: __del__, __iter__, __repr__, fileno, isatty, next, readline, readlines, rewind, seek, tell, writelines

Inherited from gzip.GzipFile (private): _add_read_data, _init_read, _init_write, _read, _read_eof, _read_gzip_header, _unread, _write_gzip_header

Class Variables
  SIZE = 2097152

Inherited from gzip.GzipFile: max_read_chunk, myfileobj

Method Details

__init__(self, filename=None, mode=None, compresslevel=9, fileobj=None, **kwargs)
(Constructor)

Constructor for the GzipFile class.

At least one of fileobj and filename must be given a non-trivial value.

The new class instance is based on fileobj, which can be a regular file, a StringIO object, or any other object which simulates a file. It defaults to None, in which case filename is opened to provide a file object.

When fileobj is not None, the filename argument is used only in the gzip file header, which may include the original filename of the uncompressed file. It defaults to the filename of fileobj, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header.

The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', or 'wb', depending on whether the file will be read or written. The default is the mode of fileobj if discernible; otherwise, the default is 'rb'. Be aware that only the 'rb', 'ab', and 'wb' values should be used for cross-platform portability.

The compresslevel argument is an integer from 1 to 9 controlling the level of compression; 1 is fastest and produces the least compression, and 9 is slowest and produces the most compression. The default is 9.

Parameters:
  • filename (str) - a filesystem path
  • mode (str) - a file mode which can be any of 'r', 'rb', 'a', 'ab', 'w', or 'wb'
  • compresslevel (int) - an integer from 1 to 9 controlling the level of compression; 1 is fastest and produces the least compression, and 9 is slowest and produces the most compression. The default is 9.
  • fileobj (StringIO) - a StringIO stream to read from instead of a file.
  • size (int) - number of bytes to buffer during calls to read() and write(); not a named parameter in the signature, so it is supplied as a keyword argument through **kwargs
Returns: BufferedGzipFile - a buffered gzip file object
Overrides: gzip.GzipFile.__init__
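
The interaction of the fileobj, filename, mode, and compresslevel arguments described above can be illustrated with a short sketch. Because the constructor text is inherited from gzip.GzipFile, the example uses that parent class directly in Python 3, with io.BytesIO standing in for the StringIO object mentioned above. The name example.txt and the payload are made up for illustration.

```python
import gzip
import io

payload = b"hello gzip\n" * 100

# Write into an in-memory file object. Since fileobj is given, filename is
# not opened; it is only recorded in the gzip header.
raw = io.BytesIO()
with gzip.GzipFile(filename="example.txt", mode="wb",
                   compresslevel=1, fileobj=raw) as f:
    f.write(payload)

# Closing the GzipFile does not close a caller-supplied fileobj,
# so the compressed bytes are still available.
compressed = raw.getvalue()

# Read the data back from a second in-memory file object.
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as f:
    roundtrip = f.read()
```

The binary modes 'wb' and 'rb' are used here, following the portability advice above.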

close(self)

Overrides: gzip.GzipFile.close

flush(self, lib_mode=2)

Overrides: gzip.GzipFile.flush

read(self, size=None)

Overrides: gzip.GzipFile.read

write(self, data, size=-1)

Parameters:
  • data (str) - str to write to file or buffer
  • size (int) - buffer at least size bytes before writing to file
Overrides: gzip.GzipFile.write
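
The idea of buffering at least size bytes before writing to the file can be sketched with a small stand-alone class. This is an illustration of the general technique in Python 3, not NLTK's implementation: the class name BufferedGzipWriter, its attributes, and the flush_count counter are all hypothetical, and it wraps gzip.GzipFile rather than subclassing it.

```python
import gzip
import io


class BufferedGzipWriter:
    """Collect small writes in memory and pass them to GzipFile in large chunks."""

    def __init__(self, fileobj, size=2 * 2**20, compresslevel=9):
        self._gz = gzip.GzipFile(fileobj=fileobj, mode="wb",
                                 compresslevel=compresslevel)
        self._size = size        # flush threshold in bytes
        self._chunks = []        # pending data not yet compressed
        self._buffered = 0       # total length of pending data
        self.flush_count = 0     # number of chunks handed to GzipFile

    def write(self, data):
        # Hold the data in memory until the threshold is reached.
        self._chunks.append(data)
        self._buffered += len(data)
        if self._buffered >= self._size:
            self._flush_buffer()

    def _flush_buffer(self):
        # Pass everything buffered so far to the compressor in one call.
        if self._chunks:
            self._gz.write(b"".join(self._chunks))
            self._chunks = []
            self._buffered = 0
            self.flush_count += 1

    def close(self):
        # Write any remaining data, then finish the gzip stream.
        self._flush_buffer()
        self._gz.close()


sink = io.BytesIO()
writer = BufferedGzipWriter(sink, size=1024)
for _ in range(100):
    writer.write(b"x" * 100)
writer.close()
```

A larger threshold means fewer calls into the compressor at the cost of holding more data in memory, which is the trade-off described for this class.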