type PickleCorpusView
object
  +-- util.AbstractLazySequence
        +-- StreamBackedCorpusView
              +-- PickleCorpusView
A stream-backed corpus view for corpus files that consist of sequences
of serialized Python objects (serialized using pickle.dump).
One use case for this class is to store the result of running feature
detection on a corpus to disk. This is useful when feature detection is
expensive (so we do not want to repeat it) but the corpus is too large
to hold in memory. The following example illustrates this technique:
>>> from nltk.corpus.reader.util import PickleCorpusView
>>> from nltk.util import LazyMap
>>> feature_corpus = LazyMap(detect_features, corpus)
>>> PickleCorpusView.write(feature_corpus, some_fileid)
>>> pcv = PickleCorpusView(some_fileid)
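The on-disk format described above (a plain sequence of pickle.dump records) can be illustrated with the standard library alone. The following is a minimal sketch, not the class's actual implementation: write_pickle_corpus, read_pickle_corpus and detect_features are hypothetical names introduced for illustration, and the feature detector is a trivial stand-in for an expensive one.

```python
import os
import pickle
import tempfile


def write_pickle_corpus(sequence, path, protocol=-1):
    """Write each item of `sequence` to `path` as its own pickle record."""
    with open(path, "wb") as stream:
        for item in sequence:
            pickle.dump(item, stream, protocol)


def read_pickle_corpus(path):
    """Yield items back one at a time until the file is exhausted."""
    with open(path, "rb") as stream:
        while True:
            try:
                yield pickle.load(stream)
            except EOFError:
                return


def detect_features(token):
    # Hypothetical stand-in for an expensive feature detector.
    return {"token": token, "length": len(token), "is_title": token.istitle()}


# Features are computed lazily and streamed straight to disk, so the
# full feature list never has to sit in memory while it is written.
corpus = ["The", "quick", "brown", "fox"]
fd, path = tempfile.mkstemp(suffix=".pcv")
os.close(fd)
write_pickle_corpus((detect_features(tok) for tok in corpus), path)
restored = list(read_pickle_corpus(path))
os.remove(path)
```

Because every item is pickled independently, a reader can stop after any record and resume later, which is what makes a lazy, stream-backed view over such a file possible.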
__init__(self, fileid, delete_on_gc=False)
Create a new corpus view that reads the pickle corpus fileid.
read_block(self, stream)
Read a block from the input stream.
Returns: list of any
__del__(self)
If delete_on_gc was set to true when this PickleCorpusView was created,
then delete the corpus view's fileid.
Inherited from StreamBackedCorpusView:
__add__, __getitem__, __len__, __mul__, __radd__, __rmul__, close, iterate_from
Inherited from util.AbstractLazySequence:
__cmp__, __contains__, __hash__, __iter__, __repr__, count, index
cache_to_tempfile(cls, sequence, delete_on_gc=True)
Write the given sequence to a temporary file as a pickle corpus, and
then return a PickleCorpusView for that temporary corpus file.
BLOCK_SIZE = 100
PROTOCOL = -1
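In the standard pickle module, a negative protocol number selects the highest protocol the running interpreter supports. The short check below illustrates that behaviour of pickle itself; it assumes, without confirmation from this page, that PROTOCOL is passed straight through to pickle.dump.

```python
import pickle

PROTOCOL = -1  # same value as the class variable documented above

record = {"token": "fox", "length": 3}

# A negative protocol is treated by pickle as HIGHEST_PROTOCOL.
with_negative = pickle.dumps(record, PROTOCOL)
with_highest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
same_bytes = with_negative == with_highest

# The record survives a round trip regardless of the protocol chosen.
round_trip = pickle.loads(with_negative)
```

One consequence worth keeping in mind: a file written this way by a newer interpreter may not be readable by an older one whose highest protocol is lower.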
__init__(self, fileid, delete_on_gc=False)
(Constructor)
Create a new corpus view that reads the pickle corpus fileid.
- Parameters:
    delete_on_gc - If true, then fileid will be deleted when this
    object is garbage-collected.
- Overrides:
    StreamBackedCorpusView.__init__
read_block(self, stream)
Read a block from the input stream.
- Parameters:
    stream - the input stream to read from
- Returns: list of any
    a block of tokens from the input stream
- Overrides:
    StreamBackedCorpusView.read_block
    (inherited documentation)
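Block-wise reading of a pickle stream can be sketched as follows. This is an illustration under assumptions rather than the method's confirmed source: it treats BLOCK_SIZE as the maximum number of objects returned per call, and it uses a free function and an in-memory stream in place of the real method and corpus file.

```python
import io
import pickle

BLOCK_SIZE = 100  # mirrors the class variable; assumed to be a per-call cap


def read_block(stream, block_size=BLOCK_SIZE):
    """Return up to `block_size` unpickled objects from `stream`."""
    block = []
    for _ in range(block_size):
        try:
            block.append(pickle.load(stream))
        except EOFError:
            break  # end of stream: return a short (possibly empty) block
    return block


# Build an in-memory "pickle corpus" holding 250 integers.
stream = io.BytesIO()
for n in range(250):
    pickle.dump(n, stream, -1)
stream.seek(0)

# Drain the stream block by block, recording the size of each block.
block_lengths = []
while True:
    block = read_block(stream)
    if not block:
        break
    block_lengths.append(len(block))
```

With 250 records and a cap of 100, the loop sees two full blocks followed by one short block of 50, and an empty block signals the end of the stream.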
__del__(self)
(Destructor)
If delete_on_gc was set to true when this PickleCorpusView was created,
then delete the corpus view's fileid. (This method is called whenever a
PickleCorpusView is garbage-collected.)
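The delete-on-garbage-collection behaviour can be illustrated with a small stand-in class. TempBackedView is a hypothetical name, not part of the library, and the error handling shown is an assumption about what a robust destructor might do.

```python
import gc
import os
import tempfile


class TempBackedView:
    """Hypothetical stand-in for a view that owns a file on disk."""

    def __init__(self, fileid, delete_on_gc=False):
        self._fileid = fileid
        self._delete_on_gc = delete_on_gc

    def __del__(self):
        # Only remove the file if the caller opted in at construction time.
        if getattr(self, "_delete_on_gc", False):
            try:
                os.remove(self._fileid)
            except OSError:
                pass  # file already gone or not removable; ignore quietly


fd, path = tempfile.mkstemp(suffix=".pcv")
os.close(fd)
view = TempBackedView(path, delete_on_gc=True)
existed_before = os.path.exists(path)
del view      # drop the last reference
gc.collect()  # CPython frees it at once; collect() covers other runtimes
exists_after = os.path.exists(path)
```

Since the timing of `__del__` is up to the interpreter, code that needs the file gone at a specific moment should remove it explicitly rather than rely on garbage collection.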
cache_to_tempfile(cls, sequence, delete_on_gc=True)
Class Method
Write the given sequence to a temporary file as a pickle corpus, and
then return a PickleCorpusView for that temporary corpus file.
- Parameters:
    delete_on_gc - If true, then the temporary file will be deleted when
    the returned view is garbage-collected.
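The caching step can be approximated with tempfile.mkstemp. The sketch below is an assumption about one reasonable way to do this, not the class method's confirmed source: it is a free function that returns the temporary path rather than a view, and the file prefix and suffix are made up for illustration.

```python
import os
import pickle
import tempfile


def cache_to_tempfile(sequence, protocol=-1):
    """Pickle each item of `sequence` into a new temp file; return its path."""
    # The prefix and suffix are illustrative, not taken from the library.
    fd, path = tempfile.mkstemp(suffix=".pcv", prefix="nltk-")
    try:
        with os.fdopen(fd, "wb") as stream:
            for item in sequence:
                pickle.dump(item, stream, protocol)
    except Exception:
        os.remove(path)  # do not leave a half-written cache behind
        raise
    return path


# The generator is consumed once and its items end up on disk.
squares = (n * n for n in range(5))
path = cache_to_tempfile(squares)
with open(path, "rb") as stream:
    cached = [pickle.load(stream) for _ in range(5)]
os.remove(path)
```

mkstemp creates the file atomically and returns an open descriptor, which avoids the race between choosing a name and opening the file.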