datadings.reader.augment module
An Augment wraps a Reader (datadings.reader.reader.Reader) and changes how samples are iterated over. How readers are used is largely unaffected.
- class datadings.reader.augment.Cycler(reader)[source]
Bases: Augment
Infinitely cycle a Reader (datadings.reader.reader.Reader). Iterators can be requested with any start/stop index. Large indexes simply wrap around.
- iter(start=None, stop=None, yield_key=False, raw=False, copy=True, chunk_size=16)[source]
Iterate over the dataset.
start and stop behave like the parameters of the range function.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start of range; if None, current index is used
stop – stop of range
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but require more memory
- Returns:
Iterator
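The wrap-around behaviour can be pictured with a plain list instead of a reader. This is a minimal sketch, not the library's implementation: the class name CyclingView is hypothetical, and tracking the "current index" in an attribute is an assumption about how start=None resumes.

```python
from itertools import islice


class CyclingView:
    """Hypothetical stand-in for a cycling augment over an in-memory list."""

    def __init__(self, samples):
        self._samples = list(samples)
        if not self._samples:
            raise ValueError("cannot cycle over an empty dataset")
        self._pos = 0  # assumed "current index" used when start is None

    def iter(self, start=None, stop=None):
        n = len(self._samples)
        i = self._pos if start is None else start
        # stop=None means the iterator never ends.
        while stop is None or i < stop:
            # Large indexes wrap around via the modulo.
            yield self._samples[i % n]
            i += 1
            self._pos = i


view = CyclingView(["a", "b", "c"])
wrapped = list(view.iter(2, 7))
resumed = list(islice(view.iter(), 2))
```

With three samples, indexes 2 to 6 map to positions 2, 0, 1, 2, 0, and the second call continues from index 7.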
- class datadings.reader.augment.QuasiShuffler(reader, buf_size=0.01, seed=None)[source]
Bases: Augment
A slightly less random alternative to a true Shuffler (datadings.reader.augment.Shuffler), but much faster.
The dataset is divided into equal-size chunks that are read in random order. Shuffling follows these steps:
1. Fill the buffer with random chunks.
2. Read the next random chunk.
3. Select a random sample from the buffer and yield it.
4. Replace the sample with the next sample from the current chunk.
5. If there are chunks left, go to step 2.
6. Shuffle the buffer and yield its contents.
This means there are typically more samples from the current chunk in the buffer than there would be if a true shuffle was used. This effect is more pronounced for smaller fractions \(\frac{B}{C}\) where \(C\) is the chunk size and \(B\) the buffer size. As a rule of thumb it is sufficient to keep \(\frac{B}{C}\) roughly equal to the number of classes in the dataset.
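The six steps listed above can be sketched over an in-memory list. This is an illustration under stated assumptions, not the QuasiShuffler source: the function name quasi_shuffle and its parameters chunk_size and buf_chunks are hypothetical, and the buffer is sized in whole chunks for simplicity.

```python
import random


def quasi_shuffle(samples, chunk_size, buf_chunks, seed=0):
    """Yield every sample once, in quasi-random order (illustrative only)."""
    if buf_chunks < 1:
        raise ValueError("buffer must hold at least one chunk")
    rng = random.Random(seed)
    # Divide the dataset into equal-size chunks and visit them in random order.
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    rng.shuffle(chunks)
    chunks = iter(chunks)
    # Step 1: fill the buffer with the first few random chunks.
    buffer = []
    for _ in range(buf_chunks):
        buffer.extend(next(chunks, []))
    # Steps 2 and 5: read the remaining chunks one after another.
    for chunk in chunks:
        for sample in chunk:
            i = rng.randrange(len(buffer))
            yield buffer[i]      # step 3: yield a random buffered sample
            buffer[i] = sample   # step 4: replace it from the current chunk
    # Step 6: shuffle what is left in the buffer and yield it.
    rng.shuffle(buffer)
    yield from buffer


order = list(quasi_shuffle(list(range(20)), chunk_size=4, buf_chunks=2, seed=1))
```

Because each yielded slot is refilled immediately from the current chunk, that chunk is over-represented in the buffer, which is the bias described above.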
Note
Creating a new iterator, especially from a specific start position, is a costly operation. If possible create one iterator and use it until it is exhausted.
- Parameters:
reader – the reader to wrap
buf_size – size of the buffer; values less than 1 are interpreted as fractions of the dataset length; bigger values improve randomness, but use more memory
seed – random seed to use; defaults to len(reader) * buf_size * chunk_size
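How a fractional buf_size might translate into a sample count is sketched below. The helper resolve_buf_size is hypothetical, and both the rounding and the lower bound of one sample are assumptions; the library may resolve the value differently.

```python
def resolve_buf_size(buf_size, dataset_len):
    """Turn a buf_size argument into an absolute number of samples."""
    if buf_size < 1:
        # Values below 1 are read as a fraction of the dataset length.
        # Rounding and the minimum of 1 are assumptions of this sketch.
        return max(1, round(buf_size * dataset_len))
    return int(buf_size)


default_size = resolve_buf_size(0.01, 10_000)
absolute_size = resolve_buf_size(500, 10_000)
```

Under these assumptions the default of 0.01 buffers one percent of the dataset.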
- get(index, yield_key=False, raw=False, copy=True)[source]
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
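The memory cost of copy=False follows from how memoryview works in Python: a view holds a reference to its entire underlying buffer. The sketch below uses plain bytes rather than a datadings reader, so the 1 MiB buffer is only a stand-in for a slice read from disk.

```python
# Stand-in for a large slice of raw data read in one go (1 MiB).
chunk = bytes(range(256)) * 4096

# Zero-copy: a 10-byte view that still references the whole buffer.
view = memoryview(chunk)[10:20]

# Copying: an independent 10-byte object; the buffer can be freed.
copied = bytes(view)

view_size = view.nbytes
kept_alive = len(view.obj)
```

As long as view exists, the full buffer referenced by view.obj cannot be garbage collected, whereas copied only holds its own ten bytes.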
- slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples
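The effect of yield_key on a slice can be illustrated without a reader. This sketch makes two assumptions that are not confirmed by the text above: samples are dicts that store their key under "key", and stop=None means "until the end". The function slice_samples is hypothetical.

```python
def slice_samples(samples, start, stop=None, yield_key=False):
    """Yield samples[start:stop], optionally as (key, sample) pairs."""
    stop = len(samples) if stop is None else stop
    for sample in samples[start:stop]:
        # Assumption: each sample is a dict with its key under "key".
        yield (sample["key"], sample) if yield_key else sample


data = [{"key": f"s{i}", "value": i} for i in range(5)]
pairs = list(slice_samples(data, 1, 3, yield_key=True))
tail = list(slice_samples(data, 3))
```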
- class datadings.reader.augment.Range(reader, start=0, stop=None)[source]
Bases: Augment
Extract a range of samples from a given reader.
start and stop behave like the parameters of the range function.
- Parameters:
reader – reader to sample from
start – start of range
stop – stop of range
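The index arithmetic behind such a range can be delegated to Python's own range object. The class RangeView below is a hypothetical sketch over a list; clamping stop to the dataset length is an assumption rather than documented behaviour.

```python
class RangeView:
    """Hypothetical view of samples[start:stop] without copying the data."""

    def __init__(self, samples, start=0, stop=None):
        n = len(samples)
        stop = n if stop is None else stop
        # Slicing a range clamps out-of-bounds values to the dataset length.
        self._indexes = range(n)[start:stop]
        self._samples = samples

    def __len__(self):
        return len(self._indexes)

    def get(self, index):
        # Translate the view-local index into an index of the wrapped data.
        return self._samples[self._indexes[index]]


letters = list("abcdef")
middle = RangeView(letters, 2, 5)
tail = RangeView(letters, 4)
```

Here middle.get(0) refers to the third sample of the wrapped list, and tail covers the last two samples.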
- get(index, yield_key=False, raw=False, copy=True)[source]
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
- slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples
- class datadings.reader.augment.Repeater(reader, times)[source]
Bases: Augment
Repeat a Reader (datadings.reader.reader.Reader) a fixed number of times.
Note
find_index returns the first occurrence.
- get(index, yield_key=False, raw=False, copy=True)[source]
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
- slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples
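Repeating a dataset and the note that find_index returns the first occurrence can be pictured as follows. RepeatedView is a hypothetical class over a list of keys, and it assumes the dataset is repeated as a whole (a, b, c, a, b, c) rather than sample by sample.

```python
class RepeatedView:
    """Hypothetical view that presents a list `times` times in a row."""

    def __init__(self, samples, times):
        self._samples = list(samples)
        self._times = times

    def __len__(self):
        return len(self._samples) * self._times

    def get(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Every repetition maps back onto the same underlying samples.
        return self._samples[index % len(self._samples)]

    def find_index(self, key):
        # Each key occurs `times` times; report only the first occurrence.
        return self._samples.index(key)


repeated = RepeatedView(["a", "b", "c"], times=2)
```

In this sketch repeated.get(1) and repeated.get(4) return the same sample, while find_index always reports the lower index.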
- class datadings.reader.augment.Shuffler(reader, seed=None)[source]
Bases: Augment
Iterate over a Reader (datadings.reader.reader.Reader) in random order. If no seed is given the length of the reader is used for reproducibility. Creating an iterator increments the seed by 1. Use Shuffler.seed() to set the desired seed instead.
Warning
Shuffler only implements iteration. Random access methods find_index, find_key, get, and slice raise NotImplementedError.
- Parameters:
reader – The reader to augment.
seed – optional random seed; defaults to len(reader)
Warning
Augments are not thread safe!
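The seed handling described for Shuffler can be sketched with the standard random module. SeededShuffle is a hypothetical class, not the library's code, and it assumes the seed is incremented after the order for the new iterator has been drawn.

```python
import random


class SeededShuffle:
    """Hypothetical shuffling view with a seed that advances per iterator."""

    def __init__(self, samples, seed=None):
        self._samples = list(samples)
        # Without an explicit seed, fall back to the dataset length.
        self._seed = len(self._samples) if seed is None else seed

    def seed(self, seed):
        self._seed = seed

    def __iter__(self):
        order = list(range(len(self._samples)))
        random.Random(self._seed).shuffle(order)
        # Assumption: the next iterator uses seed + 1.
        self._seed += 1
        return (self._samples[i] for i in order)


shuffled = SeededShuffle(range(10))
first_epoch = list(shuffled)   # uses the default seed, len(samples) == 10
second_epoch = list(shuffled)  # uses seed 11
shuffled.seed(10)
replayed = list(shuffled)      # same order as first_epoch
```

Resetting the seed reproduces an earlier epoch, which is the purpose of calling seed() explicitly.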
- get(index, yield_key=False, raw=False, copy=True)[source]
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
- slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples