datadings.reader.augment module
An Augment wraps a Reader (datadings.reader.reader.Reader) and changes how samples are iterated over.
Apart from iteration order, an Augment is used largely like the reader it wraps.
- class datadings.reader.augment.Cycler(reader)
Bases: datadings.reader.augment.Augment
Infinitely cycle a Reader (datadings.reader.reader.Reader).
Warning
Augments are not thread safe!
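The cycling behavior can be illustrated without datadings. The following is a minimal sketch, not the library's implementation: `cycle_reader` is a hypothetical helper, and a plain list stands in for a reader. Stopping on an empty source is a choice made for this sketch only.

```python
from itertools import islice


def cycle_reader(reader):
    """Yield samples from `reader` forever, restarting when it is exhausted.

    Hypothetical helper for illustration; not part of datadings.
    """
    while True:
        yielded = False
        # Re-iterate the source on every pass instead of caching samples.
        for sample in reader:
            yielded = True
            yield sample
        if not yielded:
            return  # sketch-only guard: do not spin forever on an empty source


samples = ["a", "b", "c"]  # stand-in for a reader
# An infinite iterator must be bounded by the consumer, here with islice.
first_seven = list(islice(cycle_reader(samples), 7))
```

Because the iterator never ends on its own, the consumer decides when to stop, for example after a fixed number of training steps.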
- iter(yield_key=False, raw=False, copy=True, chunk_size=16)
Create an iterator.
- Parameters
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but also memory usage
- Returns
Iterator
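The difference between `bytes` and `memoryview` that the copy parameter refers to can be seen with the standard library alone. This sketch does not call datadings; `payload` is a hypothetical stand-in for one raw sample.

```python
payload = bytes(range(8))  # hypothetical stand-in for one raw sample

view = memoryview(payload)  # no copy: the view shares payload's buffer
window = view[2:6]          # slicing a memoryview does not copy either
copied = bytes(window)      # explicit copy into a new, independent bytes object

# The sliced view still refers to the original object.
shares_buffer = window.obj is payload
```

A view avoids the cost of copying, but it keeps the underlying buffer alive, so convert to `bytes` when the data must outlive or be independent of its source.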
- class datadings.reader.augment.QuasiShuffler(reader, buf_size=0.01, chunk_size=16, seed=None)
Bases: datadings.reader.augment.Augment
Slightly less random than a true Shuffler (datadings.reader.augment.Shuffler), but much faster.
The dataset is divided into equal-size chunks that are read in random order. Shuffling follows these steps:
1. Fill the buffer with chunks.
2. Read the next chunk.
3. Select a random sample from the buffer and yield it.
4. Replace the sample with the next sample from the current chunk.
5. If there are chunks left, go to step 2.
This means there are typically more samples from the current chunk in the buffer than there would be if a true shuffle were used. This effect is more pronounced for smaller fractions \(\frac{B}{C}\), where \(C\) is the chunk size and \(B\) the buffer size. As a rule of thumb, it is sufficient to keep \(\frac{B}{C}\) roughly equal to the number of classes in the dataset.
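The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `quasi_shuffle` is a hypothetical function, the dataset is an in-memory list, `buf_size` is an absolute sample count rather than a fraction, steps 3 and 4 are assumed to repeat for every sample of the current chunk, and the remaining buffer is assumed to be emitted in random order at the end.

```python
import random


def quasi_shuffle(samples, buf_size, chunk_size, seed=0):
    """Yield every sample exactly once in quasi-random order (sketch only)."""
    if buf_size < 1 or chunk_size < 1:
        raise ValueError("buf_size and chunk_size must be at least 1")
    rng = random.Random(seed)
    # Divide the dataset into chunks and visit them in random order.
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    rng.shuffle(chunks)
    chunk_iter = iter(chunks)

    # Step 1: fill the buffer with whole chunks.
    buffer = []
    while len(buffer) < buf_size:
        chunk = next(chunk_iter, None)
        if chunk is None:
            break
        buffer.extend(chunk)

    # Steps 2 and 5: read the remaining chunks one after another.
    for chunk in chunk_iter:
        for sample in chunk:
            # Step 3: select a random sample from the buffer and yield it.
            i = rng.randrange(len(buffer))
            yield buffer[i]
            # Step 4: replace it with the next sample from the current chunk.
            buffer[i] = sample

    # Assumption: once all chunks are read, drain the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


data = list(range(100))
order = list(quasi_shuffle(data, buf_size=20, chunk_size=5, seed=1))
```

Reading whole chunks keeps access mostly sequential, which is where the speed advantage over a true shuffle comes from. The buffer is what mixes samples from different chunks.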
Note
Seeking and resuming iteration with a new iterator are relatively costly operations. If possible create one iterator and use it repeatedly.
- Parameters
reader – the reader to wrap
buf_size – size of the buffer; values less than 1 are interpreted as fractions of the dataset length; bigger values improve randomness, but use more memory
chunk_size – size of each chunk; bigger values improve performance, but reduce randomness
seed – random seed to use; defaults to len(reader) * self.buf_size * chunk_size
- iter(yield_key=False, raw=False, copy=True, chunk_size=None)
Create an iterator.
- Parameters
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but also memory usage
- Returns
Iterator
- class datadings.reader.augment.Range(reader, start=0, stop=None)
Bases: datadings.reader.augment.Augment
Extract a range of samples from a given reader. start and stop behave like the parameters of the range function.
- Parameters
reader – reader to sample from
start – start of range
stop – stop of range
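To make the range semantics concrete, here is a small sketch on a plain list. `take_range` is a hypothetical helper, not a datadings function, and it assumes that stop=None means "up to the end of the dataset".

```python
def take_range(samples, start=0, stop=None):
    """Return samples[start], ..., samples[stop - 1] (sketch only)."""
    if stop is None:
        stop = len(samples)  # assumption: None means the end of the dataset
    # start is inclusive and stop is exclusive, as with range(start, stop).
    return [samples[i] for i in range(start, stop)]


samples = [f"sample-{i}" for i in range(10)]  # stand-in for a reader
middle = take_range(samples, 2, 5)
tail = take_range(samples, 8)
```

Here `middle` holds the samples at indices 2, 3, and 4, and `tail` holds the last two. Disjoint ranges like these are a simple way to split one dataset among several workers.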
- iter(yield_key=False, raw=False, copy=True, chunk_size=16)
Create an iterator.
- Parameters
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but also memory usage
- Returns
Iterator
- class datadings.reader.augment.Repeater(reader, times)
Bases: datadings.reader.augment.Augment
Repeat a Reader (datadings.reader.reader.Reader) a fixed number of times.
Warning
Augments are not thread safe!
- iter(yield_key=False, raw=False, copy=True, chunk_size=16)
Create an iterator.
- Parameters
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but also memory usage
- Returns
Iterator
- class datadings.reader.augment.Shuffler(reader, seed=None)
Bases: datadings.reader.augment.Augment
Iterate over a Reader (datadings.reader.reader.Reader) in random order.
- Parameters
reader – The reader to augment.
seed – optional random seed; defaults to len(reader)
Warning
Augments are not thread safe!
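Since the seed defaults to len(reader), the order is presumably reproducible even when no seed is given. The sketch below shows that idea with the standard library. `shuffled_indices` is a hypothetical helper, and the actual random number generator used by datadings may differ.

```python
import random


def shuffled_indices(n, seed=None):
    """Return a seeded random permutation of range(n) (sketch only)."""
    if seed is None:
        seed = n  # mirrors the documented default of len(reader)
    indices = list(range(n))
    # A dedicated Random instance avoids touching the global random state.
    random.Random(seed).shuffle(indices)
    return indices


first = shuffled_indices(10)
second = shuffled_indices(10)  # same length, same default seed, same order
```

To get a different order in each epoch, pass a seed that changes, for example one derived from the epoch number.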
- iter(yield_key=False, raw=False, copy=True, chunk_size=16)
Create an iterator.
- Parameters
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but also memory usage
- Returns
Iterator