datadings.reader.augment module
An Augment wraps a Reader (datadings.reader.reader.Reader) and changes how samples are iterated over.
How readers are used is largely unaffected.
- class datadings.reader.augment.Cycler(reader)
Bases: Augment
Infinitely cycle a Reader (datadings.reader.reader.Reader). Iterators can be requested with any start/stop index. Large indexes simply wrap around.
- iter(start=None, stop=None, yield_key=False, raw=False, copy=True, chunk_size=16)
Iterate over the dataset. start and stop behave like the parameters of the range function.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start of range; if None, current index is used
stop – stop of range
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but require more memory
- Returns:
Iterator
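The wrap-around behaviour described for Cycler can be illustrated without datadings. The following is a minimal sketch, assuming a plain list as a stand-in for the reader; cycle_range is a hypothetical helper, not part of the library, and unlike iter it does not track a current index when start is omitted.

```python
from itertools import islice


def cycle_range(samples, start=0, stop=None):
    """Yield samples for indexes start..stop-1, wrapping around the end.

    Hypothetical helper for illustration only; stop=None cycles forever.
    """
    n = len(samples)
    if n == 0:
        return  # nothing to cycle over
    i = start
    while stop is None or i < stop:
        # Large indexes wrap around via the modulo.
        yield samples[i % n]
        i += 1


data = ["a", "b", "c"]
# Indexes 2..6 map to positions 2, 0, 1, 2, 0.
wrapped = list(cycle_range(data, start=2, stop=7))
# Without a stop index the iterator is infinite, so take a finite prefix.
endless_prefix = list(islice(cycle_range(data), 5))
```

Because an iterator without a stop index never ends, consumers need their own stopping condition, such as islice or a step counter.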
- class datadings.reader.augment.QuasiShuffler(reader, buf_size=0.01, seed=None)
Bases: Augment
A slightly less random, but much faster alternative to a true Shuffler (datadings.reader.augment.Shuffler).
The dataset is divided into equal-size chunks that are read in random order. Shuffling follows these steps:
1. Fill the buffer with random chunks.
2. Read the next random chunk.
3. Select a random sample from the buffer and yield it.
4. Replace the sample with the next sample from the current chunk.
5. If there are chunks left, go to step 2.
6. Shuffle the buffer and yield its contents.
This means there are typically more samples from the current chunk in the buffer than there would be if a true shuffle was used. This effect is more pronounced for smaller fractions \(\frac{B}{C}\) where \(C\) is the chunk size and \(B\) the buffer size. As a rule of thumb it is sufficient to keep \(\frac{B}{C}\) roughly equal to the number of classes in the dataset.
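The shuffling steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the library's implementation: quasi_shuffle is a hypothetical function, the dataset is a list held in memory, the buffer size is given as a number of chunks (buf_chunks), and the last chunk may be shorter than the others.

```python
import random


def quasi_shuffle(samples, chunk_size, buf_chunks, seed=0):
    """Yield every sample exactly once in quasi-random order (illustration)."""
    if chunk_size < 1 or buf_chunks < 1:
        raise ValueError("chunk_size and buf_chunks must be at least 1")
    rng = random.Random(seed)
    # Divide the dataset into chunks and visit them in random order.
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    rng.shuffle(chunks)
    # Step 1: fill the buffer with random chunks.
    buffer = [s for chunk in chunks[:buf_chunks] for s in chunk]
    # Steps 2 and 5: read the remaining chunks one after another.
    for chunk in chunks[buf_chunks:]:
        for sample in chunk:
            pos = rng.randrange(len(buffer))
            yield buffer[pos]      # step 3: yield a random buffered sample
            buffer[pos] = sample   # step 4: replace it from the current chunk
    # Step 6: shuffle what is left in the buffer and yield it.
    rng.shuffle(buffer)
    yield from buffer


order = list(quasi_shuffle(list(range(20)), chunk_size=4, buf_chunks=2, seed=1))
```

Only one chunk is read sequentially at a time, which is where the speed advantage over a true shuffle comes from; the price is that samples from the current chunk are over-represented in the buffer.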
Note
Creating a new iterator, especially from a specific start position, is a costly operation. If possible create one iterator and use it until it is exhausted.
- Parameters:
reader – the reader to wrap
buf_size – size of the buffer; values less than 1 are interpreted as fractions of the dataset length; bigger values improve randomness, but use more memory
seed – random seed to use; defaults to len(reader) * buf_size * chunk_size
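As a small worked example of the buf_size rule, the sketch below turns either form into an absolute number of samples. resolve_buf_size is a hypothetical helper, and the rounding and the lower bound of one sample are assumptions; the library may resolve fractions differently.

```python
def resolve_buf_size(buf_size, dataset_len):
    """Return the buffer size in samples (hypothetical helper)."""
    if buf_size < 1:
        # Fractional values are relative to the dataset length.
        # Rounding and the minimum of 1 are assumptions of this sketch.
        return max(1, round(buf_size * dataset_len))
    # Values of 1 or more are taken as an absolute sample count.
    return int(buf_size)


quarter = resolve_buf_size(0.25, 1000)
absolute = resolve_buf_size(4096, 1000)
```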
- get(index, yield_key=False, raw=False, copy=True)
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
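The memory trade-off of copy=False can be seen with standard Python objects alone. The sketch below uses a bytearray as a stand-in for a chunk of raw data read in one go; how datadings manages its buffers internally is not shown here and is an assumption.

```python
# Stand-in for a large chunk of raw data read from disk in one go.
chunk = bytearray(b"0123456789" * 1000)

# Zero-copy: the view shares memory with the whole 10,000-byte chunk.
view = memoryview(chunk)[10:20]
# Copy: an independent 10-byte bytes object.
copied = bytes(chunk[10:20])

# The view holds a reference to the chunk, so the chunk cannot be freed
# while the small view is still alive.
shares_buffer = view.obj is chunk

# Changes to the chunk are visible through the view, but not in the copy.
chunk[10] = ord("X")
view_sees_change = bytes(view[:1]) == b"X"
copy_unchanged = copied[:1] == b"0"
```

Keeping a handful of such views around can therefore pin many full chunks in memory, which is the situation the description above warns about.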
- slice(start, stop=None, yield_key=False, raw=False, copy=True)
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples
- class datadings.reader.augment.Range(reader, start=0, stop=None)
Bases: Augment
Extract a range of samples from a given reader. start and stop behave like the parameters of the range function.
- Parameters:
reader – reader to sample from
start – start of range
stop – stop of range
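The range-like semantics of start and stop can be sketched with a list-backed stand-in. ListRange is a hypothetical class for illustration only; it mirrors the constructor arguments listed above, but none of the real reader machinery.

```python
class ListRange:
    """Hypothetical stand-in that exposes samples[start:stop] as a sequence."""

    def __init__(self, samples, start=0, stop=None):
        self._samples = samples
        # As with range, stop is exclusive; None means "up to the end".
        self._indexes = range(len(samples))[start:stop]

    def __len__(self):
        return len(self._indexes)

    def get(self, index):
        # Index 0 refers to the first sample inside the range.
        return self._samples[self._indexes[index]]

    def __iter__(self):
        return (self._samples[i] for i in self._indexes)


middle = ListRange(list("abcdefghij"), start=3, stop=7)
first = middle.get(0)
selected = list(middle)
```

A typical use of such a range is splitting one dataset into disjoint parts, for example the first 90% for training and the rest for validation.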
- get(index, yield_key=False, raw=False, copy=True)
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
- slice(start, stop=None, yield_key=False, raw=False, copy=True)
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples
- class datadings.reader.augment.Repeater(reader, times)
Bases: Augment
Repeat a Reader (datadings.reader.reader.Reader) a fixed number of times.
Note
find_index returns the first occurrence.
- get(index, yield_key=False, raw=False, copy=True)
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
- slice(start, stop=None, yield_key=False, raw=False, copy=True)
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples
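The index arithmetic behind a Repeater, including the note that find_index returns the first occurrence, can be sketched as follows. ListRepeater is a hypothetical list-backed stand-in, and looking up sample values instead of keys is a simplification of this sketch.

```python
class ListRepeater:
    """Hypothetical stand-in that repeats a list a fixed number of times."""

    def __init__(self, samples, times):
        self._samples = samples
        self._times = times

    def __len__(self):
        return len(self._samples) * self._times

    def get(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Every repetition maps back onto the same underlying samples.
        return self._samples[index % len(self._samples)]

    def find_index(self, sample):
        # A sample occurs once per repetition; report only the first index.
        return self._samples.index(sample)


repeated = ListRepeater(["a", "b", "c"], times=2)
fifth = repeated.get(4)
first_b = repeated.find_index("b")
```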
- class datadings.reader.augment.Shuffler(reader, seed=None)
Bases: Augment
Iterate over a Reader (datadings.reader.reader.Reader) in random order. If no seed is given, the length of the reader is used for reproducibility. Creating an iterator increments the seed by 1. Use Shuffler.seed() to set the desired seed instead.
Warning
Shuffler only implements iteration. Random access methods find_index, find_key, get, and slice raise NotImplementedError.
- Parameters:
reader – The reader to augment.
seed – optional random seed; defaults to len(reader)
Warning
Augments are not thread safe!
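The seeding rules described for Shuffler can be sketched with a list-backed stand-in. ListShuffler is hypothetical; in particular, whether the seed is incremented before or after the order is drawn is an assumption of this sketch.

```python
import random


class ListShuffler:
    """Hypothetical stand-in that iterates a list in seeded random order."""

    def __init__(self, samples, seed=None):
        self._samples = samples
        # Without an explicit seed, the dataset length makes runs reproducible.
        self._seed = len(samples) if seed is None else seed

    def seed(self, seed):
        self._seed = seed

    def __iter__(self):
        order = list(range(len(self._samples)))
        random.Random(self._seed).shuffle(order)
        # Each new iterator advances the seed (assumed: after drawing).
        self._seed += 1
        return (self._samples[i] for i in order)


data = list(range(10))
a, b = ListShuffler(data), ListShuffler(data)
first_a, first_b = list(a), list(b)   # same default seed, same order
second_a = list(a)                    # drawn with the incremented seed
a.seed(len(data))                     # reset to the initial seed
replay = list(a)                      # reproduces the first pass
```

Incrementing the seed per iterator gives each epoch its own order while keeping the whole sequence of epochs reproducible from the initial seed.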
- get(index, yield_key=False, raw=False, copy=True)
Returns sample at given index.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Sample at index.
- slice(start, stop=None, yield_key=False, raw=False, copy=True)
Returns a generator of samples selected by the given slice.
copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.
- Parameters:
start – start index of slice
stop – stop index of slice
yield_key – if True, yield (key, sample)
raw – if True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes
- Returns:
Iterator of selected samples