datadings.reader.augment module

An Augment wraps a Reader (datadings.reader.reader.Reader) and changes how samples are iterated over. How readers are used is largely unaffected.

class datadings.reader.augment.Cycler(reader)[source]

Bases: datadings.reader.augment.Augment

Infinitely cycle a Reader (datadings.reader.reader.Reader).

Warning

Augments are not thread safe!
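As a rough illustration of the cycling behaviour, the sketch below restarts iteration over a plain list each time it is exhausted. The name cycle_samples and the list standing in for a reader are hypothetical; this is not the datadings implementation.

```python
from itertools import islice


def cycle_samples(samples):
    """Yield samples forever, restarting from the first once exhausted."""
    if len(samples) == 0:
        return  # nothing to cycle over; avoid spinning on an empty input
    while True:
        for sample in samples:
            yield sample


# A plain list stands in for a reader here (assumption for illustration).
fake_reader = ["a", "b", "c"]
# The iterator never ends, so the caller decides how many samples to take.
first_seven = list(islice(cycle_samples(fake_reader), 7))
```

Because the iterator is infinite, consumers typically bound it themselves, for example with itertools.islice or a fixed number of training steps.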

iter(yield_key=False, raw=False, copy=True, chunk_size=16)[source]

Create an iterator.

Parameters
  • yield_key – if True, yields (key, sample) pairs.

  • raw – if True, yields samples as msgpacked messages.

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but also memory use

Returns

Iterator
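To make the effect of yield_key and chunk_size more concrete, here is a minimal sketch over plain Python lists. The function iter_chunked and its list arguments are hypothetical stand-ins; the raw and copy options are left out, and the real reader may fetch chunks differently.

```python
def iter_chunked(keys, samples, yield_key=False, chunk_size=16):
    """Yield samples, or (key, sample) pairs, reading chunk_size items at a time."""
    for start in range(0, len(samples), chunk_size):
        # One "read" fetches a whole chunk: larger chunks mean fewer reads,
        # but more samples are held in memory at once.
        chunk = list(zip(keys[start:start + chunk_size],
                         samples[start:start + chunk_size]))
        for key, sample in chunk:
            yield (key, sample) if yield_key else sample


keys = ["k0", "k1", "k2", "k3", "k4"]
samples = [{"label": i} for i in range(5)]
plain = list(iter_chunked(keys, samples, chunk_size=2))
pairs = list(iter_chunked(keys, samples, yield_key=True, chunk_size=2))
```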

seek(index)[source]
class datadings.reader.augment.QuasiShuffler(reader, buf_size=0.01, chunk_size=16, seed=None)[source]

Bases: datadings.reader.augment.Augment

Slightly less random than a true Shuffler (datadings.reader.augment.Shuffler), but much faster.

The dataset is divided into equal-size chunks that are read in random order. Shuffling follows these steps:

  1. Fill the buffer with chunks.

  2. Read the next chunk.

  3. Select a random sample from the buffer and yield it.

  4. Replace the sample with the next sample from the current chunk.

  5. If there are chunks left, go to step 2.

This means there are typically more samples from the current chunk in the buffer than there would be if a true shuffle were used. This effect is more pronounced for smaller fractions \(\frac{B}{C}\), where \(C\) is the chunk size and \(B\) the buffer size. As a rule of thumb it is sufficient to keep \(\frac{B}{C}\) roughly equal to the number of classes in the dataset.
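The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: quasi_shuffle is a hypothetical name, a list stands in for the dataset, and the final step that drains the buffer is an assumption, since the description does not say how the remaining samples are emitted.

```python
import random


def quasi_shuffle(samples, buf_size, chunk_size, seed=0):
    """Yield every sample exactly once, in a quasi-random order."""
    rng = random.Random(seed)
    # Divide the dataset into chunks and visit them in random order.
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    rng.shuffle(chunks)
    chunks = iter(chunks)

    # 1. Fill the buffer with chunks.
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= buf_size:
            break

    # 2. Read the next chunk (5. repeat while chunks are left).
    for chunk in chunks:
        for new_sample in chunk:
            # 3. Select a random sample from the buffer and yield it.
            pos = rng.randrange(len(buffer))
            yield buffer[pos]
            # 4. Replace it with the next sample from the current chunk.
            buffer[pos] = new_sample

    # Assumption: once all chunks are read, drain the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(quasi_shuffle(list(range(100)), buf_size=20, chunk_size=4, seed=0))
```

Only the buffer and one chunk are held in memory at a time, which is where the speed advantage over a full shuffle would come from when reads are sequential within a chunk.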

Note

Seeking and resuming iteration with a new iterator are relatively costly operations. If possible create one iterator and use it repeatedly.

Parameters
  • reader – the reader to wrap

  • buf_size – size of the buffer; values less than 1 are interpreted as fractions of the dataset length; bigger values improve randomness, but use more memory

  • chunk_size – size of each chunk; bigger values improve performance, but reduce randomness

  • seed – random seed to use; defaults to len(reader) * self.buf_size * chunk_size
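As a worked example of how buf_size might be resolved, the sketch below turns a fractional value into a sample count and computes the ratio of buffer size to chunk size. The helper resolve_buf_size and its truncating conversion are assumptions; the library may round differently.

```python
def resolve_buf_size(buf_size, dataset_len):
    """Return the buffer size in samples (hypothetical helper)."""
    if buf_size < 1:
        # Values below 1 are treated as a fraction of the dataset length.
        return int(buf_size * dataset_len)
    return int(buf_size)


dataset_len = 50000
chunk_size = 16
buffer_samples = resolve_buf_size(0.01, dataset_len)  # default fraction
ratio = buffer_samples / chunk_size  # B / C from the rule of thumb
```

With these numbers the buffer holds 500 samples and the ratio is 31.25, so by the rule of thumb the defaults would suit a dataset with roughly 30 classes.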

iter(yield_key=False, raw=False, copy=True, chunk_size=None)[source]

Create an iterator.

Parameters
  • yield_key – if True, yields (key, sample) pairs.

  • raw – if True, yields samples as msgpacked messages.

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but also memory use

Returns

Iterator

seek(index)[source]
class datadings.reader.augment.Range(reader, start=0, stop=None)[source]

Bases: datadings.reader.augment.Augment

Extract a range of samples from a given reader.

start and stop behave like the parameters of the range function.

Parameters
  • reader – reader to sample from

  • start – start of range

  • stop – stop of range
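For a quick illustration of the range-like semantics, the sketch below lists the indices that would be selected from a reader of a given length. The helper range_indices is hypothetical, and treating stop=None as "up to the end of the reader" is an assumption.

```python
def range_indices(length, start=0, stop=None):
    """Indices selected from a reader with the given number of samples."""
    if stop is None:
        stop = length  # assumption: None means "until the last sample"
    # As with range(), start is inclusive and stop is exclusive.
    return list(range(start, stop))


head = range_indices(10, 2, 5)
tail = range_indices(4, 1)
```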

iter(yield_key=False, raw=False, copy=True, chunk_size=16)[source]

Create an iterator.

Parameters
  • yield_key – if True, yields (key, sample) pairs.

  • raw – if True, yields samples as msgpacked messages.

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but also memory use

Returns

Iterator

seek(index)[source]
class datadings.reader.augment.Repeater(reader, times)[source]

Bases: datadings.reader.augment.Augment

Repeat a Reader (datadings.reader.reader.Reader) a fixed number of times.

Warning

Augments are not thread safe!
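In contrast to endless cycling, repetition stops after a fixed number of passes. A minimal sketch, with repeat_samples as a hypothetical name and a list standing in for the reader:

```python
def repeat_samples(samples, times):
    """Yield all samples in order, `times` passes in total."""
    for _ in range(times):
        yield from samples


three_passes = list(repeat_samples([1, 2], 3))
```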

iter(yield_key=False, raw=False, copy=True, chunk_size=16)[source]

Create an iterator.

Parameters
  • yield_key – if True, yields (key, sample) pairs.

  • raw – if True, yields samples as msgpacked messages.

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but also memory use

Returns

Iterator

seek(index)[source]
class datadings.reader.augment.Shuffler(reader, seed=None)[source]

Bases: datadings.reader.augment.Augment

Iterate over a Reader (datadings.reader.reader.Reader) in random order.

Parameters
  • reader – The reader to augment.

  • seed – optional random seed; defaults to len(reader)

Warning

Augments are not thread safe!
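The role of the seed can be sketched with a seeded permutation of sample indices. The helper shuffled_order is hypothetical and uses Python's random module; the library's own random number generator and the resulting order may differ.

```python
import random


def shuffled_order(length, seed=None):
    """Return a reproducible random permutation of range(length)."""
    if seed is None:
        seed = length  # mirrors the documented default of len(reader)
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return order


default_order = shuffled_order(10)
explicit_order = shuffled_order(10, seed=10)
```

Since the default seed depends only on the dataset length, two runs over the same dataset would visit samples in the same order unless a different seed is passed.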

iter(yield_key=False, raw=False, copy=True, chunk_size=16)[source]

Create an iterator.

Parameters
  • yield_key – if True, yields (key, sample) pairs.

  • raw – if True, yields samples as msgpacked messages.

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but also memory use

Returns

Iterator

seek(index)[source]