datadings.reader.reader module

class datadings.reader.reader.Reader

Bases: object

Abstract base class for dataset readers.

Readers should be used as context managers:

with Reader(...) as reader:
    for sample in reader:
        ...  # do dataset things

Subclasses must implement the following methods:

  • __exit__

  • __len__

  • __contains__

  • find_key

  • find_index

  • get

  • slice
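The list above can be illustrated with a minimal sketch of a subclass. To keep it runnable without datadings installed, it uses a hypothetical stand-in base class (ReaderBase) that only mirrors the abstract methods named above; ListReader, the in-memory list of dicts, the "key" entry, the __enter__ method on the stand-in, and the treatment of stop=None as "until the end" are all assumptions made for illustration, not the library's implementation.

```python
from abc import ABC, abstractmethod


class ReaderBase(ABC):
    """Hypothetical stand-in for Reader; NOT the datadings class."""

    def __enter__(self):
        # Assumption: entering the context simply returns the reader.
        return self

    @abstractmethod
    def __exit__(self, exc_type, exc_val, exc_tb): ...

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def __contains__(self, key): ...

    @abstractmethod
    def find_key(self, index): ...

    @abstractmethod
    def find_index(self, key): ...

    @abstractmethod
    def get(self, index, yield_key=False, raw=False, copy=True): ...

    @abstractmethod
    def slice(self, start, stop=None, yield_key=False, raw=False, copy=True): ...


class ListReader(ReaderBase):
    """Hypothetical reader over an in-memory list of dicts with a "key" entry."""

    def __init__(self, samples):
        self._samples = list(samples)
        # Map each key to its position for fast find_index lookups.
        self._index = {s["key"]: i for i, s in enumerate(self._samples)}

    def __exit__(self, exc_type, exc_val, exc_tb):
        pass  # nothing to close for an in-memory list

    def __len__(self):
        return len(self._samples)

    def __contains__(self, key):
        return key in self._index

    def find_key(self, index):
        return self._samples[index]["key"]

    def find_index(self, key):
        return self._index[key]

    def get(self, index, yield_key=False, raw=False, copy=True):
        # raw and copy are ignored: samples here are plain dicts, not msgpack bytes.
        sample = self._samples[index]
        return (sample["key"], sample) if yield_key else sample

    def slice(self, start, stop=None, yield_key=False, raw=False, copy=True):
        # Assumption: stop=None means "until the end of the dataset".
        end = len(self) if stop is None else stop
        for i in range(start, end):
            yield self.get(i, yield_key=yield_key)


with ListReader([{"key": "a", "x": 1}, {"key": "b", "x": 2}]) as reader:
    print(len(reader), "b" in reader, reader.find_index("b"))
```

A real subclass would open its underlying file in the constructor or on entering the context and release it in __exit__.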

abstract find_index(key)

Returns the index of the sample with the given key.

abstract find_key(index)

Returns the key of the sample with the given index.

abstract get(index, yield_key=False, raw=False, copy=True)

Returns the sample at the given index.

copy=False allows the reader to use zero-copy mechanisms, so data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, because a single retained sample can keep the whole slice it was read from in memory.

Parameters:
  • index – Index of the sample

  • yield_key – If True, returns (key, sample)

  • raw – If True, returns sample as msgpacked message

  • copy – If False, allow the reader to return data as memoryview objects instead of bytes

Returns:

Sample at index.
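The memory trade-off of copy=False follows from how memoryview works in Python in general. The sketch below does not use datadings; the 10 MB buffer is a hypothetical stand-in for a chunk of data a reader might hold, and the 10-byte range stands in for one sample inside it.

```python
# One large buffer standing in for a chunk of data read from disk.
chunk = bytes(10_000_000)

# Zero-copy: a view of a small "sample" inside the chunk.
view = memoryview(chunk)[100:110]

# Copy: the sample gets its own independent 10-byte bytes object.
copied = bytes(view)

# Both expose the same 10 bytes of data.
print(len(view), len(copied))

# The view still references the entire chunk, so holding on to the view
# keeps all 10 MB alive; the copy does not reference the chunk at all.
print(view.obj is chunk)
```

Converting a view with bytes(view) before storing it long-term is one way to let the larger buffer be released.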

iter(start=None, stop=None, yield_key=False, raw=False, copy=True, chunk_size=16)

Iterate over the dataset.

start and stop behave like the parameters of the range function.

copy=False allows the reader to use zero-copy mechanisms, so data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, because a single retained sample can keep the whole slice it was read from in memory.

Parameters:
  • start – start of range; if None, current index is used

  • stop – stop of range

  • yield_key – if True, yields (key, sample) pairs

  • raw – if True, yields samples as msgpacked messages

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but require more memory

Returns:

Iterator
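The role of chunk_size can be sketched as iteration built on top of a slice-like call that fetches several samples at once. This is an illustration under assumptions, not the library's implementation: iter_chunked, list_slice, and the list-backed data are hypothetical names introduced here.

```python
def iter_chunked(slice_fn, start, stop, chunk_size=16):
    """Yield samples in [start, stop), requesting chunk_size samples at a time.

    slice_fn is a hypothetical callable shaped like slice(start, stop)
    that returns an iterable of the samples in that range.
    """
    for chunk_start in range(start, stop, chunk_size):
        # The last chunk may be shorter than chunk_size.
        chunk_stop = min(chunk_start + chunk_size, stop)
        yield from slice_fn(chunk_start, chunk_stop)


data = list(range(100))  # stand-in dataset
calls = []               # records which ranges were requested


def list_slice(start, stop):
    calls.append((start, stop))
    return iter(data[start:stop])


samples = list(iter_chunked(list_slice, 10, 50, chunk_size=16))
print(calls)
```

Larger chunks mean fewer calls to the underlying read, at the cost of holding more samples in memory at once.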

abstract slice(start, stop=None, yield_key=False, raw=False, copy=True)

Returns a generator of samples selected by the given slice.

copy=False allows the reader to use zero-copy mechanisms, so data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, because a single retained sample can keep the whole slice it was read from in memory.

Parameters:
  • start – start index of slice

  • stop – stop index of slice

  • yield_key – if True, yields (key, sample) pairs

  • raw – if True, yields samples as msgpacked messages

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns:

Iterator of selected samples