datadings.reader.reader module

class datadings.reader.reader.Reader[source]

Bases: object

Abstract base class for dataset readers.

Readers should be used as context managers:

with Reader(...) as reader:
    for sample in reader:
        [do dataset things]

Subclasses must implement the following methods:

  • __exit__

  • __len__

  • __contains__

  • find_key

  • find_index

  • get

  • slice
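As a rough illustration of the list above, here is a minimal in-memory sketch that provides those methods. `ListReader` is a hypothetical name, and the class deliberately stands alone instead of inheriting from `Reader`, so it runs without datadings installed. It assumes every sample is a dict with a unique `"key"` entry and that `__contains__` tests for a key. The `raw` and `copy` parameters are omitted for brevity.

```python
class ListReader:
    """Hypothetical in-memory reader mirroring the listed method names."""

    def __init__(self, samples):
        # Assumption: each sample is a dict with a unique "key" entry.
        self._samples = list(samples)
        self._index_of = {s["key"]: i for i, s in enumerate(self._samples)}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Nothing to release for an in-memory list.
        return None

    def __len__(self):
        return len(self._samples)

    def __contains__(self, key):
        return key in self._index_of

    def find_key(self, index):
        return self._samples[index]["key"]

    def find_index(self, key):
        return self._index_of[key]

    def get(self, index, yield_key=False):
        sample = self._samples[index]
        return (sample["key"], sample) if yield_key else sample

    def slice(self, start, stop=None, yield_key=False):
        # Slicing a range handles stop=None and out-of-range bounds.
        for index in range(len(self))[start:stop]:
            yield self.get(index, yield_key=yield_key)

    def __iter__(self):
        return self.slice(0)


with ListReader([{"key": "a", "x": 1}, {"key": "b", "x": 2}]) as reader:
    keys = [sample["key"] for sample in reader]
```

A file-based reader would additionally open its files in the constructor or in `__enter__` and close them in `__exit__`.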

abstract find_index(key)[source]

Returns the index of the sample with the given key.

abstract find_key(index)[source]

Returns the key of the sample with the given index.

abstract get(index, yield_key=False, raw=False, copy=True)[source]

Returns sample at given index.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but can also drastically increase memory consumption, since a single sample that is still referenced can keep the whole slice it was read from in memory.

Parameters
  • index – Index of the sample

  • yield_key – If True, returns (key, sample)

  • raw – If True, returns sample as msgpacked message

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns

Sample at index.
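The memory caveat for copy=False can be seen with plain Python objects. The sketch below does not use datadings. It only assumes that a reader holds a large buffer (here the hypothetical `chunk`) and hands out small views into it.

```python
# One large buffer standing in for a chunk of file data (size is arbitrary).
chunk = bytes(1_000_000)

# Zero-copy: the view is only 10 bytes long, but it references the whole
# chunk, so the chunk cannot be freed while the view is alive.
view = memoryview(chunk)[:10]

# Copying: an independent 10-byte bytes object with no link to the chunk.
copied = bytes(view)
```

Holding on to `view` keeps all one million bytes reachable through `view.obj`, whereas `copied` owns only its ten bytes.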

iter(start=None, stop=None, yield_key=False, raw=False, copy=True, chunk_size=16)[source]

Iterate over the dataset.

start and stop behave like the parameters of the range function.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but can also drastically increase memory consumption, since a single sample that is still referenced can keep the whole slice it was read from in memory.

Parameters
  • start – start of range; if None, current index is used

  • stop – stop of range

  • yield_key – if True, yields (key, sample) pairs.

  • raw – if True, yields samples as msgpacked messages.

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

  • chunk_size – number of samples read at once; bigger values can increase throughput, but require more memory

Returns

Iterator
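To make the roles of start, stop and chunk_size more concrete, here is a sketch of chunked iteration over a half-open range. It is not the library's implementation: `iter_chunked` and `fetch` are hypothetical names, and start defaults to 0 here instead of the reader's current index.

```python
def iter_chunked(fetch, length, start=0, stop=None, chunk_size=16):
    """Yield samples in [start, stop), fetching chunk_size at a time.

    fetch(lo, hi) is a hypothetical callable returning samples lo..hi-1.
    """
    if stop is None:
        stop = length
    stop = min(stop, length)
    for lo in range(start, stop, chunk_size):
        hi = min(lo + chunk_size, stop)
        yield from fetch(lo, hi)


data = list(range(100))
calls = []  # records every (lo, hi) request made to the backing store


def fetch(lo, hi):
    calls.append((lo, hi))
    return data[lo:hi]


samples = list(iter_chunked(fetch, len(data), start=10, stop=50, chunk_size=16))
```

In this sketch a larger chunk_size means fewer calls to `fetch`, at the cost of holding more samples in memory at once.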

next()[source]

Returns the next sample.

This can be slow for file-based readers if a lot of samples are to be read. Consider using iter instead:

it = iter(reader)
while True:
    sample = next(it)
    ...

Or simply loop over the reader:

for sample in reader:
    ...

rawiter(yield_key=False)[source]

Create an iterator that yields samples as msgpacked messages.

Included for backwards compatibility and may be deprecated and subsequently removed in the future.

Parameters

yield_key – If True, yields (key, sample) pairs.

Returns

Iterator

rawnext() → bytes[source]

Return the next sample msgpacked as raw bytes.

This can be slow for file-based readers if a lot of samples are to be read. Consider using iter instead:

it = iter(reader)
while 1:
    next(it)
    ...

Or simply loop over the reader:

for sample in reader:
    ...

Included for backwards compatibility and may be deprecated and subsequently removed in the future.

seek(index)[source]

Seek to the given index. Alias for seek_index.

seek_index(index)[source]

Seek to the given index.

seek_key(key)[source]

Seek to the sample with the given key.
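The following sketch shows one plausible way the seek methods and next could fit together. Only the alias relationship between seek and seek_index is stated in this documentation. The `Cursor` class, its `_i` attribute, and the idea that seek_key resolves the key through find_index are assumptions made for illustration.

```python
class Cursor:
    """Hypothetical cursor over keyed samples; not the library's code."""

    def __init__(self, keys):
        self._keys = list(keys)
        self._i = 0  # index of the sample that next() would return

    def find_index(self, key):
        return self._keys.index(key)

    def seek_index(self, index):
        self._i = index

    # seek is documented as an alias for seek_index.
    seek = seek_index

    def seek_key(self, key):
        # Assumption: resolve the key to an index, then seek there.
        self.seek_index(self.find_index(key))

    def next(self):
        key = self._keys[self._i]
        self._i += 1
        return key


cursor = Cursor(["a", "b", "c"])
cursor.seek_key("b")
first = cursor.next()   # sample at the position of key "b"
cursor.seek(0)
second = cursor.next()  # back at the first sample
```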

abstract slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]

Returns a generator of samples selected by the given slice.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but can also drastically increase memory consumption, since a single sample that is still referenced can keep the whole slice it was read from in memory.

Parameters
  • start – start index of slice

  • stop – stop index of slice

  • yield_key – if True, yield (key, sample)

  • raw – if True, returns sample as msgpacked message

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns

Iterator of selected samples