datadings.reader.reader module

class datadings.reader.reader.Reader

Bases: object

Abstract base class for dataset readers.

Readers should be used as context managers:

with Reader(...) as reader:
    for sample in reader:
        ...  # do dataset things

Subclasses must implement the following methods:

__exit__
__len__
__contains__
find_key
find_index
get
slice
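To make the list above concrete, here is a minimal sketch of an in-memory reader that provides each of those methods. It is a standalone toy, not the library's implementation: the name ListReader and the assumption that every sample is a dict with a "key" entry are hypothetical, the raw and copy parameters are left out, and a real reader would subclass Reader rather than stand alone. Treating stop=None as "up to the end" is also an assumption.

```python
class ListReader:
    """Hypothetical in-memory stand-in for a Reader subclass."""

    def __init__(self, samples):
        # Assumed layout: each sample is a dict with a unique "key" entry.
        self._samples = list(samples)
        self._index_of = {s["key"]: i for i, s in enumerate(self._samples)}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Nothing to close for an in-memory list; do not suppress exceptions.
        return False

    def __len__(self):
        return len(self._samples)

    def __contains__(self, key):
        return key in self._index_of

    def find_key(self, index):
        return self._samples[index]["key"]

    def find_index(self, key):
        return self._index_of[key]

    def get(self, index, yield_key=False):
        sample = self._samples[index]
        return (sample["key"], sample) if yield_key else sample

    def slice(self, start, stop=None, yield_key=False):
        # Assumption: stop=None means "up to the end of the dataset".
        for index in range(len(self))[start:stop]:
            yield self.get(index, yield_key=yield_key)


samples = [{"key": "a", "x": 1}, {"key": "b", "x": 2}, {"key": "c", "x": 3}]
with ListReader(samples) as reader:
    tail = list(reader.slice(1))
```

The dictionary built in __init__ is what makes find_index and __contains__ constant-time lookups; a file-based reader would typically need some comparable index structure.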

abstract find_index(key): Returns the index of the sample with the given key.

abstract find_key(index): Returns the key of the sample with the given index.

abstract get(index, yield_key=False, raw=False, copy=True)

Returns sample at given index.

copy=False allows the reader to use zero-copy mechanisms, so data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, because a single retained sample can keep the whole slice it was read from in memory.

Parameters

index – Index of the sample
yield_key – If True, returns (key, sample)
raw – If True, returns sample as msgpacked message
copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns

Sample at the given index.
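The memory caveat for copy=False follows from how memoryview works in Python in general. The sketch below uses only built-in types; the 1 MB chunk is a made-up stand-in for whatever buffer a reader might hold, not something taken from the library.

```python
# Stand-in for a buffer that holds many samples back to back.
chunk = bytes(1_000_000)

# copy=True analogue: an independent bytes object containing only the sample.
copied = chunk[100:200]

# copy=False analogue: a zero-copy view into the buffer.
view = memoryview(chunk)[100:200]

# The view references the whole underlying object, so the full 1 MB buffer
# cannot be freed for as long as the 100-byte view is alive.
keeps_chunk_alive = view.obj is chunk

# Converting the view to bytes detaches the sample from the buffer.
detached = bytes(view)
```

If samples obtained with copy=False need to outlive the iteration that produced them, converting them with bytes(...) as shown is one way to release the larger buffer.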

iter(start=None, stop=None, yield_key=False, raw=False, copy=True, chunk_size=16)

Iterate over the dataset.

start and stop behave like the parameters of the built-in range function.

copy=False allows the reader to use zero-copy mechanisms, so data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, because a single retained sample can keep the whole slice it was read from in memory.

Parameters

start – start of range; if None, current index is used
stop – stop of range
yield_key – if True, yields (key, sample) pairs.
raw – if True, yields samples as msgpacked messages.
copy – if False, allow the reader to return data as memoryview objects instead of bytes
chunk_size – number of samples read at once; bigger values can increase throughput, but require more memory

Returns

Iterator
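The interplay of start, stop, and chunk_size can be sketched with a plain list. The function iter_chunked below is a hypothetical illustration, not the library's code: it defaults start to 0 instead of the reader's current index, and a list slice stands in for the bulk read a file-based reader would perform.

```python
def iter_chunked(samples, start=0, stop=None, chunk_size=16):
    """Yield samples[start:stop], fetching chunk_size items per bulk read."""
    if stop is None:
        stop = len(samples)
    stop = min(stop, len(samples))
    for chunk_start in range(start, stop, chunk_size):
        chunk_stop = min(chunk_start + chunk_size, stop)
        # One bulk read per chunk; larger chunks mean fewer reads,
        # but more samples held in memory at the same time.
        chunk = samples[chunk_start:chunk_stop]
        yield from chunk


data = list(range(100))
selected = list(iter_chunked(data, start=10, stop=15, chunk_size=2))
```

As with range, the stop index is exclusive, so the call above selects indices 10 through 14.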

next()

Returns the next sample.

This can be slow for file-based readers if a lot of samples are to be read. Consider using iter instead:

it = iter(reader)
while True:
    try:
        sample = next(it)
    except StopIteration:
        break  # no samples left
    ...

Or simply loop over the reader:

for sample in reader:
    ...

rawiter(yield_key=False)

Create an iterator that yields samples as msgpacked messages.

Included for backwards compatibility; it may be deprecated and subsequently removed in a future release.

Parameters: yield_key – If True, yields (key, sample) pairs.
Returns: Iterator

rawnext() → bytes

Return the next sample msgpacked as raw bytes.

This can be slow for file-based readers if a lot of samples are to be read. Consider using iter instead:

it = iter(reader)
while True:
    try:
        sample = next(it)
    except StopIteration:
        break  # no samples left
    ...

Or simply loop over the reader:

for sample in reader:
    ...

Included for backwards compatibility; it may be deprecated and subsequently removed in a future release.

seek(index): Seek to the given index. Alias for seek_index.

seek_index(index): Seek to the given index.

seek_key(key): Seek to the sample with the given key.

abstract slice(start, stop=None, yield_key=False, raw=False, copy=True)

Returns a generator of samples selected by the given slice.

copy=False allows the reader to use zero-copy mechanisms, so data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, because a single retained sample can keep the whole slice it was read from in memory.

Parameters

start – start index of slice
stop – stop index of slice
yield_key – if True, yields (key, sample) pairs
raw – if True, yields samples as msgpacked messages
copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns

Iterator of selected samples