datadings.reader.msgpack module

class datadings.reader.msgpack.MsgpackReader(path: str | Path, buffering=0)

Bases: Reader

Reader for msgpack files that follow the datadings format description.

Requires at least a data file and an index file. For example, if the dataset file is some_dir/dataset.msgpack, the reader will attempt to load the index from some_dir/dataset.msgpack.index.

If the md5 file some_dir/dataset.msgpack.md5 is present, the reader can optionally verify the integrity of the data and index files.

Parameters:
  • path – Dataset file to load.

  • buffering – Read buffer size in bytes.

Raises:

IOError – If the dataset file or the index file cannot be loaded.
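The file naming convention above can be illustrated with a short stdlib-only sketch. The helpers companion_paths and check_files are hypothetical and are not part of datadings; they only mirror the documented behavior: the index file is required, the md5 file is optional, and a missing data or index file raises IOError.

```python
from pathlib import Path


def companion_paths(dataset_path):
    """Return (index_path, md5_path) for a dataset file (hypothetical helper)."""
    path = Path(dataset_path)
    # Append to the full file name, so "dataset.msgpack" becomes
    # "dataset.msgpack.index" instead of replacing the ".msgpack" suffix.
    index_path = path.with_name(path.name + ".index")
    md5_path = path.with_name(path.name + ".md5")
    return index_path, md5_path


def check_files(dataset_path):
    """Raise IOError if a required file is missing; return True if an md5 file exists."""
    dataset = Path(dataset_path)
    index_path, md5_path = companion_paths(dataset)
    for required in (dataset, index_path):
        if not required.is_file():
            raise IOError(f"required file not found: {required}")
    # The md5 file is optional; without it, integrity checks are not possible.
    return md5_path.is_file()


index_path, md5_path = companion_paths("some_dir/dataset.msgpack")
print(index_path.as_posix())  # some_dir/dataset.msgpack.index
print(md5_path.as_posix())    # some_dir/dataset.msgpack.md5
```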

find_index(key)

Returns the index of the sample with the given key.

find_key(index)

Returns the key of the sample with the given index.
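The two lookups are inverses of each other. The class below is a hypothetical in-memory sketch of that contract (a list of keys plus a reverse dict); it is not datadings' index format, and its error behavior for unknown keys or out-of-range indices belongs to this sketch only.

```python
class KeyIndex:
    """Hypothetical bidirectional key <-> index lookup (illustration only)."""

    def __init__(self, keys):
        self._keys = list(keys)
        # Build the reverse mapping once so that key lookups are O(1).
        self._positions = {key: i for i, key in enumerate(self._keys)}
        if len(self._positions) != len(self._keys):
            raise ValueError("keys must be unique")

    def find_index(self, key):
        # In this sketch, an unknown key raises KeyError.
        return self._positions[key]

    def find_key(self, index):
        # In this sketch, an out-of-range index raises IndexError.
        return self._keys[index]


idx = KeyIndex(["img_0001", "img_0002", "img_0003"])
print(idx.find_index("img_0002"))  # 1
print(idx.find_key(2))             # img_0003
```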

get(index, yield_key=False, raw=False, copy=True)

Returns sample at given index.

copy=False allows the reader to use zero-copy mechanisms: data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, since a single sample can keep the whole slice it was read from in memory.

Parameters:
  • index – Index of the sample.

  • yield_key – If True, return (key, sample).

  • raw – If True, return the sample as a msgpack-encoded message.

  • copy – If False, allow the reader to return data as memoryview objects instead of bytes.

Returns:

Sample at the given index, or a (key, sample) tuple if yield_key is True.
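To see why copy=False can increase memory consumption, the following stdlib-only sketch uses a plain bytes object as a stand-in for data the reader has loaded; it does not call datadings. A memoryview slice of only a few bytes still references the whole underlying buffer through its obj attribute, whereas bytes() makes an independent copy that lets the large buffer be freed.

```python
# Hypothetical stand-in for a block of sample data read from disk (4 MiB).
block = bytes(4 * 1024 * 1024)

# Zero-copy: a 16-byte view into the block; no data is copied.
sample_view = memoryview(block)[0:16]

# Copy: an independent 16-byte bytes object.
sample_copy = bytes(sample_view)

# The small view keeps the entire 4 MiB block alive for as long as it exists.
print(sample_view.nbytes, len(sample_view.obj))  # 16 4194304
print(len(sample_copy))                          # 16
```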

slice(start, stop=None, yield_key=False, raw=False, copy=True)

Returns a generator of samples selected by the given slice.

copy=False allows the reader to use zero-copy mechanisms: data may be returned as memoryview objects rather than bytes. This can improve performance, but it can also drastically increase memory consumption, since a single sample can keep the whole slice in memory.

Parameters:
  • start – Start index of the slice.

  • stop – Stop index of the slice.

  • yield_key – If True, yield (key, sample).

  • raw – If True, yield samples as msgpack-encoded messages.

  • copy – If False, allow the reader to return data as memoryview objects instead of bytes.

Returns:

Iterator over the selected samples.
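The interplay of start, stop, yield_key, and copy can be sketched as a generator over a flat byte layout. The layout (all samples stored back to back, with sample i spanning data[offsets[i]:offsets[i + 1]]), the function name iter_slice, and treating stop=None as "until the last sample" are assumptions for illustration; this is not necessarily how datadings stores or reads its files, and the sketch yields raw bytes rather than decoded samples.

```python
def iter_slice(data, keys, offsets, start, stop=None, yield_key=False, copy=True):
    """Yield samples start..stop-1 from a hypothetical flat byte layout."""
    if stop is None:
        # Assumption of this sketch: no stop means "until the last sample".
        stop = len(keys)
    view = memoryview(data)
    for i in range(start, stop):
        chunk = view[offsets[i]:offsets[i + 1]]
        # copy=True returns independent bytes; copy=False returns the view itself.
        sample = bytes(chunk) if copy else chunk
        yield (keys[i], sample) if yield_key else sample


data = b"aaabbcccc"
keys = ["a", "b", "c"]
offsets = [0, 3, 5, 9]
print(list(iter_slice(data, keys, offsets, 1, yield_key=True)))
# [('b', b'bb'), ('c', b'cccc')]
```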

verify_data(read_size=524288, progress=False)

Hashes the dataset file and verifies it against the md5 file.

Parameters:
  • read_size – Read-ahead size in bytes.

  • progress – If True, display progress.

Returns:

True if verification was successful.
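Chunked hashing of this kind can be approximated with hashlib. In this sketch, md5_of_file is a hypothetical helper and not the datadings implementation; read_size plays the same role as the parameter above (the default of 524288 bytes is 512 KiB), bounding how much of the file is held in memory at once.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, read_size=524288):
    """Return the hex MD5 digest of a file, read read_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops when read() reaches end of file.
        for chunk in iter(lambda: f.read(read_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file, read in 2-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
try:
    result = md5_of_file(tmp.name, read_size=2)
finally:
    os.remove(tmp.name)
print(result)  # 5d41402abc4b2a76b9719d911017c592
```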

verify_index(read_size=524288, progress=False)

Hashes the index file and verifies it against the md5 file.

Parameters:
  • read_size – Read-ahead size in bytes.

  • progress – If True, display progress.

Returns:

True if verification was successful.
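Both checks compare a freshly computed digest with an entry in the md5 file. The sketch below assumes an md5sum-style file with one "<hex digest>  <file name>" line per file; that format and the helpers read_md5_file and verify_file are assumptions and are not taken from the datadings documentation.

```python
import hashlib
import tempfile
from pathlib import Path


def read_md5_file(md5_path):
    """Parse assumed md5sum-style lines into {file name: hex digest}."""
    expected = {}
    for line in Path(md5_path).read_text().splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            # md5sum marks binary mode with a leading "*" on the file name.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify_file(path, expected, read_size=524288):
    """Return True if the file's MD5 matches its entry in `expected`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(read_size), b""):
            digest.update(chunk)
    return expected.get(Path(path).name) == digest.hexdigest()


with tempfile.TemporaryDirectory() as d:
    data_file = Path(d, "dataset.msgpack")
    index_file = Path(d, "dataset.msgpack.index")
    data_file.write_bytes(b"hello")
    index_file.write_bytes(b"world")
    # Correct digest for the data file, deliberately wrong one for the index.
    Path(d, "dataset.msgpack.md5").write_text(
        "5d41402abc4b2a76b9719d911017c592  dataset.msgpack\n"
        + "0" * 32 + "  dataset.msgpack.index\n"
    )
    expected = read_md5_file(Path(d, "dataset.msgpack.md5"))
    data_ok = verify_file(data_file, expected)
    index_ok = verify_file(index_file, expected)

print(data_ok, index_ok)  # True False
```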