datadings.reader.sharded module

class datadings.reader.sharded.ShardedReader(shards: str | Path | Iterable[str | Path | Reader])[source]

Bases: Reader

A Reader that combines several shards into one. Shards can be specified either as a glob pattern dir/*.msgpack for msgpack files, or an iterable of individual shards. Each shard can be a string, Path, or Reader.

Parameters:

shards – glob pattern or a list of strings, Path objects or Readers

find_index(key)[source]

Returns the index of the sample with the given key.

find_key(index)[source]

Returns the key of the sample with the given index.

get(index, yield_key=False, raw=False, copy=True)[source]

Returns sample at given index.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.

Parameters:
  • index – Index of the sample

  • yield_key – If True, returns (key, sample)

  • raw – If True, returns sample as msgpacked message

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns:

Sample as index.

slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]

Returns a generator of samples selected by the given slice.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.

Parameters:
  • start – start index of slice

  • stop – stop index of slice

  • yield_key – if True, yield (key, sample)

  • raw – if True, returns sample as msgpacked message

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns:

Iterator of selected samples