datadings.reader.list module

class datadings.reader.list.ListReader(samples: typing.Sequence[dict], labels: typing.Iterable | pathlib.Path = None, numeric_labels=True, initfun: typing.Callable = noop, convertfun: typing.Callable = noop)[source]

Bases: Reader

Reader that holds a list of samples. Functions can be given to load data on the fly and/or perform further conversion steps.

Two special keys "key" and "label" in samples are used:

  • "key" is a unique identifier for samples. Sample index is added to samples if it is missing.

  • "label" holds an optional label. If numeric_labels is true, it is replaced by its numeric index in the list of labels, and the original label is retained as "_label".
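The handling of the two special keys can be pictured with a minimal sketch in plain Python. This is not the datadings implementation: prepare_samples is a hypothetical helper, and storing the index as an int under "key" is an assumption about the format.

```python
def prepare_samples(samples, labels, numeric_labels=True):
    """Hypothetical helper mimicking the documented "key"/"label" handling."""
    label_index = {label: i for i, label in enumerate(labels)}
    for index, sample in enumerate(samples):
        # Add the sample index as "key" if the sample has none (assumed format).
        sample.setdefault("key", index)
        if numeric_labels and "label" in sample:
            # Keep the original label, then replace it with its index.
            sample["_label"] = sample["label"]
            sample["label"] = label_index[sample["label"]]
    return samples


samples = [
    {"key": "a.png", "label": "cat"},
    {"label": "dog"},   # no key: the index is used
    {"key": "c.png"},   # no label: left untouched
]
prepared = prepare_samples(samples, labels=["cat", "dog"])
```

Here the label list is passed explicitly, so its order alone decides the numeric indices.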

Note

If the labels argument is not given, the list of labels is extracted from the samples and natsorted to determine the numeric labels.
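To illustrate why natural sorting matters for the resulting indices, here is a rough sketch that uses a hand-written natural sort key as a stand-in for natsort. Both natural_key and extract_labels are hypothetical names, and the key function only approximates natural ordering.

```python
import re


def natural_key(text):
    # Split into digit and non-digit runs so "class10" sorts after "class2".
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", text)]


def extract_labels(samples):
    """Hypothetical helper: collect distinct labels and sort them naturally."""
    found = {s["label"] for s in samples if "label" in s}
    return sorted(found, key=natural_key)


samples = [
    {"label": "class10"},
    {"label": "class2"},
    {"label": "class1"},
    {"label": "class2"},
]
labels = extract_labels(samples)
numeric = {label: i for i, label in enumerate(labels)}

# Plain lexicographic sorting would place "class10" before "class2".
lexicographic = sorted(set(s["label"] for s in samples))
```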

Note

initfun is applied to the given samples once during initialization, so its changes persist for the lifetime of the reader. convertfun is applied to a shallow copy of the sample every time before it is returned.
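The difference between the two hooks can be sketched with a small stand-in class. TinyListReader, add_path, and load_size are hypothetical names for illustration only; this is not how datadings implements ListReader.

```python
class TinyListReader:
    """Hypothetical stand-in, not the datadings implementation."""

    def __init__(self, samples, initfun=lambda s: None, convertfun=lambda s: None):
        self._samples = samples
        self._convertfun = convertfun
        for sample in samples:
            initfun(sample)  # runs once; changes stay in the stored sample

    def get(self, index):
        sample = dict(self._samples[index])  # shallow copy
        self._convertfun(sample)  # runs on every access, on the copy only
        return sample


def add_path(sample):
    # Cheap, permanent metadata: suitable for initfun.
    sample["path"] = "images/" + sample["key"]


def load_size(sample):
    # Stand-in for expensive on-the-fly loading: suitable for convertfun.
    sample["size"] = len(sample["path"])


reader = TinyListReader([{"key": "a.png"}], initfun=add_path, convertfun=load_size)
first = reader.get(0)
```

Because the copy is shallow, only top-level keys are isolated. A convertfun that mutates a nested object would still affect the stored sample.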

Important

Since None is not sortable, the labels argument must be given to use None as a label.
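The underlying limitation is plain Python behaviour and can be reproduced without the library. The explicit label order below is only an illustrative choice.

```python
labels = ["cat", None, "dog"]

try:
    sorted(labels)
    sortable = True
except TypeError:
    # None cannot be compared with str, so sorting fails.
    sortable = False

# Giving the order explicitly avoids sorting altogether.
explicit_labels = [None, "cat", "dog"]
label_index = {label: i for i, label in enumerate(explicit_labels)}
```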

Parameters:
  • samples – Sequence of samples. Must be indexable, so no generators or one-time iterators.

  • labels – Optional. List of labels in the desired order, or path to a file with one label per line. If None, labels are collected from the "label" entries of the samples, if any, and sorted.

  • numeric_labels – If true, convert each label to its numeric index in the list of all labels.

  • initfun – Callable initfun(sample: dict) to modify samples in-place during initialization.

  • convertfun – Callable convertfun(sample: dict) to modify a shallow copy of the sample in-place before it is returned.

find_index(key)[source]

Returns the index of the sample with the given key.

find_key(index)[source]

Returns the key of the sample with the given index.
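As a rough picture of how find_index and find_key relate, the sketch below builds a key-to-index mapping over a list of samples. It assumes keys are unique, as stated above, and is not taken from the library's source.

```python
samples = [{"key": "a"}, {"key": "b"}, {"key": "c"}]

# Position in the list gives the index; a dict gives the reverse lookup.
keys = [sample["key"] for sample in samples]
index_of = {key: index for index, key in enumerate(keys)}


def find_index(key):
    return index_of[key]


def find_key(index):
    return keys[index]
```

The two lookups are inverses of each other, so find_key(find_index(k)) gives back k for any existing key.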

get(index, yield_key=False, raw=False, copy=True)[source]

Returns sample at given index.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.

Parameters:
  • index – Index of the sample

  • yield_key – If True, returns (key, sample)

  • raw – If True, returns sample as msgpacked message

  • copy – If False, allow the reader to return data as memoryview objects instead of bytes

Returns:

Sample at the given index.
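The memory trade-off of copy=False follows from how memoryview works in Python. The sketch below uses a plain bytes buffer as a stand-in for whatever the reader holds internally, which is an assumption. It shows that a small view keeps the whole underlying object referenced.

```python
buffer = bytes(1_000_000)  # stand-in for a large chunk of loaded data

copied = buffer[10:20]            # bytes slice: an independent 10-byte copy
view = memoryview(buffer)[10:20]  # zero-copy: no bytes are duplicated

# The view is only 10 bytes long, but it references the full buffer,
# which therefore cannot be freed while the view is alive.
keeps_whole_buffer = view.obj is buffer
```

Converting with bytes(view) when a sample needs to outlive its source data releases that reference at the cost of one copy.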

slice(start, stop=None, yield_key=False, raw=False, copy=True)[source]

Returns a generator of samples selected by the given slice.

copy=False allows the reader to use zero-copy mechanisms. Data may be returned as memoryview objects rather than bytes. This can improve performance, but also drastically increase memory consumption, since one sample can keep the whole slice in memory.

Parameters:
  • start – start index of slice

  • stop – stop index of slice

  • yield_key – if True, yield (key, sample)

  • raw – if True, returns sample as msgpacked message

  • copy – if False, allow the reader to return data as memoryview objects instead of bytes

Returns:

Iterator of selected samples

datadings.reader.list.load_lines(path)[source]
datadings.reader.list.noop(_)[source]
datadings.reader.list.sorted_labels(samples)[source]