datadings.reader.directory module

class datadings.reader.directory.DirectoryReader(patterns: typing.Sequence[str | pathlib.Path], labels: typing.Iterable | pathlib.Path = None, numeric_labels=True, initfun: typing.Callable = <function noop>, convertfun: typing.Callable = <function noop>, include: typing.Sequence[str] = (), exclude: typing.Sequence[str] = (), separator='\t', root_dir='')

Bases: ListReader

Reader that loads samples from one or multiple filesystem directories.

One or more search patterns must be given to tell the reader where to look for samples. Each search pattern can either be:

  • A glob pattern to a filesystem directory. Use the special {LABEL} string to define which directory in the path to use as a label.

  • A path to a CSV-like file (with the given separator string) where each line contains the path to a sample file. Paths can be relative and optionally prefixed with a root_dir. A label and further information can follow the path in additional columns; they are stored as "label" and "_additional_info".
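The CSV-like file format can be illustrated with a minimal sketch. This is not the library's implementation: the function name parse_list_file, the "path" key, and the choice to store the remaining columns as a list under "_additional_info" are assumptions made for illustration only.

```python
import os


def parse_list_file(lines, separator="\t", root_dir=""):
    """Yield one sample dict per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # First column is the path; everything after it is optional.
        path, *rest = line.split(separator)
        # Assumption: root_dir is simply joined in front of relative paths.
        sample = {"path": os.path.join(root_dir, path)}
        if rest:
            sample["label"] = rest[0]
        if len(rest) > 1:
            # Assumption: extra columns are kept as a list of strings.
            sample["_additional_info"] = rest[1:]
        yield sample


rows = ["cats/1.jpg\tcat\tblurry", "dogs/2.jpg\tdog", "misc/3.jpg"]
samples = list(parse_list_file(rows, root_dir="data"))
```

Lines with only a path produce samples without a "label" key, which matches the description that labels are optional.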

Example glob pattern: some_dir/{LABEL}/**

This pattern loads a dataset with a typical directory tree structure where samples from each class are located in separate subdirectories. The name of the directory at the level of {LABEL} is used as the label.
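As a rough illustration of how the {LABEL} placeholder relates a pattern to a label, the sketch below picks the path component at the same depth as {LABEL}. The helper label_from_path is hypothetical and not part of datadings; it assumes POSIX-style paths and that no component before {LABEL} spans more than one directory level.

```python
from pathlib import PurePosixPath


def label_from_path(pattern, path):
    """Return the path component located where {LABEL} sits in the pattern."""
    pattern_parts = PurePosixPath(pattern).parts
    path_parts = PurePosixPath(path).parts
    # Depth of the placeholder, counted from the start of the pattern.
    level = pattern_parts.index("{LABEL}")
    return path_parts[level]


label = label_from_path("some_dir/{LABEL}/**", "some_dir/cat/img_001.jpg")
```

Here label is "cat", the directory name at the {LABEL} level.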

You can further narrow down which files to include with additional fnmatch.fnmatch() glob patterns. These are applied as follows:

  • If no inclusion patterns are given, all files are included.

  • If inclusion patterns are given, a file must match at least one.

  • A file is excluded if it matches any exclusion pattern.
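The three rules above can be sketched with fnmatch.fnmatch() from the standard library. The function passes_filters is a hypothetical stand-in written for illustration, not the module's own check_included; whether the library matches against the bare filename or the full path is not stated here, so the sketch simply matches whatever string it is given.

```python
from fnmatch import fnmatch


def passes_filters(filename, include=(), exclude=()):
    """Apply the inclusion rules first, then the exclusion rules."""
    # With inclusion patterns present, at least one must match.
    if include and not any(fnmatch(filename, p) for p in include):
        return False
    # Any matching exclusion pattern rejects the file.
    return not any(fnmatch(filename, p) for p in exclude)


kept = [
    name
    for name in ["a.jpg", "b.jpg", "notes.txt"]
    if passes_filters(name, include=["*.jpg", "*.png"], exclude=["a*"])
]
```

Note that exclusion takes precedence: a.jpg matches an inclusion pattern but is still dropped because it also matches "a*".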

Note

Please refer to the ListReader documentation for a more detailed explanation of how labels are handled.

Parameters:
  • patterns – One or more search patterns.

  • labels – Optional. List of labels in the desired order, or path to a file with one label per line. If None, labels are taken from the "label" keys of the samples, if any, and sorted.

  • numeric_labels – If true, convert each label to its numeric index in the list of all labels.

  • initfun – Callable initfun(sample: dict) to modify samples in-place during initialization.

  • convertfun – Callable convertfun(sample: dict) to modify samples in-place before they are returned.

  • include – Set of inclusion patterns.

  • exclude – Set of exclusion patterns.

  • separator – Separator string for CSV-like file patterns.

  • root_dir – Prefix for relative paths.
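To make the interplay of labels and numeric_labels more concrete, here is a minimal sketch of the described behavior: collect and sort labels when none are given, then map each label to its position in that list. The function index_labels is hypothetical, and unlike the reader it returns copies instead of modifying samples in place.

```python
def index_labels(samples, labels=None):
    """Return (converted samples, label list) with labels replaced by indices."""
    if labels is None:
        # Collect the distinct labels found in the samples and sort them.
        labels = sorted({s["label"] for s in samples if "label" in s})
    index = {label: i for i, label in enumerate(labels)}
    converted = [
        {**s, "label": index[s["label"]]} if "label" in s else dict(s)
        for s in samples
    ]
    return converted, list(labels)


samples = [{"path": "a.jpg", "label": "dog"}, {"path": "b.jpg", "label": "cat"}]
converted, labels = index_labels(samples)
```

Passing an explicit labels list fixes the order, and therefore the indices, regardless of how the labels would sort.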

datadings.reader.directory.check_included(filename, include, exclude)
datadings.reader.directory.glob_pattern(pattern, prefix)
datadings.reader.directory.yield_directory(patterns, separator)
datadings.reader.directory.yield_file(infile, prefix, separator)