datadings.reader.directory module
- class datadings.reader.directory.DirectoryReader(patterns: Sequence[Union[str, pathlib.Path]], labels: Optional[Union[Iterable, pathlib.Path]] = None, numeric_labels=True, initfun: Callable = <function noop>, convertfun: Callable = <function noop>, include: Sequence[str] = (), exclude: Sequence[str] = (), separator='\t', root_dir='')
Bases: datadings.reader.list.ListReader
Reader that loads samples from one or multiple filesystem directories.
One or more search patterns must be given to tell the reader where to look for samples. Each search pattern can either be:
- A glob pattern to a filesystem directory. Use the special {LABEL} string to define which directory in the path to use as a label.
- A path to a CSV-like file (with the given separator string) where each line contains the path to a sample file. Paths can be relative and optionally prefixed with a root_dir. A label as well as additional information can be included alongside the path in additional columns. They will be stored as "label" and "_additional_info".
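As a rough illustration of the second kind of pattern, the sketch below splits one line of a CSV-like list file into a sample dictionary. It is not the library's implementation: the helper name parse_list_line, the "path" key, storing the extra columns as a list, and leaving absolute paths unprefixed are all assumptions made for this example. Only the "label" and "_additional_info" keys and the separator and root_dir parameters come from the description above.

```python
import os


def parse_list_line(line, separator="\t", root_dir=""):
    """Hypothetical helper: turn one line of a list file into a sample dict."""
    columns = line.rstrip("\n").split(separator)
    path = columns[0]
    # Assumption: only relative paths are prefixed with root_dir.
    if root_dir and not os.path.isabs(path):
        path = os.path.join(root_dir, path)
    sample = {"path": path}  # "path" key is an assumption
    if len(columns) > 1:
        # The second column, if present, is treated as the label.
        sample["label"] = columns[1]
    if len(columns) > 2:
        # Assumption: any remaining columns are kept as a list of strings.
        sample["_additional_info"] = columns[2:]
    return sample


sample = parse_list_line("cats/001.jpg\tcat\tfold=1\n", root_dir="data")
```

A line with only one column yields a sample without "label" or "_additional_info" in this sketch.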
Example glob pattern:
some_dir/{LABEL}/**
This pattern loads a dataset with a typical directory tree structure where samples from each class are located in separate subdirectories. The name of the directory at the level of {LABEL} is used as the label.
You can further narrow down which files to include with additional fnmatch.fnmatch() glob patterns. These are applied as follows:
- If no inclusion patterns are given, all files are included.
- If inclusion patterns are given, a file must match at least one.
- A file is excluded if it matches any exclusion pattern.
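The three filtering rules above can be sketched with the standard library's fnmatch.fnmatch(). The helper is_included is hypothetical and only mirrors the rules as described. Whether the reader matches patterns against the full path or only the file name is not stated here, so this sketch assumes the full path.

```python
from fnmatch import fnmatch


def is_included(path, include=(), exclude=()):
    """Hypothetical helper applying the inclusion and exclusion rules."""
    # With inclusion patterns, the path must match at least one of them.
    if include and not any(fnmatch(path, pattern) for pattern in include):
        return False
    # A path matching any exclusion pattern is dropped.
    return not any(fnmatch(path, pattern) for pattern in exclude)


paths = ["a/cat.jpg", "a/cat.png", "a/notes.txt", "a/cat_thumb.jpg"]
kept = [
    p for p in paths
    if is_included(p, include=["*.jpg", "*.png"], exclude=["*_thumb.*"])
]
```

Note that fnmatch does not treat path separators specially, so "*.jpg" also matches files in subdirectories.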
Note
Please refer to the ListReader documentation for a more detailed explanation of how labels are handled.
- Parameters
patterns – One or more search patterns.
labels – Optional. List of labels in the desired order, or path to a file with one label per line. If None, get "label" keys from samples, if any, and sort them.
numeric_labels – If true, convert each label to its numeric index in the list of all labels.
initfun – Callable initfun(sample: dict) to modify samples in-place during initialization.
convertfun – Callable convertfun(sample: dict) to modify samples in-place before they are returned.
include – Set of inclusion patterns.
exclude – Set of exclusion patterns.
separator – Separator string for file patterns.
root_dir – Prefix for relative paths.
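To make the interplay of labels and numeric_labels more concrete, here is a minimal sketch of what converting labels to numeric indices could look like when labels is None. It assumes the labels are collected from the samples, deduplicated, and sorted as described above. The variable names are illustrative and the code does not use or reproduce the library's internals.

```python
samples = [{"label": "dog"}, {"label": "cat"}, {"label": "dog"}]

# Collect the distinct "label" values from the samples and sort them.
labels = sorted({s["label"] for s in samples if "label" in s})

# Map each label to its position in the sorted list of all labels.
label_to_index = {label: index for index, label in enumerate(labels)}

# Replace each string label with its numeric index, in place.
for s in samples:
    s["label"] = label_to_index[s["label"]]
```

Passing an explicit list of labels would fix the order, and therefore the indices, instead of relying on sorting.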