datadings.tools package
- class datadings.tools.Yielder(gen, queue, end, error)
Bases: threading.Thread
- run()
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
- datadings.tools.document_keys(typefun, block='Important:', prefix='Samples have the following keys:', postfix='')
Extract the keys that samples created by a type function have and create a documentation string that lists them. For example, it produces the following documentation for ImageClassificationData:
{block}
    {prefix}

    - ``"key"``
    - ``"image"``
    - ``"label"``

    {postfix}
- Parameters
typefun – Type function to analyze.
block – Type of block to use. Defaults to “Important:”.
prefix – Text before parameter list.
postfix – Text after parameter list.
- datadings.tools.download_files_if_not_found(files, indir)
Run download_if_not_found for multiple files.
- datadings.tools.download_if_not_found(url, path)
Check if path is a file, otherwise download from url to path.
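The check-then-download behaviour can be sketched with the standard library alone. This is a minimal illustration, not the datadings implementation: the name `download_if_missing`, the boolean return value, and the use of `urllib.request.urlretrieve` are all assumptions.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Hypothetical stand-in: fetch url into path unless path is already a file."""
    if os.path.isfile(path):
        # The file is already present, so no network access is needed.
        return False
    # Assumption: a plain download via urllib is sufficient here.
    urllib.request.urlretrieve(url, path)
    return True
```

Because the file check comes first, repeated calls with the same `path` only hit the network once.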
- datadings.tools.hash_md5hex(path, read_size=65536, progress=False)
Calculate the (hexadecimal) MD5 hash of a file.
- Parameters
path – File to hash.
read_size – Read-ahead size.
progress – If True, display progress.
- Returns
Hexadecimal MD5 hash as string.
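Hashing a file in chunks of `read_size` bytes might look like the sketch below. It uses only `hashlib`; the function name `md5_hex_sketch` is hypothetical, and the progress display is left out.

```python
import hashlib


def md5_hex_sketch(path, read_size=65536):
    """Hypothetical stand-in: hash a file chunk by chunk, return the hex digest."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        # Read read_size bytes at a time so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(read_size), b''):
            md5.update(chunk)
    return md5.hexdigest()
```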
- datadings.tools.hash_string(s: str, salt: bytes = b'', __struct=<Struct object>) → int
Hash a string using the blake2s algorithm.
- Parameters
s – the string
salt – optional salt, max 8 bytes
- Returns
first 8 bytes of the hash, interpreted as big-endian uint64
- datadings.tools.hash_string_bytes(s: str, salt: bytes = b'', __struct=<Struct object>) → bytes
Hash a string using the blake2s algorithm.
- Parameters
s – the string
salt – optional salt, max 8 bytes
- Returns
first 8 bytes of the hash
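The relationship between the two hash functions can be sketched as follows. The `_sketch` names are hypothetical, and two details are assumptions rather than documented facts: the string is encoded as UTF-8, and the full 32-byte blake2s digest is truncated (requesting an 8-byte digest directly would yield different bytes).

```python
import hashlib
import struct

_UINT64_BE = struct.Struct('>Q')  # big-endian unsigned 64-bit integer


def hash_string_bytes_sketch(s, salt=b''):
    """Hypothetical stand-in: first 8 bytes of a salted blake2s digest."""
    # Assumption: UTF-8 encoding, full digest computed and then truncated.
    return hashlib.blake2s(s.encode('utf-8'), salt=salt).digest()[:8]


def hash_string_sketch(s, salt=b''):
    """Hypothetical stand-in: the same 8 bytes read as a big-endian uint64."""
    return _UINT64_BE.unpack(hash_string_bytes_sketch(s, salt))[0]
```

The 8-byte salt limit comes from blake2s itself: `hashlib.blake2s` raises `ValueError` for longer salts.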
- datadings.tools.load_md5file(path)
Load a text file of MD5 hashes.
- Parameters
path – Path to MD5 file.
- Returns
Dict of (file, hash) pairs.
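The file format is not specified here. Assuming the common `md5sum` layout (one `<hash>  <file>` pair per line), a parser could be sketched like this; the function name and the format are both assumptions, not the library's documented behaviour.

```python
def load_md5file_sketch(path):
    """Hypothetical parser: map file names to hashes for md5sum-style lines."""
    hashes = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: "<hash><whitespace><file>"; md5sum prefixes the
            # file name with '*' when it hashed in binary mode.
            md5, name = line.split(maxsplit=1)
            hashes[name.lstrip('*')] = md5
    return hashes
```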
- datadings.tools.locate_files(files, indir)
Returns a copy of files where paths are replaced with concrete paths located in indir.
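Using the file-description format documented for prepare_indir, path resolution could be sketched as below. The name `locate_files_sketch` is hypothetical, and joining each relative path onto `indir` with `os.path.join` is an assumption about how "concrete paths" are formed.

```python
import os


def locate_files_sketch(files, indir):
    """Hypothetical stand-in: copy file descriptions, joining each path onto indir."""
    located = {}
    for name, description in files.items():
        entry = dict(description)  # shallow copy so the input stays untouched
        entry['path'] = os.path.join(indir, entry['path'])
        located[name] = entry
    return located
```

Returning a copy means the original `files` dict can be reused with a different `indir`.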
- datadings.tools.make_printer(bar_format='{desc} {percentage:3.0f}% {elapsed}<{remaining}, {rate_fmt}{postfix}', miniters=0, mininterval=0.5, smoothing=0.1, **kwargs)
Convenience function to create tqdm objects with some default arguments.
- Returns
tqdm.tqdm object.
- datadings.tools.path_append(path: pathlib.Path, string: str)
Append a string to the name of a pathlib Path.
- Parameters
path – the path
string – the bit to append
- Returns
Path with string appended.
- Raises
An error if the path has no name, e.g., root /.
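One way to get this behaviour with `pathlib` is `Path.with_name`, which raises `ValueError` for paths without a name. The sketch below assumes that approach; the name `path_append_sketch` is hypothetical and the actual implementation may differ.

```python
from pathlib import Path


def path_append_sketch(path, string):
    """Hypothetical stand-in: append string to the final path component."""
    # with_name raises ValueError when the path has an empty name, e.g. Path('/').
    return path.with_name(path.name + string)
```

Unlike `Path.with_suffix`, this never replaces an existing extension; it only extends the name.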
- datadings.tools.path_append_suffix(path: pathlib.Path, suffix: str)
Appends the given suffix to the path if the path does not end with said suffix:
>>> path_append_suffix(Path('some.file'), '.file')
>>> Path('some.file')
>>> path_append_suffix(Path('some.file'), '.txt')
>>> Path('some.file.txt')
Behaves like path_append if suffix does not start with '.' (dot):
>>> path_append_suffix(Path('some.file'), 'txt')
>>> Path('some.filetxt')
- Parameters
path – the base path
suffix – suffix to append if necessary
- Returns
Path that ends with suffix.
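The three cases above can be reproduced with a short sketch. It assumes the check compares against `Path.suffix`; the name `path_append_suffix_sketch` is hypothetical.

```python
from pathlib import Path


def path_append_suffix_sketch(path, suffix):
    """Hypothetical stand-in: append suffix unless the path already ends with it."""
    # Path.suffix is either '' or starts with '.', so a suffix without a
    # leading dot never matches and is always appended.
    if path.suffix == suffix:
        return path
    return path.with_name(path.name + suffix)
```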
- datadings.tools.prepare_indir(files, args)
Prepare a directory for dataset creation. files specifies which files need to be downloaded and/or integrity checked. It is a dict of file descriptions like these:
files = {
    'train': {
        'path': 'dataset.zip',
        'url': 'http://cool.dataset/dataset.zip',
        'md5': '56ad5c77e6c8f72ed9ef2901628d6e48',
    }
}
Once downloads and/or verification have finished, the relative paths are replaced with concrete paths in args.indir.
- Parameters
files – Dict of file descriptions.
args – Parsed argparse arguments object with indir and skip_verification arguments.
- Returns
Files with paths located in args.indir.
- datadings.tools.print_over(*args, **kwargs)
Wrapper around print that replaces the current line. It prints from the start of the line and clears remaining characters. Accepts the same kwargs as the print function.
- Parameters
flush – If True, flush after printing.
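A line-replacing print is commonly built from a carriage return plus an ANSI "erase to end of line" sequence. The sketch below assumes exactly that, and also assumes no trailing newline by default; `print_over_sketch` is a hypothetical name and the real function may use different control characters.

```python
def print_over_sketch(*args, flush=False, **kwargs):
    """Hypothetical stand-in: overwrite the current terminal line."""
    # Assumption: stay on the same line unless the caller passes end=...
    kwargs.setdefault('end', '')
    stream = kwargs.get('file')
    # '\r' moves the cursor back to the start of the line.
    print('\r', end='', file=stream)
    print(*args, **kwargs)
    # ESC[K clears whatever is left of a longer, previously printed line.
    print('\033[K', end='', file=stream, flush=flush)
```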
- datadings.tools.query_user(question, default='yes', answers=('yes', 'no', 'abort'))
Ask user a question via input() and return their answer.
Adapted from http://code.activestate.com/recipes/577097/
- Parameters
question – String that is presented to the user.
default – Presumed answer if the user just hits <Enter>. Must be one of answers or None (meaning an answer is required of the user).
answers – Answers the user can give.
- Returns
One of answers.
- datadings.tools.split_array(img, v_pixels, h_pixels, indices=(1, 2))
Split/tile an image/numpy array in horizontal and vertical direction.
- Parameters
img – The image to split.
h_pixels – Width of each tile in pixels.
v_pixels – Height of each tile in pixels.
indices – 2-tuple of indices used to calculate number of tiles.
- Returns
Yields single tiles from the image as arrays.
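The tiling can be sketched as a generator over slices. Several points are assumptions: `indices` is read as the (vertical, horizontal) axes of the array, tiles are produced in row-major order, and leftover pixels that do not fill a whole tile are dropped. The name `split_array_sketch` is hypothetical.

```python
def split_array_sketch(img, v_pixels, h_pixels, indices=(1, 2)):
    """Hypothetical stand-in: yield v_pixels x h_pixels tiles in row-major order."""
    v_axis, h_axis = indices
    # Assumption: only whole tiles are produced; partial tiles at the
    # bottom and right edges are dropped.
    n_rows = img.shape[v_axis] // v_pixels
    n_cols = img.shape[h_axis] // h_pixels
    for row in range(n_rows):
        for col in range(n_cols):
            index = [slice(None)] * img.ndim
            index[v_axis] = slice(row * v_pixels, (row + 1) * v_pixels)
            index[h_axis] = slice(col * h_pixels, (col + 1) * h_pixels)
            yield img[tuple(index)]
```

With the default `indices=(1, 2)`, an array of shape (channels, height, width) is tiled along height and width while the channel axis is kept whole.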
- datadings.tools.tiff_to_nd_array(file_path, type=<class 'numpy.uint8'>)
Decode a TIFF image and return all contained subimages as a numpy array. The first dimension of the array indexes the subimages.
Warning
Requires geo (GDAL) extra!
- Parameters
file_path – Path to TIFF file.
type – Output dtype.
- Returns
TIFF image as numpy array.
- datadings.tools.verify_files(files, indir)
Verify the integrity of the given files.