datadings.tools package

class datadings.tools.ProgressPrinter(*_, **__)[source]

Bases: tqdm

monitor_interval = 0
class datadings.tools.SentinelEnd[source]

Bases: object

class datadings.tools.SentinelError[source]

Bases: object

class datadings.tools.Yielder(gen, queue, end, error)[source]

Bases: Thread

run()[source]

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.

stop()[source]
class datadings.tools.YielderProc(gen, queue)[source]

Bases: Process

run()[source]

Method to be run in sub-process; can be overridden in sub-class

stop()[source]
datadings.tools.document_keys(typefun, block='Important:', prefix='Samples have the following keys:', postfix='')[source]

Extract the keys that samples created by a type function have and create a documentation string that lists them. For example, it produces the following documentation for ImageClassificationData:

{block}
    {prefix}

    - ``"key"``
    - ``"image"``
    - ``"label"``

    {postfix}
Parameters:
  • typefun – Type function to analyze.

  • block – Type of block to use. Defaults to “Important:”.

  • prefix – Text before the list of keys.

  • postfix – Text after the list of keys.
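The template above can be reproduced with a few lines of string formatting. This is a minimal sketch of the formatting step only; `format_keys_block` is a hypothetical name, and how document_keys actually extracts the keys from a type function is not shown in this reference.

```python
def format_keys_block(keys, block="Important:",
                      prefix="Samples have the following keys:",
                      postfix=""):
    """Hypothetical helper: render a key list in the layout shown above."""
    lines = [block, f"    {prefix}", ""]
    # One indented bullet per key, quoted as an inline literal.
    lines += [f'    - ``"{key}"``' for key in keys]
    if postfix:
        lines += ["", f"    {postfix}"]
    return "\n".join(lines)


print(format_keys_block(["key", "image", "label"]))
```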

datadings.tools.download_files_if_not_found(files, indir)[source]

Run download_if_not_found for multiple files.

datadings.tools.download_if_not_found(url, path)[source]

Check if path is a file, otherwise download from url to path.
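The check-then-download behavior can be sketched with the standard library. This is an illustration under assumptions, not the package's implementation: `download_if_missing` is a hypothetical name, the use of `urllib.request.urlretrieve` is a guess, and the boolean return value is added only to make the sketch easy to inspect.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Hypothetical sketch: fetch url into path unless path is already a file."""
    if os.path.isfile(path):
        return False  # file exists, nothing is downloaded
    urllib.request.urlretrieve(url, path)
    return True
```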

datadings.tools.hash_md5hex(path, read_size=65536, progress=False)[source]

Calculate the (hexadecimal) MD5 hash of a file.

Parameters:
  • path – File to hash.

  • read_size – Read-ahead size.

  • progress – If True, display progress.

Returns:

Hexadecimal MD5 hash as string.
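Hashing a file in fixed-size reads keeps memory use constant regardless of file size. The sketch below shows that pattern with `hashlib`; `md5_hex` is a hypothetical name, and the progress display is left out.

```python
import hashlib


def md5_hex(path, read_size=65536):
    """Hypothetical sketch: MD5 of a file, read in chunks of read_size bytes."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: f.read(read_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```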

datadings.tools.hash_string(s: str, salt: bytes = b'', __struct=<_struct.Struct object>) int[source]

Hash a string using the blake2s algorithm.

Parameters:
  • s – the string

  • salt – optional salt, max 8 bytes

Returns:

first 8 bytes of the hash, interpreted as big-endian uint64

datadings.tools.hash_string_bytes(s: str, salt: bytes = b'', __struct=<_struct.Struct object>) bytes[source]

Hash a string using the blake2s algorithm.

Parameters:
  • s – the string

  • salt – optional salt, max 8 bytes

Returns:

first 8 bytes of the hash
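A rough sketch of both hash functions using `hashlib.blake2s`, which accepts a salt of at most 8 bytes. The function names are hypothetical, and the UTF-8 encoding and the default digest size (truncated afterwards to 8 bytes) are assumptions; the real functions may differ in either.

```python
import hashlib
import struct

# Big-endian unsigned 64-bit integer.
_UINT64_BE = struct.Struct(">Q")


def hash_bytes_sketch(s, salt=b""):
    """Hypothetical sketch: first 8 bytes of the blake2s digest of s."""
    return hashlib.blake2s(s.encode("utf-8"), salt=salt).digest()[:8]


def hash_int_sketch(s, salt=b""):
    """Hypothetical sketch: the same 8 bytes read as a big-endian uint64."""
    return _UINT64_BE.unpack(hash_bytes_sketch(s, salt))[0]
```

Because the output is deterministic for a given string and salt, a hash like this can be used to assign samples to splits reproducibly, for example by taking the integer modulo the number of splits.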

datadings.tools.load_md5file(path)[source]

Load a text file of MD5 hashes.

Parameters:

path – Path to MD5 file.

Returns:

Dict of (file, hash) pairs.
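The file format is not specified here. Assuming the common `md5sum` layout (one `<hash>  <filename>` pair per line, with an optional `*` before the name), parsing could look like this sketch; `parse_md5_lines` is a hypothetical name.

```python
def parse_md5_lines(lines):
    """Hypothetical sketch: map file names to hashes (md5sum layout assumed)."""
    hashes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once on whitespace so file names may contain spaces.
        digest, name = line.split(None, 1)
        hashes[name.lstrip("*")] = digest
    return hashes
```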

datadings.tools.locate_files(files, indir)[source]

Returns a copy of files where paths are replaced with concrete paths located in indir.
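One plausible reading of this behavior, sketched below, is that each relative `path` entry is joined with indir on a copy of the input. The dict layout follows the file descriptions shown under prepare_indir; `locate_files_sketch` is a hypothetical name, and the real function may resolve paths differently.

```python
import copy
import os


def locate_files_sketch(files, indir):
    """Hypothetical sketch: copy files and prefix every path with indir."""
    located = copy.deepcopy(files)  # leave the caller's dict untouched
    for meta in located.values():
        meta["path"] = os.path.join(indir, meta["path"])
    return located
```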

datadings.tools.make_printer(bar_format='{desc} {percentage:3.0f}% {elapsed}<{remaining}, {rate_fmt}{postfix}', miniters=0, mininterval=0.5, smoothing=0.1, **kwargs)[source]

Convenience function to create tqdm objects with some default arguments.

Returns:

tqdm.tqdm object.

datadings.tools.path_append(path: Path, string: str)[source]

Append a string to the name of a pathlib Path.

Parameters:
  • path – the path

  • string – the bit to append

Returns:

Path with string appended to its name.

Raises:
datadings.tools.path_append_suffix(path: Path, suffix: str)[source]

Appends the given suffix to the path if the path does not end with said suffix:

>>> path_append_suffix(Path('some.file'), '.file')
Path('some.file')
>>> path_append_suffix(Path('some.file'), '.txt')
Path('some.file.txt')

Behaves like path_append if suffix does not start with '.' (dot):

>>> path_append_suffix(Path('some.file'), 'txt')
Path('some.filetxt')
Parameters:
  • path – the base path

  • suffix – suffix to append if necessary

Returns:

Path that ends with suffix.
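The behavior described for path_append and path_append_suffix can be approximated with `pathlib` alone. The names below are hypothetical, and comparing against the end of the file name is an assumption about how the real check works.

```python
from pathlib import Path


def append_to_name(path, string):
    """Hypothetical sketch: append string to the final path component."""
    return path.with_name(path.name + string)


def append_suffix(path, suffix):
    """Hypothetical sketch: append suffix unless the name already ends with it."""
    if path.name.endswith(suffix):
        return path
    return append_to_name(path, suffix)


print(append_suffix(Path("some.file"), ".txt"))
```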

datadings.tools.prepare_indir(files, args)[source]

Prepare a directory for dataset creation. files specifies which files need to be downloaded and/or integrity checked. It is a dict of file descriptions like these:

files = {
    'train': {
        'path': 'dataset.zip',
        'url': 'http://cool.dataset/dataset.zip',
        'md5': '56ad5c77e6c8f72ed9ef2901628d6e48',
    }
}

Once downloads and/or verification have finished, the relative paths are replaced with concrete paths in args.indir.

Parameters:
  • files – Dict of file descriptions.

  • args – Parsed argparse arguments object with indir and skip_verification arguments.

Returns:

Files with paths located in args.indir.

datadings.tools.print_over(*args, **kwargs)[source]

Wrapper around print that replaces the current line. It prints from the start of the line and clears remaining characters. Accepts the same kwargs as the print function.

Parameters:

flush – If True, flush after printing.
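Replacing the current terminal line is commonly done with a carriage return followed by the ANSI "erase to end of line" sequence. The sketch below uses that approach; it is an assumption about the technique, not necessarily what print_over does, and `print_over_sketch` is a hypothetical name.

```python
def print_over_sketch(*args, flush=False, **kwargs):
    """Hypothetical sketch: print from column 0 and clear the rest of the line."""
    end = kwargs.pop("end", "")
    # "\r" returns the cursor to the start of the current line.
    print("\r", end="", file=kwargs.get("file"))
    # "\x1b[K" is the ANSI sequence that erases from the cursor to end of line.
    print(*args, end="\x1b[K" + end, flush=flush, **kwargs)


for step in range(3):
    print_over_sketch("step", step, flush=True)
print()
```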

datadings.tools.query_user(question, default='yes', answers=('yes', 'no', 'abort'))[source]

Ask user a question via input() and return their answer.

Adapted from http://code.activestate.com/recipes/577097/

Parameters:
  • question – String that is presented to the user.

  • default – Presumed answer if the user just hits <Enter>. Must be one of answers or None (meaning an answer is required of the user).

  • answers – Answers the user can give.

Returns:

One of answers.
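A prompt loop of this kind can be sketched as follows. `query_user_sketch` is a hypothetical name, the `ask` parameter exists only so the loop can be driven without a terminal, and accepting unambiguous prefixes such as "n" for "no" is an assumption rather than documented behavior.

```python
def query_user_sketch(question, default="yes",
                      answers=("yes", "no", "abort"), ask=input):
    """Hypothetical sketch: keep asking until the reply maps to one answer."""
    if default is not None and default not in answers:
        raise ValueError(f"invalid default answer: {default!r}")
    # Show the default in upper case, e.g. [YES/no/abort].
    options = "/".join(a.upper() if a == default else a for a in answers)
    prompt = f"{question} [{options}] "
    while True:
        reply = ask(prompt).strip().lower()
        if not reply and default is not None:
            return default
        matches = [a for a in answers if reply and a.startswith(reply)]
        if len(matches) == 1:
            return matches[0]
        print("Please answer with one of: " + ", ".join(answers))
```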

datadings.tools.split_array(img, v_pixels, h_pixels, indices=(1, 2))[source]

Split/tile an image/numpy array in horizontal and vertical direction.

Parameters:
  • img – The image to split.

  • h_pixels – Width of each tile in pixels.

  • v_pixels – Height of each tile in pixels.

  • indices – 2-tuple of indices used to calculate number of tiles.

Returns:

Yields single tiles from the image as arrays.
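Tiling reduces to stepping over the two spatial axes in strides of the tile size. The sketch below yields slice pairs rather than arrays so it runs without NumPy; `tile_slices` is a hypothetical name, and dropping partial tiles at the borders is an assumption. The role of the indices parameter is not modeled.

```python
def tile_slices(height, width, v_pixels, h_pixels):
    """Hypothetical sketch: yield (row_slice, col_slice) for each full tile."""
    for top in range(0, height - v_pixels + 1, v_pixels):
        for left in range(0, width - h_pixels + 1, h_pixels):
            yield slice(top, top + v_pixels), slice(left, left + h_pixels)


# Usage on a nested list; with a 2-D NumPy array, img[rows, cols] does the same.
img = [[r * 10 + c for c in range(6)] for r in range(4)]
tiles = [[row[cols] for row in img[rows]]
         for rows, cols in tile_slices(4, 6, 2, 3)]
```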

datadings.tools.tiff_to_nd_array(file_path, type=<class 'numpy.uint8'>)[source]

Decode a TIFF image and return all contained subimages as a numpy array. The first dimension of the array indexes the subimages.

Warning

Requires geo (GDAL) extra!

Parameters:
  • file_path – Path to TIFF file.

  • type – Output dtype.

Returns:

TIFF image as numpy array.

datadings.tools.verify_file(meta, indir)[source]
datadings.tools.verify_files(files, indir)[source]

Verify the integrity of the given files.
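Integrity checking against the `md5` entries of the file descriptions shown under prepare_indir might look like the sketch below. `find_corrupt_files` is a hypothetical name, and returning the names of mismatching entries is an assumption; this reference does not say how verify_files reports a failure.

```python
import hashlib
import os


def find_corrupt_files(files, indir):
    """Hypothetical sketch: names of entries whose MD5 differs from 'md5'."""
    corrupt = []
    for name, meta in files.items():
        path = os.path.join(indir, meta["path"])
        with open(path, "rb") as f:
            # Reads the whole file at once; fine for a sketch, costly for big files.
            digest = hashlib.md5(f.read()).hexdigest()
        if digest != meta["md5"]:
            corrupt.append(name)
    return corrupt
```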

datadings.tools.yield_process(gen)[source]

Run a generator in a background process and yield its output in the current process.

Parameters:

gen – Generator to yield from.

datadings.tools.yield_threaded(gen)[source]

Run a generator in a background thread and yield its output in the current thread.

Parameters:

gen – Generator to yield from.
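The general pattern behind a threaded yielder is a worker thread that pushes items into a queue, plus sentinel objects that signal the end of the stream or an error. The sketch below illustrates that pattern with hypothetical names (`_End`, `_Error`, `yield_threaded_sketch`, `maxsize`); it is loosely modeled on the Yielder, SentinelEnd, and SentinelError classes listed above, not copied from them.

```python
import queue
import threading


class _End:
    """Marks normal exhaustion of the generator."""


class _Error:
    """Carries an exception raised inside the worker thread."""

    def __init__(self, exc):
        self.exc = exc


def yield_threaded_sketch(gen, maxsize=8):
    """Hypothetical sketch: consume gen in a thread, yield its items here."""
    q = queue.Queue(maxsize=maxsize)  # bounded, so the worker cannot run far ahead

    def worker():
        try:
            for item in gen:
                q.put(item)
            q.put(_End())
        except Exception as exc:
            q.put(_Error(exc))

    # Daemon thread: an abandoned consumer does not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while True:
        item = q.get()
        if isinstance(item, _End):
            break
        if isinstance(item, _Error):
            raise item.exc  # re-raise the worker's exception in the caller
        yield item
    thread.join()
```

Items arrive in the generator's original order because a single worker feeds a FIFO queue. This helps most when producing an item releases the GIL, for example during file reads or image decoding.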

Submodules