datadings.tools package
- class datadings.tools.Yielder(gen, queue, end, error)[source]
Bases: Thread
- run()[source]
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
- datadings.tools.document_keys(typefun, block='Important:', prefix='Samples have the following keys:', postfix='')[source]
Extract the keys that samples created by a type function have and create a documentation string that lists them. For example, it produces the following documentation for ImageClassificationData:
{block}
    {prefix}
    - ``"key"``
    - ``"image"``
    - ``"label"``
    {postfix}
- Parameters:
typefun – Type function to analyze.
block – Type of block to use. Defaults to “Important:”.
prefix – Text before parameter list.
postfix – Text after parameter list.
- datadings.tools.download_files_if_not_found(files, indir)[source]
Run download_if_not_found for multiple files.
- datadings.tools.download_if_not_found(url, path)[source]
Check if path is a file, otherwise download from url to path.
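The check-then-download behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the datadings implementation: the name download_if_missing and its boolean return value are hypothetical, and the sketch omits progress reporting and error handling.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Download url to path unless path already is a file.

    Hypothetical helper; returns True if a download was started,
    False if the file was already present.
    """
    if os.path.isfile(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Because the existence check comes first, repeated calls with the same path do not hit the network again.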
- datadings.tools.hash_md5hex(path, read_size=65536, progress=False)[source]
Calculate the (hexadecimal) MD5 hash of a file.
- Parameters:
path – File to hash.
read_size – Read-ahead size.
progress – If True, display progress.
- Returns:
Hexadecimal MD5 hash as string.
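Hashing in chunks of read_size bytes keeps memory use constant regardless of file size. The following is a minimal sketch of that technique using hashlib; the name md5_hex_sketch is hypothetical, and the progress display is left out.

```python
import hashlib


def md5_hex_sketch(path, read_size=65536):
    """Return the hexadecimal MD5 digest of the file at path.

    Hypothetical stand-in for illustration; reads read_size bytes
    at a time instead of loading the whole file.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(read_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```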
- datadings.tools.hash_string(s: str, salt: bytes = b'', __struct=<_struct.Struct object>) → int [source]
Hash a string using the blake2s algorithm.
- Parameters:
s – the string
salt – optional salt, max 8 bytes
- Returns:
first 8 bytes of the hash, interpreted as big-endian uint64
- datadings.tools.hash_string_bytes(s: str, salt: bytes = b'', __struct=<_struct.Struct object>) → bytes [source]
Hash a string using the blake2s algorithm.
- Parameters:
s – the string
salt – optional salt, max 8 bytes
- Returns:
first 8 bytes of the hash
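The relationship between the two hash functions can be sketched with hashlib and struct. This sketch rests on assumptions that the documentation does not confirm: the string is encoded as UTF-8, and the default 32-byte blake2s digest is truncated to 8 bytes. The function names are hypothetical, and the values produced may differ from those of the library.

```python
import hashlib
import struct

# Big-endian unsigned 64-bit integer, matching "big-endian uint64".
_UINT64_BE = struct.Struct(">Q")


def blake2s_prefix_bytes(s, salt=b""):
    """First 8 bytes of the blake2s digest of s (UTF-8 assumed)."""
    # hashlib.blake2s accepts a salt of at most 8 bytes.
    return hashlib.blake2s(s.encode("utf-8"), salt=salt).digest()[:8]


def blake2s_prefix_int(s, salt=b""):
    """The same 8 bytes interpreted as a big-endian uint64."""
    return _UINT64_BE.unpack(blake2s_prefix_bytes(s, salt))[0]
```

An integer hash like this is deterministic across runs, unlike the built-in hash() for strings, which makes it suitable for reproducible splits or shuffles.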
- datadings.tools.load_md5file(path)[source]
Load a text file of MD5 hashes.
- Parameters:
path – Path to MD5 file.
- Returns:
Dict of (file, hash) pairs.
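The exact file format is not specified here. As a rough sketch, assuming the common md5sum layout of one "hash  filename" pair per line, parsing into a dict could look like this. The name parse_md5_lines is hypothetical.

```python
def parse_md5_lines(lines):
    """Parse md5sum-style lines into a {filename: hash} dict.

    Assumes each non-empty line is "<hash><whitespace><filename>";
    this layout is an assumption, not taken from the datadings source.
    """
    hashes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        hashes[name] = digest
    return hashes
```

An open text file can be passed directly, since iterating over it yields lines.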
- datadings.tools.locate_files(files, indir)[source]
Returns a copy of files where paths are replaced with concrete paths located in indir.
- datadings.tools.make_printer(bar_format='{desc} {percentage:3.0f}% {elapsed}<{remaining}, {rate_fmt}{postfix}', miniters=0, mininterval=0.5, smoothing=0.1, **kwargs)[source]
Convenience function to create tqdm objects with some default arguments.
- Returns:
tqdm.tqdm object.
- datadings.tools.path_append(path: Path, string: str)[source]
Append a string to the name of a pathlib Path.
- Parameters:
path – the path
string – the bit to append
- Returns:
Path with stuff appended
- Raises:
An error for paths without a name to append to, e.g., root /.
- datadings.tools.path_append_suffix(path: Path, suffix: str)[source]
Appends the given suffix to the path if the path does not end with said suffix:
>>> path_append_suffix(Path('some.file'), '.file')
>>> Path('some.file')
>>> path_append_suffix(Path('some.file'), '.txt')
>>> Path('some.file.txt')
Behaves like path_append if suffix does not start with '.' (dot):
>>> path_append_suffix(Path('some.file'), 'txt')
>>> Path('some.filetxt')
- Parameters:
path – the base path
suffix – suffix to append if necessary
- Returns:
Path that ends with suffix.
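The two path helpers can be approximated with pathlib's with_name. This is a sketch under assumptions: the function names are hypothetical, and testing the file name with str.endswith is one plausible reading of "ends with said suffix" that reproduces the documented examples but may differ from the library in edge cases.

```python
from pathlib import PurePosixPath


def append_to_name(path, string):
    """Append string to the final component of path.

    with_name raises ValueError for paths with an empty name,
    such as the root "/".
    """
    return path.with_name(path.name + string)


def append_suffix(path, suffix):
    """Append suffix unless the file name already ends with it."""
    if path.name.endswith(suffix):
        return path
    return append_to_name(path, suffix)


print(append_suffix(PurePosixPath("some.file"), ".txt"))
```

PurePosixPath is used so the sketch behaves identically on every platform; the same calls work with a concrete Path.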
- datadings.tools.prepare_indir(files, args)[source]
Prepare a directory for dataset creation. files specifies which files need to be downloaded and/or integrity checked. It is a dict of file descriptions like these:
files = {
    'train': {
        'path': 'dataset.zip',
        'url': 'http://cool.dataset/dataset.zip',
        'md5': '56ad5c77e6c8f72ed9ef2901628d6e48',
    }
}
Once downloads and/or verification have finished, the relative paths are replaced with concrete paths in args.indir.
- Parameters:
files – Dict of file descriptions.
args – Parsed argparse arguments object with indir and skip_verification arguments.
- Returns:
Files with paths located in args.indir.
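The final path-replacement step can be illustrated on the file description dict shown in the description. This is a minimal sketch assuming every description has a 'path' entry that is relative to the input directory; the name locate_sketch is hypothetical, and the download and verification steps are omitted.

```python
import copy
import os


def locate_sketch(files, indir):
    """Return a copy of files with each 'path' joined onto indir.

    Hypothetical helper; the input dict is left unmodified.
    """
    located = copy.deepcopy(files)
    for description in located.values():
        description["path"] = os.path.join(indir, description["path"])
    return located
```

Working on a deep copy means the original descriptions keep their relative paths and can be reused with a different directory.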
- datadings.tools.print_over(*args, **kwargs)[source]
Wrapper around print that replaces the current line. It prints from the start of the line and clears remaining characters. Accepts the same kwargs as the print function.
- Parameters:
flush – If True, flush after printing.
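One common way to replace the current terminal line is a carriage return followed by the text and the ANSI "erase to end of line" sequence. The sketch below shows that technique; it is an assumption that print_over works this way, and the name print_over_sketch and its reduced set of keyword arguments are hypothetical.

```python
import sys


def print_over_sketch(*args, sep=" ", file=None, flush=False):
    """Write args at the start of the current line and clear the rest.

    "\r" moves the cursor to column 0; "\x1b[K" erases from the
    cursor to the end of the line on ANSI-compatible terminals.
    """
    out = sys.stdout if file is None else file
    text = sep.join(str(a) for a in args)
    out.write("\r" + text + "\x1b[K")
    if flush:
        out.flush()
```

No newline is written, so the next call overwrites the same line; a plain print() afterwards moves on to a fresh line.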
- datadings.tools.query_user(question, default='yes', answers=('yes', 'no', 'abort'))[source]
Ask user a question via input() and return their answer.
Adapted from http://code.activestate.com/recipes/577097/
- Parameters:
question – String that is presented to the user.
default – Presumed answer if the user just hits <Enter>. Must be one of answers or None (meaning an answer is required of the user).
answers – Answers the user can give.
- Returns:
One of answers.
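The prompt loop described above can be sketched as follows. This is an illustration rather than the library's code: the name ask and the read parameter (which allows replacing input() in tests) are hypothetical, replies are matched exactly after lower-casing, and abbreviated answers are not handled.

```python
def ask(question, default="yes", answers=("yes", "no", "abort"), read=input):
    """Ask until the reply is one of answers, then return it.

    An empty reply selects default, unless default is None,
    in which case the question is repeated.
    """
    if default is not None and default not in answers:
        raise ValueError(f"invalid default answer: {default!r}")
    # Show the default in upper case, e.g. "[YES/no/abort]".
    options = "/".join(a.upper() if a == default else a for a in answers)
    prompt = f"{question} [{options}] "
    while True:
        reply = read(prompt).strip().lower()
        if not reply and default is not None:
            return default
        if reply in answers:
            return reply
```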
- datadings.tools.split_array(img, v_pixels, h_pixels, indices=(1, 2))[source]
Split/tile an image/numpy array in horizontal and vertical direction.
- Parameters:
img – The image to split.
h_pixels – Width of each tile in pixels.
v_pixels – Height of each tile in pixels.
indices – 2-tuple of indices used to calculate number of tiles.
- Returns:
Yields single tiles from the image as arrays.
- datadings.tools.tiff_to_nd_array(file_path, type=<class 'numpy.uint8'>)[source]
Decode a TIFF image and return all contained subimages as a numpy array. The first dimension of the array indexes the subimages.
Warning
Requires geo (GDAL) extra!
- Parameters:
file_path – Path to TIFF file.
type – Output dtype.
- Returns:
TIFF image as numpy array.
- datadings.tools.verify_files(files, indir)[source]
Verify the integrity of the given files.
- datadings.tools.yield_process(gen)[source]
Run a generator in a background process and yield its output in the current process.
- Parameters:
gen – Generator to yield from.
- datadings.tools.yield_threaded(gen)[source]
Run a generator in a background thread and yield its output in the current thread.
- Parameters:
gen – Generator to yield from.
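The general pattern of consuming a generator in a background thread can be sketched with threading and queue. This is not the datadings implementation; the name iterate_in_thread, the maxsize parameter, and the sentinel object are hypothetical. The worker forwards items through a queue, signals the end with a sentinel, and passes exceptions to the consumer.

```python
import queue
import threading

_END = object()  # sentinel marking normal exhaustion of the generator


def iterate_in_thread(gen, maxsize=16):
    """Run gen in a daemon thread and yield its items here."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            for item in gen:
                q.put((True, item))
        except Exception as exc:
            # Forward the error so it is raised in the consumer thread.
            q.put((False, exc))
        else:
            q.put((True, _END))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        ok, item = q.get()
        if not ok:
            raise item
        if item is _END:
            return
        yield item
```

The bounded queue keeps the producer from running arbitrarily far ahead. If the consumer stops early, the worker in this sketch stays blocked on the queue until the interpreter exits, which is why it is started as a daemon thread.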
Submodules
- datadings.tools.argparse module
- datadings.tools.cached_property module
- datadings.tools.compression module
- datadings.tools.matlab module
- datadings.tools.msgpack module