datadings.index package
- datadings.index.hash_keys(keys: Sequence[str], max_tries: int = 1000) → Tuple[bytes, Sequence[int]]
Apply the hash_string() function to the given list of keys, so the returned hashes are 64-bit integers. All hashes are salted and guaranteed collision-free. If necessary, this method tries different salt values (up to max_tries) until the hashes are collision-free.
- Parameters:
keys – list of keys
max_tries – how many different salt values to try to find collision-free hashes
- Returns:
used salt and list of key hashes
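The salt-and-retry idea described above can be sketched roughly as follows. This is not the datadings implementation: `hash_string_sketch` and `hash_keys_sketch` are hypothetical names, BLAKE2b is used only as a convenient stand-in for whatever hash_string() actually does, and the error raised after max_tries is an assumption.

```python
import hashlib
import os
from typing import List, Sequence, Tuple


def hash_string_sketch(s: str, salt: bytes) -> int:
    """Hypothetical stand-in for hash_string(): salted 64-bit hash via BLAKE2b."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def hash_keys_sketch(
    keys: Sequence[str], max_tries: int = 1000
) -> Tuple[bytes, List[int]]:
    """Try random salts until all key hashes are distinct.

    Assumes the keys themselves are unique; duplicate keys can never
    be made collision-free by changing the salt.
    """
    for _ in range(max_tries):
        salt = os.urandom(16)  # BLAKE2b accepts at most 16 salt bytes
        hashes = [hash_string_sketch(k, salt) for k in keys]
        if len(set(hashes)) == len(hashes):
            return salt, hashes
    # Assumed failure mode; the real function may behave differently.
    raise RuntimeError(f"no collision-free salt found in {max_tries} tries")


salt, hashes = hash_keys_sketch(["a", "b", "c"])
```

Returning the salt alongside the hashes matters because a later lookup has to hash the query key with the same salt to get a comparable value.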
- datadings.index.keys_len(path: Path) → int
Read the dataset length from the keys file.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to data or keys file
- Returns:
length of dataset
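Most functions in this module state that the correct suffix is appended if the path ends with a different suffix. A minimal sketch of that behaviour, assuming a hypothetical `.keys` suffix and a hypothetical helper name (the actual suffixes and helper used by datadings are not given here), might look like this:

```python
from pathlib import Path


def with_appended_suffix(path: Path, suffix: str) -> Path:
    """Append `suffix` unless the file name already ends with it.

    The suffix is appended, not substituted, so an existing
    extension such as ".msgpack" is preserved.
    """
    if path.name.endswith(suffix):
        return path
    return path.with_name(path.name + suffix)


# ".keys" is an illustrative suffix, not necessarily the one datadings uses.
data_path = Path("train.msgpack")
keys_path = with_appended_suffix(data_path, ".keys")
```

Under this reading, passing either the data file path or the already-suffixed path resolves to the same file.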
- datadings.index.legacy_index_len(path: Path) → int
Read the dataset length from the legacy index file.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to data or index file
- Returns:
length of dataset
- datadings.index.legacy_load_index(path: Path) → Tuple[Sequence[str], Sequence[int]]
Load legacy index as two lists of keys and offsets. Semantics of the returned lists are the same as for load_keys and load_offsets.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to dataset or index file
- Returns:
keys and offsets list
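To illustrate how the two returned lists relate, here is a small sketch that pairs each key with a byte range. It assumes that sample i occupies the bytes between offsets[i] and offsets[i + 1], which follows from the offsets description in this module but is not stated explicitly; `byte_ranges` and the example values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def byte_ranges(
    keys: Sequence[str], offsets: Sequence[int]
) -> Dict[str, Tuple[int, int]]:
    """Map each key to its assumed (start, end) byte range in the data file."""
    if len(offsets) != len(keys) + 1:
        raise ValueError("expected exactly one more offset than keys")
    return {key: (offsets[i], offsets[i + 1]) for i, key in enumerate(keys)}


# Made-up values standing in for the result of legacy_load_index(path).
keys = ["cat_001", "cat_002", "dog_001"]
offsets = [0, 120, 310, 400]
ranges = byte_ranges(keys, offsets)
```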
- datadings.index.load_filter(path: Path) → BloomFilter
Load a Bloom filter from file.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to data or filter file
- Returns:
the Bloom filter
- datadings.index.load_key_hashes(path: Path) → Tuple[bytes, Sequence[int]]
Load key hashes from file.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to data or key hashes file
- Returns:
hash salt and list of key hashes
- datadings.index.load_keys(path: Path) → Sequence[str]
Load keys from file.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to data or keys file
- Returns:
list of keys
- datadings.index.load_offsets(path: Path) → Sequence[int]
Load sample offsets from file. The first value is always 0 and the last is the size of the data file in bytes, so len(offsets) = len(dataset) + 1.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
path – path to data or offsets file
- Returns:
sample offsets in data file
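The offsets layout described above (first value 0, last value equal to the file size) can be reproduced with a running sum over sample sizes. The sketch below uses an in-memory byte string instead of a real data file, and `read_sample` is a hypothetical helper, not part of datadings.

```python
from itertools import accumulate
from typing import Sequence

# Three made-up serialized samples standing in for a data file.
samples = [b"first", b"second!", b"3rd"]
data = b"".join(samples)

# Running sum of sample sizes, with a leading 0.
offsets = [0, *accumulate(len(s) for s in samples)]


def read_sample(data: bytes, offsets: Sequence[int], i: int) -> bytes:
    """Return the bytes of sample i, assumed to span offsets[i]:offsets[i + 1]."""
    return data[offsets[i]:offsets[i + 1]]
```

Storing the final offset means the last sample needs no special case: its end is read from the list like any other.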
- datadings.index.write_filter(keys: Sequence[str], path: Path) → Path
Create a Bloom filter for the given keys and write result to file.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
keys – list of keys
path – path to data or filter file
- Returns:
Path that was written to
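For readers unfamiliar with Bloom filters, the following is a minimal pure-Python sketch of the data structure. It is not the BloomFilter class that load_filter returns; `TinyBloom`, its parameters, and the use of BLAKE2b are all illustrative assumptions.

```python
import hashlib
from typing import Iterator


class TinyBloom:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=i.to_bytes(2, "big")
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bloom = TinyBloom()
for key in ["cat_001", "cat_002", "dog_001"]:
    bloom.add(key)
```

A membership test that returns False is definitive, while True only means the key is probably present, which is why such a filter is useful as a cheap pre-check before a more expensive key lookup.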
- datadings.index.write_key_hashes(keys: Sequence[str], path: Path) → Path
Hash list of keys and write result to file.
See hash_keys for details on the hash method.
Correct suffix is appended if path ends with a different suffix.
- Parameters:
keys – list of keys
path – path to data or key hashes file
- Returns:
Path that was written to