datadings.index package
- datadings.index.hash_keys(keys: Sequence[str], max_tries: int = 1000) → Tuple[bytes, Sequence[int]]
Apply the hash_string() function to the given list of keys, so the returned hashes are 64-bit integers. All hashes are salted and guaranteed to be collision-free. If necessary, this method tries different salt values.
- Parameters
keys – list of keys
max_tries – how many different salt values to try to find collision-free hashes
- Returns
used salt and list of key hashes
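A minimal sketch of the salted retry loop described for hash_keys follows. It is not the datadings implementation: hash_string_sketch and hash_keys_sketch are hypothetical names, and the 64-bit BLAKE2b digest with a random 8-byte salt is an assumed stand-in for hash_string(), whose algorithm is not documented here.

```python
import hashlib
import os
from typing import List, Sequence, Tuple


def hash_string_sketch(text: str, salt: bytes) -> int:
    # Hypothetical stand-in for datadings' hash_string() (assumed: salted
    # 64-bit BLAKE2b digest read as an unsigned little-endian integer).
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def hash_keys_sketch(keys: Sequence[str], max_tries: int = 1000) -> Tuple[bytes, List[int]]:
    # Identical keys hash identically under every salt, so reject them up front.
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be unique")
    for _ in range(max_tries):
        salt = os.urandom(8)  # fresh random salt for each attempt
        hashes = [hash_string_sketch(key, salt) for key in keys]
        if len(set(hashes)) == len(hashes):  # no two keys share a hash
            return salt, hashes
    raise RuntimeError(f"no collision-free salt found in {max_tries} tries")


salt, hashes = hash_keys_sketch(["cat/001", "cat/002", "dog/001"])
```

The duplicate check matters because without it the loop would burn through all max_tries attempts on input that can never be made collision-free.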
- datadings.index.keys_len(path: pathlib.Path) → int
Read the dataset length from the keys file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or keys file
- Returns
length of dataset
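One reading of "correct suffix is appended" is sketched below: the suffix is added to the whole file name instead of replacing the existing one, so both a data file path and the matching index file path resolve to the same index file. The helper ensure_suffix and the train.msgpack / .keys names are assumptions for illustration only.

```python
from pathlib import Path


def ensure_suffix(path: Path, suffix: str) -> Path:
    # Hypothetical helper: keep the path if it already ends with `suffix`,
    # otherwise append `suffix` to the full file name (not replace it).
    path = Path(path)
    if path.suffix != suffix:
        path = path.with_name(path.name + suffix)
    return path


print(ensure_suffix(Path("train.msgpack"), ".keys"))       # train.msgpack.keys
print(ensure_suffix(Path("train.msgpack.keys"), ".keys"))  # train.msgpack.keys
```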
- datadings.index.legacy_index_len(path: pathlib.Path) → int
Read the dataset length from the legacy index file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or index file
- Returns
length of dataset
- datadings.index.legacy_load_index(path: pathlib.Path) → Tuple[Sequence[str], Sequence[int]]
Load legacy index as two lists of keys and offsets. Semantics of the returned lists are the same as for load_keys and load_offsets.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or index file
- Returns
keys and offsets list
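Because legacy_load_index returns keys and offsets with the same semantics as load_keys and load_offsets, the two lists can be combined into a key-to-byte-range mapping. The helper key_ranges below is hypothetical and assumes the load_offsets convention of exactly one more offset than there are keys.

```python
from typing import Dict, Sequence, Tuple


def key_ranges(keys: Sequence[str], offsets: Sequence[int]) -> Dict[str, Tuple[int, int]]:
    # Hypothetical helper. Assumes len(offsets) == len(keys) + 1,
    # so sample i spans the bytes offsets[i]:offsets[i + 1].
    if len(offsets) != len(keys) + 1:
        raise ValueError("expected exactly one more offset than keys")
    return {key: (offsets[i], offsets[i + 1]) for i, key in enumerate(keys)}


ranges = key_ranges(["a", "b", "c"], [0, 10, 25, 40])
print(ranges["b"])  # (10, 25)
```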
- datadings.index.load_filter(path: pathlib.Path) → simplebloom.bloom.BloomFilter
Load a Bloom filter from file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or filter file
- Returns
the Bloom filter
- datadings.index.load_key_hashes(path: pathlib.Path) → Tuple[bytes, Sequence[int]]
Load key hashes from file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or key hashes file
- Returns
hash salt and list of key hashes
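One plausible use of the salt and key hashes is finding a sample index by key without keeping the key strings in memory. The sketch assumes the same hypothetical salted BLAKE2b stand-in for hash_string(); build_lookup and find_index are illustrative names, not datadings functions.

```python
import hashlib
from typing import Dict, Optional, Sequence


def hash_string_sketch(text: str, salt: bytes) -> int:
    # Hypothetical stand-in for hash_string(): salted 64-bit BLAKE2b.
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def build_lookup(hashes: Sequence[int]) -> Dict[int, int]:
    # Map each hash to its sample index; safe because the hashes are collision-free.
    return {h: i for i, h in enumerate(hashes)}


def find_index(key: str, salt: bytes, lookup: Dict[int, int]) -> Optional[int]:
    # Hash the query key with the stored salt, then look the hash up.
    return lookup.get(hash_string_sketch(key, salt))


salt = b"\x01" * 8
keys = ["cat/001", "cat/002", "dog/001"]
lookup = build_lookup([hash_string_sketch(k, salt) for k in keys])
print(find_index("cat/002", salt, lookup))  # 1
```

The collision-free guarantee covers only the dataset's own keys, so a key outside the dataset can still match a stored hash by chance; a caller that needs certainty should confirm the key, for example against the list returned by load_keys.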
- datadings.index.load_keys(path: pathlib.Path) → Sequence[str]
Load keys from file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or keys file
- Returns
list of keys
- datadings.index.load_offsets(path: pathlib.Path) → Sequence[int]
Load sample offsets from file. The first value is always 0 and the last is the size of the data file in bytes, so len(offsets) = len(dataset) + 1.
Correct suffix is appended if path ends with a different suffix.
- Parameters
path – path to data or offsets file
- Returns
sample offsets in data file
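Given this offsets convention, sample sizes are consecutive differences, and sample i is the byte range offsets[i] to offsets[i + 1]. The sketch below uses an in-memory buffer in place of a data file and returns raw bytes only; decoding a sample is not shown, and sample_sizes and read_sample_bytes are hypothetical helpers.

```python
import io
from typing import BinaryIO, List, Sequence


def sample_sizes(offsets: Sequence[int]) -> List[int]:
    # Size of each sample in bytes: difference of consecutive offsets.
    return [end - start for start, end in zip(offsets, offsets[1:])]


def read_sample_bytes(f: BinaryIO, offsets: Sequence[int], i: int) -> bytes:
    # Sample i occupies bytes offsets[i] up to (excluding) offsets[i + 1].
    f.seek(offsets[i])
    return f.read(offsets[i + 1] - offsets[i])


data = io.BytesIO(b"aaabbbbbcc")
offsets = [0, 3, 8, 10]  # 3 samples -> 4 offsets; the last equals the data size
print(sample_sizes(offsets))                # [3, 5, 2]
print(read_sample_bytes(data, offsets, 1))  # b'bbbbb'
```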
- datadings.index.write_filter(keys: Sequence[str], path: pathlib.Path) → pathlib.Path
Create a Bloom filter for the given keys and write result to file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
keys – list of keys
path – path to data or filter file
- Returns
Path that was written to
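write_filter relies on simplebloom, whose API is not reproduced here. To show what such a filter does with the keys, here is a self-contained Bloom filter sketch; BloomSketch, build_filter, the 1% target false-positive rate, and the BLAKE2b double-hashing scheme are all assumptions, not simplebloom internals.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomSketch:
    """Minimal Bloom filter for illustration; NOT simplebloom.BloomFilter."""

    def __init__(self, n: int, fp_rate: float = 0.01) -> None:
        n = max(n, 1)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.
        self.m = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def build_filter(keys: Iterable[str], n: int) -> BloomSketch:
    bloom = BloomSketch(n)
    for key in keys:
        bloom.add(key)
    return bloom


keys = ["cat/001", "cat/002", "dog/001"]
bloom = build_filter(keys, len(keys))
print(all(key in bloom for key in keys))  # True: a Bloom filter has no false negatives
```

Because a negative answer is exact, a filter like this is a cheap pre-check before consulting the full key list.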
- datadings.index.write_key_hashes(keys: Sequence[str], path: pathlib.Path) → pathlib.Path
Hash list of keys and write result to file.
See hash_keys for details on hash method.
Correct suffix is appended if path ends with a different suffix.
- Parameters
keys – list of keys
path – path to data or key hashes file
- Returns
Path that was written to
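The on-disk layout of the key hashes file is not described here. Purely to illustrate a salt-plus-hashes round trip, the sketch below assumes a hypothetical layout of an 8-byte salt followed by little-endian unsigned 64-bit hashes; the functions and the hashes.bin file name are stand-ins, not datadings' write_key_hashes or load_key_hashes.

```python
import struct
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple


def write_key_hashes_sketch(salt: bytes, hashes: Sequence[int], path: Path) -> Path:
    # Hypothetical layout: 8-byte salt, then each hash as a little-endian uint64.
    if len(salt) != 8:
        raise ValueError("this sketch expects an 8-byte salt")
    path.write_bytes(salt + struct.pack(f"<{len(hashes)}Q", *hashes))
    return path


def load_key_hashes_sketch(path: Path) -> Tuple[bytes, List[int]]:
    # Inverse of the hypothetical layout above.
    raw = path.read_bytes()
    salt, body = raw[:8], raw[8:]
    return salt, list(struct.unpack(f"<{len(body) // 8}Q", body))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hashes.bin"
    write_key_hashes_sketch(b"saltsalt", [1, 2**64 - 1], target)
    salt_out, hashes_out = load_key_hashes_sketch(target)

print(salt_out, hashes_out)  # b'saltsalt' [1, 18446744073709551615]
```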
- datadings.index.write_keys(keys: Sequence[str], path: pathlib.Path) → pathlib.Path
Write list of keys to file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
keys – list of keys
path – path to data or keys file
- Returns
Path that was written to
- datadings.index.write_offsets(offsets: Sequence[int], path: pathlib.Path) → pathlib.Path
Write list of offsets to file.
Correct suffix is appended if path ends with a different suffix.
- Parameters
offsets – list of offsets
path – path to data or offsets file
- Returns
Path that was written to
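Offsets that follow the load_offsets convention (first value 0, last value equal to the data file size) can be derived from the sizes of the serialized samples with a running sum. A sketch, using the hypothetical helper offsets_from_sizes; actually writing the result to disk is what write_offsets is for and is not reimplemented here.

```python
from itertools import accumulate
from typing import List, Sequence


def offsets_from_sizes(sizes: Sequence[int]) -> List[int]:
    # Hypothetical helper: prepend 0 and accumulate sample sizes, so the last
    # offset equals the total data size and len(offsets) == len(sizes) + 1.
    return [0, *accumulate(sizes)]


samples = [b"aaa", b"bbbbb", b"cc"]  # already-serialized samples
offsets = offsets_from_sizes([len(s) for s in samples])
print(offsets)  # [0, 3, 8, 10]
```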