datadings.index package

datadings.index.hash_keys(keys: Sequence[str], max_tries: int = 1000) → Tuple[bytes, Sequence[int]]

Apply the hash_string() function to each key in the given list; the returned hashes are 64-bit integers. All hashes are salted and guaranteed to be collision-free within the list. If necessary, this function tries up to max_tries different salt values to find one that produces no collisions.

Parameters:
  • keys – list of keys

  • max_tries – how many different salt values to try to find collision-free hashes

Returns:

the salt that was used and the list of key hashes
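The salt-retry loop described above can be sketched as follows. This is a minimal sketch, not the datadings implementation: salted_hash64 is a hypothetical stand-in for hash_string() built on hashlib.blake2b (whose salt parameter accepts up to 16 bytes), and the deterministic salt schedule and the ValueError raised on failure are assumptions. Note that duplicate keys always hash identically, so no salt can separate them.

```python
import hashlib
from typing import Sequence, Tuple


def salted_hash64(key: str, salt: bytes) -> int:
    # Hypothetical stand-in for hash_string(): an 8-byte BLAKE2b digest
    # salted with `salt`, read as an unsigned 64-bit integer.
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def hash_keys_sketch(keys: Sequence[str], max_tries: int = 1000) -> Tuple[bytes, Sequence[int]]:
    for attempt in range(max_tries):
        # Assumed salt schedule: the attempt counter as 16 little-endian bytes.
        salt = attempt.to_bytes(16, "little")
        hashes = [salted_hash64(key, salt) for key in keys]
        # Accept the first salt under which every key hashes to a distinct value.
        if len(set(hashes)) == len(hashes):
            return salt, hashes
    raise ValueError(f"no collision-free salt found in {max_tries} tries")


salt, hashes = hash_keys_sketch(["cat", "dog", "emu"])
```

Returning the salt alongside the hashes matters: anyone who later wants to hash a key and compare it against the list must reuse exactly that salt.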

datadings.index.keys_len(path: Path) → int

Read the dataset length from the keys file.

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or keys file

Returns:

length of dataset
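Most functions in this module share the suffix rule above. A minimal sketch, assuming "appended" means the suffix is added after the full file name (so train.msgpack becomes train.msgpack.keys); the helper name with_index_suffix and the .keys suffix are hypothetical and may not match the names datadings uses.

```python
from pathlib import Path
from typing import Union


def with_index_suffix(path: Union[str, Path], suffix: str) -> Path:
    # Hypothetical helper: keep the path if it already ends with `suffix`,
    # otherwise append `suffix` to the complete file name.
    path = Path(path)
    if path.suffix != suffix:
        path = path.with_name(path.name + suffix)
    return path


keys_path = with_index_suffix("train.msgpack", ".keys")  # Path("train.msgpack.keys")
same_path = with_index_suffix("train.msgpack.keys", ".keys")  # unchanged
```

Appending rather than replacing keeps the data file name recoverable from any of its companion files.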

datadings.index.legacy_index_len(path: Path) → int

Read the dataset length from the legacy index file.

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or index file

Returns:

length of dataset

datadings.index.legacy_load_index(path: Path) → Tuple[Sequence[str], Sequence[int]]

Load the legacy index as two lists: keys and offsets. The returned lists have the same semantics as those returned by load_keys and load_offsets.

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or index file

Returns:

lists of keys and offsets
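Because the two lists line up (key i belongs to the sample that starts at offsets[i]), they can be combined into a map from key to byte range. A sketch, assuming the offsets list has one more entry than the keys list, as described for load_offsets; key_ranges is a hypothetical helper, not part of datadings.index.

```python
from typing import Dict, Sequence, Tuple


def key_ranges(keys: Sequence[str], offsets: Sequence[int]) -> Dict[str, Tuple[int, int]]:
    """Map each key to the (start, end) byte range of its sample (hypothetical helper)."""
    # Offsets carry one extra trailing entry: the size of the data file.
    if len(offsets) != len(keys) + 1:
        raise ValueError("expected len(offsets) == len(keys) + 1")
    # Sample i spans offsets[i] (inclusive) to offsets[i + 1] (exclusive).
    return {key: (offsets[i], offsets[i + 1]) for i, key in enumerate(keys)}


ranges = key_ranges(["cat", "dog"], [0, 120, 300])
# ranges == {"cat": (0, 120), "dog": (120, 300)}
```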

datadings.index.load_filter(path: Path) → BloomFilter

Load a Bloom filter from file.

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or filter file

Returns:

the Bloom filter

datadings.index.load_key_hashes(path: Path) → Tuple[bytes, Sequence[int]]

Load key hashes from file.

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or key hashes file

Returns:

hash salt and list of key hashes
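One way a salt and a collision-free hash list can be used is as a compact lookup table from key to sample index. A sketch under stated assumptions: salted_hash64 is a hypothetical stand-in for the library's hash function (built on hashlib.blake2b), and build_lookup and find_index are illustrative helpers, not part of datadings.index. Since the hashes are only guaranteed collision-free among the dataset's own keys, a key that is not in the dataset could, with very small probability, match an existing entry.

```python
import hashlib
from typing import Dict, Optional, Sequence


def salted_hash64(key: str, salt: bytes) -> int:
    # Hypothetical hash: 8-byte salted BLAKE2b digest as an unsigned 64-bit int.
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def build_lookup(hashes: Sequence[int]) -> Dict[int, int]:
    # Collision-free hashes map one-to-one onto sample indices.
    return {h: i for i, h in enumerate(hashes)}


def find_index(key: str, salt: bytes, lookup: Dict[int, int]) -> Optional[int]:
    # Hash the query with the same salt that produced the stored hashes.
    return lookup.get(salted_hash64(key, salt))


salt = bytes(16)  # any salt of up to 16 bytes works for this sketch
keys = ["cat", "dog", "emu"]
lookup = build_lookup([salted_hash64(k, salt) for k in keys])
index = find_index("dog", salt, lookup)  # position of "dog" in keys
```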

datadings.index.load_keys(path: Path) → Sequence[str]

Load keys from file.

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or keys file

Returns:

list of keys

datadings.index.load_offsets(path: Path) → Sequence[int]

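Load sample offsets from file. The first value is always 0 and the last is the size of the data file in bytes, so len(offsets) = len(dataset) + 1. Sample i therefore occupies the bytes from offsets[i] up to, but not including, offsets[i + 1].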

The correct suffix is appended if path ends with a different suffix.

Parameters:

path – path to data or offsets file

Returns:

sample offsets in data file
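Given these semantics, reading one sample amounts to a seek followed by a read, and sample sizes are the differences between consecutive offsets. A minimal sketch, assuming samples are stored contiguously in order; read_sample is a hypothetical helper, and io.BytesIO stands in for an opened data file.

```python
import io
from typing import BinaryIO, Sequence


def read_sample(f: BinaryIO, offsets: Sequence[int], index: int) -> bytes:
    # Sample `index` occupies bytes [offsets[index], offsets[index + 1]).
    start, end = offsets[index], offsets[index + 1]
    f.seek(start)
    return f.read(end - start)


data = b"aaabbbbbc"        # three samples of 3, 5 and 1 bytes
offsets = [0, 3, 8, 9]     # len(offsets) == number of samples + 1
sample = read_sample(io.BytesIO(data), offsets, 1)  # b"bbbbb"
sizes = [end - start for start, end in zip(offsets, offsets[1:])]  # [3, 5, 1]
```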

datadings.index.write_filter(keys: Sequence[str], path: Path) → Path

Create a Bloom filter for the given keys and write the result to file.

The correct suffix is appended if path ends with a different suffix.

Parameters:
  • keys – list of keys

  • path – path to data or filter file

Returns:

Path that was written to
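A Bloom filter answers membership queries with no false negatives but occasional false positives, so it can act as a cheap pre-check before a more expensive key lookup. The class below is a hypothetical, minimal illustration of the data structure; it is not the BloomFilter type returned by load_filter, and its bit count, number of hash functions and BLAKE2b-based hashing are arbitrary choices.

```python
import hashlib
from typing import Iterable, Iterator


class TinyBloomFilter:
    """Minimal Bloom filter sketch; NOT the BloomFilter class used by datadings."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions from differently salted BLAKE2b digests.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=i.to_bytes(16, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

    @classmethod
    def from_keys(cls, keys: Iterable[str], **kwargs) -> "TinyBloomFilter":
        bf = cls(**kwargs)
        for key in keys:
            bf.add(key)
        return bf


bf = TinyBloomFilter.from_keys(["cat", "dog", "emu"])
print("dog" in bf)  # True: added keys are always reported present
```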

datadings.index.write_key_hashes(keys: Sequence[str], path: Path) → Path

Hash the list of keys and write the result to file.

See hash_keys for details on the hashing method.

The correct suffix is appended if path ends with a different suffix.

Parameters:
  • keys – list of keys

  • path – path to data or key hashes file

Returns:

Path that was written to

datadings.index.write_keys(keys: Sequence[str], path: Path) → Path

Write list of keys to file.

The correct suffix is appended if path ends with a different suffix.

Parameters:
  • keys – list of keys

  • path – path to data or keys file

Returns:

Path that was written to

datadings.index.write_offsets(offsets: Sequence[int], path: Path) → Path

Write list of offsets to file.

The correct suffix is appended if path ends with a different suffix.

Parameters:
  • offsets – list of offsets

  • path – path to data or offsets file

Returns:

Path that was written to
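An offsets list with the semantics described for load_offsets can be derived from the serialized samples with a running sum of their sizes that starts at 0. A sketch, assuming samples are written back to back with no header or padding; offsets_from_samples is a hypothetical helper, and its result is the kind of list one would pass to write_offsets. It relies on the initial argument of itertools.accumulate (Python 3.8+).

```python
from itertools import accumulate
from typing import List, Sequence


def offsets_from_samples(samples: Sequence[bytes]) -> List[int]:
    """Build an offsets list from serialized samples (hypothetical helper)."""
    # Running sum of sample sizes, seeded with 0 for the first sample's start;
    # the last value equals the total number of bytes, i.e. the data file size.
    return list(accumulate((len(s) for s in samples), initial=0))


offsets = offsets_from_samples([b"aaa", b"bbbbb", b"c"])  # [0, 3, 8, 9]
```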