datadings.index package

datadings.index.hash_keys(keys: Sequence[str], max_tries: int = 1000) → Tuple[bytes, Sequence[int]]

Apply the hash_string() function to each of the given keys; the returned hashes are 64-bit integers. All hashes are salted and guaranteed to be collision-free. If necessary, this function tries different salt values, up to max_tries of them, until no two keys share a hash.

Parameters
  • keys – list of keys

  • max_tries – how many different salt values to try to find collision-free hashes

Returns

used salt and list of key hashes
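The retry-on-collision idea can be sketched with the standard library alone. This is a minimal illustration, not the datadings implementation: hashlib.blake2b stands in for hash_string(), and the function name hash_keys_sketch, the salt layout (the attempt counter as 8 bytes), and the RuntimeError on failure are all assumptions made for this example.

```python
import hashlib
from typing import List, Sequence, Tuple


def hash_keys_sketch(
    keys: Sequence[str], max_tries: int = 1000
) -> Tuple[bytes, List[int]]:
    """Hash keys to 64-bit integers, changing the salt until no two collide."""
    for attempt in range(max_tries):
        # Assumption: the salt is simply the attempt counter as 8 bytes.
        salt = attempt.to_bytes(8, "little")
        hashes = [
            int.from_bytes(
                # digest_size=8 yields 8 bytes, i.e. a 64-bit integer.
                hashlib.blake2b(
                    key.encode("utf-8"), digest_size=8, salt=salt
                ).digest(),
                "little",
            )
            for key in keys
        ]
        # Collision-free means every hash value is unique.
        if len(set(hashes)) == len(hashes):
            return salt, hashes
    raise RuntimeError(f"no collision-free salt found in {max_tries} tries")


salt, hashes = hash_keys_sketch(["cat", "dog", "owl"])
```

Note that duplicate keys can never be made collision-free, so in this sketch they exhaust max_tries and raise.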

datadings.index.keys_len(path: pathlib.Path) → int

Read the dataset length from the keys file.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to data or keys file

Returns

length of dataset
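The suffix handling described above, which most functions in this module share, might look roughly like the following. This is a sketch under assumptions: the helper name append_suffix, the suffix ".keys", and the file name train.msgpack are hypothetical and chosen only for illustration; the suffixes datadings actually uses are not stated here.

```python
from pathlib import Path

# Hypothetical suffix, used only for this illustration.
KEYS_SUFFIX = ".keys"


def append_suffix(path: Path, suffix: str) -> Path:
    """Return path unchanged if it already ends with suffix, else append it."""
    if path.name.endswith(suffix):
        return path
    # Append rather than replace, so the data file name stays recognizable.
    return path.with_name(path.name + suffix)


data_path = Path("data") / "train.msgpack"
keys_path = append_suffix(data_path, KEYS_SUFFIX)
```

With this behavior, a caller can pass either the data file or the derived file and reach the same file on disk.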

datadings.index.legacy_index_len(path: pathlib.Path) → int

Read the dataset length from the legacy index file.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to data or index file

Returns

length of dataset

datadings.index.legacy_load_index(path: pathlib.Path) → Tuple[Sequence[str], Sequence[int]]

Load legacy index as two lists of keys and offsets. Semantics of the returned lists are the same as for load_keys and load_offsets.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to dataset or index file

Returns

keys and offsets list
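As a rough illustration of how the two returned lists relate, the sketch below maps each key to the byte range of its sample. The helper name byte_ranges and the example keys and offsets are made up for this example; it assumes only the semantics stated for load_offsets, namely that offsets has one more entry than keys.

```python
from typing import Dict, Sequence, Tuple


def byte_ranges(
    keys: Sequence[str], offsets: Sequence[int]
) -> Dict[str, Tuple[int, int]]:
    """Map each key to the (start, end) byte range of its sample."""
    if len(offsets) != len(keys) + 1:
        raise ValueError("expected len(offsets) == len(keys) + 1")
    # Sample i spans offsets[i] (inclusive) to offsets[i + 1] (exclusive).
    return {key: (offsets[i], offsets[i + 1]) for i, key in enumerate(keys)}


# Made-up values standing in for a loaded index.
keys = ["cat", "dog", "owl"]
offsets = [0, 120, 300, 345]
ranges = byte_ranges(keys, offsets)
```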

datadings.index.load_filter(path: pathlib.Path) → simplebloom.bloom.BloomFilter

Load a Bloom filter from file.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to data or filter file

Returns

the Bloom filter

datadings.index.load_key_hashes(path: pathlib.Path) → Tuple[bytes, Sequence[int]]

Load key hashes from file.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to data or key hashes file

Returns

hash salt and list of key hashes

datadings.index.load_keys(path: pathlib.Path) → Sequence[str]

Load keys from file.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to data or keys file

Returns

list of keys

datadings.index.load_offsets(path: pathlib.Path) → Sequence[int]

Load sample offsets from file. First value is always 0 and last is size of data file in bytes, so len(offsets) = len(dataset) + 1.

Correct suffix is appended if path ends with a different suffix.

Parameters

path – path to data or offsets file

Returns

sample offsets in data file
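The offsets invariant can be checked on a toy example. The sketch below builds offsets from in-memory byte strings instead of reading a real data file; the sample contents and the helper read_sample are invented for illustration.

```python
from itertools import accumulate
from typing import Sequence

# Invented samples standing in for the serialized records of a data file.
samples = [b"alpha", b"be", b"gamma!"]
data = b"".join(samples)

# Running total of sample sizes, with a leading 0 for the first sample.
offsets = [0, *accumulate(len(s) for s in samples)]


def read_sample(data: bytes, offsets: Sequence[int], index: int) -> bytes:
    """Slice sample `index` out of the concatenated data."""
    return data[offsets[index]:offsets[index + 1]]


second = read_sample(data, offsets, 1)
```

Because the last offset equals the size of the data, the final sample needs no special case when slicing.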

datadings.index.write_filter(keys: Sequence[str], path: pathlib.Path) → pathlib.Path

Create a Bloom filter for the given keys and write result to file.

Correct suffix is appended if path ends with a different suffix.

Parameters
  • keys – list of keys

  • path – path to data or filter file

Returns

Path that was written to
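For readers unfamiliar with Bloom filters, here is a minimal standard-library sketch of the data structure. It does not use or mirror the simplebloom API: the class TinyBloom, its parameters, and the choice of blake2b for hashing are assumptions made for this example.

```python
import hashlib
from typing import Iterator


class TinyBloom:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # A key is possibly present only if all of its bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


bloom = TinyBloom()
for key in ["cat", "dog", "owl"]:
    bloom.add(key)
```

A membership test such as "cat" in bloom is cheap and needs only the bit array, which is why a filter is useful for checking whether a key can be in a dataset before loading the full key list.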

datadings.index.write_key_hashes(keys: Sequence[str], path: pathlib.Path) → pathlib.Path

Hash list of keys and write result to file.

See hash_keys for details on hash method.

Correct suffix is appended if path ends with a different suffix.

Parameters
  • keys – list of keys

  • path – path to data or key hashes file

Returns

Path that was written to

datadings.index.write_keys(keys: Sequence[str], path: pathlib.Path) → pathlib.Path

Write list of keys to file.

Correct suffix is appended if path ends with a different suffix.

Parameters
  • keys – list of keys

  • path – path to data or keys file

Returns

Path that was written to

datadings.index.write_offsets(offsets: Sequence[int], path: pathlib.Path) → pathlib.Path

Write list of offsets to file.

Correct suffix is appended if path ends with a different suffix.

Parameters
  • offsets – list of offsets

  • path – path to data or offsets file

Returns

Path that was written to