python_hll.hllutil module

class python_hll.hllutil.HLLUtil[source]

Bases: object

Static functions for computing constants and parameters used in the HLL algorithm.

LONG_BIT_LENGTH = 64
PW_MASK = [-9223372036854775808, -1, -4, -64, -16384, -1073741824, -4611686018427387904, -4611686018427387904, -4611686018427387904]
REG_WIDTH_INDEX_MULTIPLIER = 31
classmethod alpha_m_squared(m)[source]

Computes the ‘alpha-m-squared’ constant used by the HyperLogLog algorithm.

Parameters:m (int) – this must be a power of two, cannot be less than 16 (2:sup:4), and cannot be greater than 65536 (2:sup:16).
Returns:gamma times registerCount squared where gamma is based on the value of registerCount.
Return type:float
classmethod large_estimator(log2m, register_size_in_bits, estimator)[source]

The “large range correction” formula from the HyperLogLog algorithm, adapted for 64 bit hashes. Only appropriate for estimators whose value exceeds the return of largeEstimatorCutoff().

See Blog post with section on 64 bit hashes and “large range correction” cutoff.

Parameters:
  • log2m (int) – log-base-2 of the number of registers in the HLL. <em>b<em> in the paper.
  • register_size_in_bits (int) – the size of the HLL registers, in bits.
  • estimator (float) – the original estimator (“E” in the paper).
Returns:

a corrected cardinality estimate.

Return type:

float

classmethod large_estimator_cutoff(log2m, register_size_in_bits)[source]

The cutoff for using the “large range correction” formula, from the HyperLogLog algorithm, adapted for 64 bit hashes.

See Blog post with section on 64 bit hashes and “large range correction” cutoff.

Parameters:
  • log2m (int) – log-base-2 of the number of registers in the HLL. <em>b<em> in the paper.
  • register_size_in_bits (int) – the size of the HLL registers, in bits.
Returns:

the cutoff for the large range correction.

Return type:

float

classmethod pw_max_mask(register_size_in_bits)[source]

Computes a mask that prevents overflow of HyperLogLog registers.

Parameters:register_size_in_bits (int) – the size of the HLL registers, in bits.
Returns:mask a long mask to prevent overflow of the registers
Return type:long
classmethod register_bit_size(expected_unique_elements)[source]

Computes the bit-width of HLL registers necessary to estimate a set of the specified cardinality.

Parameters:expected_unique_elements (long) – an upper bound on the number of unique elements that are expected. This must be greater than zero.
Returns:a register size in bits (i.e. log2(log2(n)))
Return type:int
classmethod small_estimator(m, number_of_zeroes)[source]

The “small range correction” formula from the HyperLogLog algorithm. Only appropriate if both the estimator is smaller than (5/2) * m and there are still registers that have the zero value.

Parameters:
  • m (int) – the number of registers in the HLL. <em>m<em> in the paper.
  • number_of_zeroes (int) – the number of registers with value zero. V in the paper.
Returns:

a corrected cardinality estimate.

Return type:

float

classmethod small_estimator_cutoff(m)[source]

The cutoff for using the “small range correction” formula, in the HyperLogLog algorithm.

Parameters:m (int) – the number of registers in the HLL. <em>m<em> in the paper.
Returns:the cutoff for the small range correction.
Return type:float