python_hll.hllutil module¶
-
class
python_hll.hllutil.
HLLUtil
[source]¶ Bases:
object
Static functions for computing constants and parameters used in the HLL algorithm.
-
LONG_BIT_LENGTH
= 64¶
-
PW_MASK
= [-9223372036854775808, -1, -4, -64, -16384, -1073741824, -4611686018427387904, -4611686018427387904, -4611686018427387904]¶
-
REG_WIDTH_INDEX_MULTIPLIER
= 31¶
-
classmethod
alpha_m_squared
(m)[source]¶ Computes the ‘alpha-m-squared’ constant used by the HyperLogLog algorithm.
Parameters: m (int) – this must be a power of two, cannot be less than 16 (2:sup:4), and cannot be greater than 65536 (2:sup:16). Returns: gamma times registerCount
squared where gamma is based on the value ofregisterCount
.Return type: float
-
classmethod
large_estimator
(log2m, register_size_in_bits, estimator)[source]¶ The “large range correction” formula from the HyperLogLog algorithm, adapted for 64 bit hashes. Only appropriate for estimators whose value exceeds the return of
largeEstimatorCutoff()
.See Blog post with section on 64 bit hashes and “large range correction” cutoff.
Parameters: - log2m (int) – log-base-2 of the number of registers in the HLL. <em>b<em> in the paper.
- register_size_in_bits (int) – the size of the HLL registers, in bits.
- estimator (float) – the original estimator (“E” in the paper).
Returns: a corrected cardinality estimate.
Return type: float
-
classmethod
large_estimator_cutoff
(log2m, register_size_in_bits)[source]¶ The cutoff for using the “large range correction” formula, from the HyperLogLog algorithm, adapted for 64 bit hashes.
See Blog post with section on 64 bit hashes and “large range correction” cutoff.
Parameters: - log2m (int) – log-base-2 of the number of registers in the HLL. <em>b<em> in the paper.
- register_size_in_bits (int) – the size of the HLL registers, in bits.
Returns: the cutoff for the large range correction.
Return type: float
-
classmethod
pw_max_mask
(register_size_in_bits)[source]¶ Computes a mask that prevents overflow of HyperLogLog registers.
Parameters: register_size_in_bits (int) – the size of the HLL registers, in bits. Returns: mask a long
mask to prevent overflow of the registersReturn type: long
-
classmethod
register_bit_size
(expected_unique_elements)[source]¶ Computes the bit-width of HLL registers necessary to estimate a set of the specified cardinality.
Parameters: expected_unique_elements (long) – an upper bound on the number of unique elements that are expected. This must be greater than zero. Returns: a register size in bits (i.e. log2(log2(n))
)Return type: int
-
classmethod
small_estimator
(m, number_of_zeroes)[source]¶ The “small range correction” formula from the HyperLogLog algorithm. Only appropriate if both the estimator is smaller than
(5/2) * m
and there are still registers that have the zero value.Parameters: - m (int) – the number of registers in the HLL. <em>m<em> in the paper.
- number_of_zeroes (int) – the number of registers with value zero.
V
in the paper.
Returns: a corrected cardinality estimate.
Return type: float
-