python_hll.serialization module
class python_hll.serialization.BigEndianAscendingWordDeserializer(word_length, byte_padding, bytes)
Bases: object
A corresponding deserializer for BigEndianAscendingWordSerializer.
BITS_PER_BYTE = 8
BYTE_MASK = 255
read_word()
Return the next word in the sequence. Should not be called more than total_word_count() times.
Return type: long
total_word_count()
Returns the number of words that could be encoded in the sequence.
NOTE: the sequence that was encoded may be shorter than the value this method returns, because the last byte may contain padding bits. This guarantees only an upper bound on the number of times read_word() can be called.
Returns: the maximum number of words that could be read from the sequence.
Return type: int
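To make the reading side concrete, here is a minimal sketch of a big-endian, ascending word reader over a list of byte values (ints 0-255). The class name `WordReader` is hypothetical and is not the library's API; the sketch assumes the first `byte_padding` bytes are skipped and that bits are consumed from the high bit to the low bit of each byte, matching the encoding documented for BigEndianAscendingWordSerializer.

```python
class WordReader:
    """Hypothetical stand-in for BigEndianAscendingWordDeserializer (not the library class)."""

    def __init__(self, word_length, byte_padding, data):
        if word_length < 1:
            raise ValueError("word_length must be positive")
        self.word_length = word_length
        self.byte_padding = byte_padding  # leading metadata bytes to skip
        self.data = data                  # list of ints in 0..255
        self.bit_index = 0                # next bit to read, from the high bit of the first data byte

    def total_word_count(self):
        # Upper bound only: pad bits at the end of the last byte may be counted as extra words.
        return ((len(self.data) - self.byte_padding) * 8) // self.word_length

    def read_word(self):
        word = 0
        for _ in range(self.word_length):
            byte = self.data[self.byte_padding + self.bit_index // 8]
            bit = (byte >> (7 - self.bit_index % 8)) & 1  # high bit of each byte comes first
            word = (word << 1) | bit
            self.bit_index += 1
        return word
```

Reading `[0xF8, 0x4A]` with 5-bit words yields `[31, 1, 5]`. The upper-bound caveat shows up with short words: three 2-bit words occupy 6 bits of a single byte, yet `total_word_count()` reports 4.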
class python_hll.serialization.BigEndianAscendingWordSerializer(word_length, word_count, byte_padding)
Bases: object
A serializer that writes a sequence of fixed bit-width ‘words’ to a byte array. Bitwise OR is used to write words into bytes, so a low bit in a word is also a low bit in a byte. However, a high byte in a word is written at a lower index in the array than a low byte in a word. The first word is written at the lowest array index. Each serializer is single-use and returns its backing byte array.
This encoding was chosen so that when reading bytes as octets in the typical first-octet-is-the-high-nibble fashion, an octet-to-binary conversion would yield a high-to-low, left-to-right view of the “short words”.
Example:
Say short words are 5 bits wide. Our word sequence is the values [31, 1, 5]. In big-endian binary format, the values are [0b11111, 0b00001, 0b00101]. We use 15 of 16 bits in two bytes and pad the last (lowest) bit of the last byte with a zero: [0b11111000, 0b01001010] = [0xF8, 0x4A].
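The example above can be reproduced with a short sketch. The function `serialize_words` is hypothetical, not the library's API; it assumes bytes are plain ints (0-255), reserves `byte_padding` leading bytes for metadata, and ORs each word's bits into the array from most significant to least significant, first word at the lowest index.

```python
def serialize_words(words, word_length, byte_padding=0):
    """Hypothetical sketch of big-endian, ascending word packing (not the library class)."""
    total_bits = word_length * len(words)
    out = [0] * (byte_padding + (total_bits + 7) // 8)  # unused trailing bits stay zero
    bit_index = 0  # next bit to write, from the high bit of the first data byte
    for word in words:
        if not 0 <= word < (1 << word_length):
            raise ValueError("word does not fit in word_length bits")
        # Emit the word's bits from the most significant to the least significant.
        for shift in range(word_length - 1, -1, -1):
            bit = (word >> shift) & 1
            out[byte_padding + bit_index // 8] |= bit << (7 - bit_index % 8)
            bit_index += 1
    return out
```

Calling `serialize_words([31, 1, 5], 5)` returns `[0xF8, 0x4A]`, and with `byte_padding=3` the same two data bytes follow three zeroed metadata bytes.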
BITS_PER_BYTE = 8
class python_hll.serialization.HLLMetadata(schema_version, type, register_count_log2, register_width, log2_explicit_cutoff, explicit_off, explicit_auto, sparse_enabled)
Bases: object
The metadata and parameters associated with an HLL.
explicit_auto()
Returns: True if the HLLType.EXPLICIT representation cutoff cardinality is set to be automatically chosen, False otherwise.
Return type: boolean
explicit_off()
Returns: True if the HLLType.EXPLICIT representation has been disabled, False otherwise.
Return type: boolean
log2_explicit_cutoff()
Returns: the log-base-2 of the explicit cutoff cardinality. This will always be greater than or equal to zero and less than 31, per the specification.
Return type: int
register_count_log2()
Returns: the log-base-2 of the register count parameter of the HLL. This will always be greater than or equal to 4 and less than or equal to 31.
Return type: int
register_width()
Returns: the register width parameter of the HLL. This will always be greater than or equal to 1 and less than or equal to 8.
Return type: int
class python_hll.serialization.SchemaVersionOne
Bases: object
A serialization schema for HLLs. Reads and writes HLL metadata to and from byte representations.
EXPLICIT_AUTO = 63
EXPLICIT_OFF = 0
HEADER_BYTE_COUNT = 3
SCHEMA_VERSION = 1
TYPE_ORDINALS = [5, 1, 2, 3, 4]
get_deserializer(type, word_length, bytes)
Builds an HLL deserializer that matches this schema version.
Parameters:
- type (HLLType) – the HLL type that will be deserialized. This cannot be None.
- word_length (int) – the length of the ‘words’ that comprise the data of the serialized HLL. Words must be at least 5 bits and at most 64 bits long.
- bytes (list) – the serialized HLL to deserialize. This cannot be None.
Returns: a byte array deserializer used to deserialize an HLL serialized according to this schema version’s specification.
get_serializer(type, word_length, word_count)
Builds an HLL serializer that matches this schema version.
Parameters:
- type (HLLType) – the HLL type that will be serialized. This cannot be None.
- word_length (int) – the length of the ‘words’ that comprise the data of the HLL. Words must be at least 5 bits and at most 64 bits long.
- word_count (int) – the number of ‘words’ in the HLL’s data.
Returns: a byte array serializer used to serialize an HLL according to this schema version’s specification.
padding_bytes(type)
The number of metadata bytes required for a serialized HLL of the specified type.
Parameters: type (HLLType) – the type of the serialized HLL
Returns: the number of padding bytes needed in order to fully accommodate the needed metadata.
Return type: int
read_metadata(bytes)
Reads the metadata bytes of the serialized HLL.
Parameters: bytes (list) – the serialized HLL
Returns: the HLL metadata
Return type: HLLMetadata
write_metadata(bytes, metadata)
Writes metadata bytes to a serialized HLL.
Parameters:
- bytes (list) – the padded data bytes of the HLL
- metadata (HLLMetadata) – the metadata to write to the padding bytes
Return type: None
class python_hll.serialization.SerializationUtil
Bases: object
A collection of constants and utilities for serializing and deserializing HLLs.
DEFAULT_SCHEMA_VERSION = <python_hll.serialization.SchemaVersionOne object>
EXPLICIT_CUTOFF_BITS = 6
EXPLICIT_CUTOFF_MASK = 63
LOG2_REGISTER_COUNT_BITS = 5
LOG2_REGISTER_COUNT_MASK = 31
NIBBLE_BITS = 4
NIBBLE_MASK = 15
REGISTERED_SCHEMA_VERSIONS = [None, <python_hll.serialization.SchemaVersionOne object>]
REGISTER_WIDTH_BITS = 3
REGISTER_WIDTH_MASK = 7
VERSION_ONE = <python_hll.serialization.SchemaVersionOne object>
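Each `*_MASK` constant above is consistent with its `*_BITS` width: the mask is the integer whose lowest `bits` bits are set. The sketch below only checks that arithmetic; the dictionary and helper names are illustrative, not part of the module.

```python
# Field widths in bits, copied from the SerializationUtil constants listed above.
FIELD_BITS = {
    "EXPLICIT_CUTOFF": 6,
    "LOG2_REGISTER_COUNT": 5,
    "NIBBLE": 4,
    "REGISTER_WIDTH": 3,
}


def low_bit_mask(bits):
    """Return an int whose lowest `bits` bits are all set, i.e. 2**bits - 1."""
    return (1 << bits) - 1


# Hypothetical table of derived masks, for comparison with the listed *_MASK values.
MASKS = {name: low_bit_mask(bits) for name, bits in FIELD_BITS.items()}
```

The widths also fill each header byte exactly: 3 + 5 bits for the parameters byte, two 4-bit nibbles for the version byte, and 1 padding bit + 1 sparse bit + 6 cutoff bits for the cutoff byte.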
classmethod explicit_cutoff(cutoff_byte)
Extracts the explicit cutoff value from the cutoff byte of a serialized HLL.
Parameters: cutoff_byte (byte) – the cutoff byte of the serialized HLL
Returns: the explicit cutoff value
Return type: int
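Given the cutoff-byte layout documented for pack_cutoff_byte (padding bit, sparse bit, six cutoff bits), extraction is a mask of the low six bits. A minimal sketch follows; the module-level functions are illustrative rather than the classmethods themselves, and `sparse_enabled` is a hypothetical helper derived from the same layout.

```python
EXPLICIT_CUTOFF_BITS = 6
EXPLICIT_CUTOFF_MASK = (1 << EXPLICIT_CUTOFF_BITS) - 1  # 63, as listed above


def explicit_cutoff(cutoff_byte):
    """Keep only the low six bits, which hold the encoded explicit cutoff."""
    return cutoff_byte & EXPLICIT_CUTOFF_MASK


def sparse_enabled(cutoff_byte):
    """Hypothetical helper: read the 'sparse-enabled' flag from the second-highest bit."""
    return ((cutoff_byte >> EXPLICIT_CUTOFF_BITS) & 1) == 1
```

For a cutoff byte of `0b01001101`, the sparse flag is set and the encoded cutoff is 13.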
classmethod get_schema_version(bytes)
Get the appropriate SchemaVersion for the specified serialized HLL.
Parameters: bytes (list) – the serialized HLL whose schema version is desired.
Returns: the schema version for the specified HLL. This will never be None.
Return type: SchemaVersion
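A sketch of how such a lookup could work, assuming (as in the HLL storage specification) that the version byte is the first byte of the serialized HLL and that its top nibble indexes a registry shaped like REGISTERED_SCHEMA_VERSIONS. The string entry is a placeholder for the real SchemaVersionOne instance, and raising ValueError for an unknown version is an assumption; the library's actual error behavior is not documented here.

```python
NIBBLE_BITS = 4
NIBBLE_MASK = (1 << NIBBLE_BITS) - 1  # 15

# Stand-in registry: index 0 is unused (None); the string is a placeholder schema object.
REGISTERED_SCHEMA_VERSIONS = [None, "SchemaVersionOne"]


def get_schema_version(serialized):
    """Look up the schema for a serialized HLL from the top nibble of its first byte."""
    number = (serialized[0] >> NIBBLE_BITS) & NIBBLE_MASK
    if number >= len(REGISTERED_SCHEMA_VERSIONS) or REGISTERED_SCHEMA_VERSIONS[number] is None:
        raise ValueError(f"unknown schema version {number}")  # assumed error behavior
    return REGISTERED_SCHEMA_VERSIONS[number]
```

Because unregistered numbers raise instead of returning the `None` placeholder, the lookup never hands back `None`, in line with the documented guarantee.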
classmethod get_schema_version_from_number(schema_version_number)
Parameters: schema_version_number (int) – the version number of the SchemaVersion desired. This must be a registered schema version number.
Returns: the SchemaVersion for the given number. This will never be None.
Return type: SchemaVersion
classmethod pack_cutoff_byte(explicit_cutoff, sparse_enabled)
Generates a byte that encodes the log-base-2 of the explicit cutoff or sentinel values for ‘explicit-disabled’ or ‘auto’, as well as the boolean indicating whether to use HLLType.SPARSE in the promotion hierarchy.
The top bit is always padding, the second-highest bit indicates the ‘sparse-enabled’ boolean, and the lowest six bits encode the explicit cutoff value.
Parameters:
- explicit_cutoff (int) – the explicit cutoff value to encode.
  * If ‘explicit-disabled’ is chosen, this value should be 0.
  * If ‘auto’ is chosen, this value should be 63.
  * If a cutoff of 2^n is desired, for 0 <= n < 31, this value should be n + 1.
- sparse_enabled (boolean) – whether HLLType.SPARSE should be used in the promotion hierarchy to improve HLL storage.
Return type: byte
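The cutoff-byte layout above can be sketched as a single OR of the sparse bit (bit 6) with the six-bit cutoff code. The functions are illustrative and assume bytes are ints 0-255; `encode_cutoff` is a hypothetical helper that applies the "2^n is stored as n + 1" rule, and the sentinels mirror SchemaVersionOne.EXPLICIT_OFF and EXPLICIT_AUTO.

```python
EXPLICIT_CUTOFF_BITS = 6
EXPLICIT_OFF = 0    # 'explicit-disabled' sentinel (SchemaVersionOne.EXPLICIT_OFF)
EXPLICIT_AUTO = 63  # 'auto' sentinel (SchemaVersionOne.EXPLICIT_AUTO)


def pack_cutoff_byte(explicit_cutoff, sparse_enabled):
    """Bit 7 is padding (0), bit 6 is the sparse flag, bits 0-5 hold the cutoff code."""
    if not 0 <= explicit_cutoff <= EXPLICIT_AUTO:
        raise ValueError("explicit_cutoff must fit in six bits")
    sparse_bit = (1 << EXPLICIT_CUTOFF_BITS) if sparse_enabled else 0
    return sparse_bit | explicit_cutoff


def encode_cutoff(log2_cutoff):
    """Hypothetical helper: a cutoff of 2**n is stored as n + 1, for 0 <= n < 31."""
    if not 0 <= log2_cutoff < 31:
        raise ValueError("log2_cutoff must be in [0, 31)")
    return log2_cutoff + 1
```

For example, a cutoff of 2^12 with sparse enabled packs to `0b01001101` (77), and 'auto' with sparse enabled packs to 127.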
classmethod pack_parameters_byte(register_width, register_count_log2)
Generates a byte that encodes the parameters of an HLLType.FULL or HLLType.SPARSE HLL.
The top 3 bits are used to encode register_width - 1 (the range of register_width is thus 1-8) and the bottom 5 bits are used to encode register_count_log2 (the range of register_count_log2 is thus 0-31).
Parameters:
- register_width (int) – the register width (must be at least 1 and at most 8)
- register_count_log2 (int) – the log-base-2 of the register count (must be at least 0 and at most 31)
Returns: the packed parameters byte
Return type: byte
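Packing the parameters byte is a shift of `register_width - 1` past the five register-count bits, ORed with `register_count_log2`. A minimal sketch, assuming bytes are ints 0-255; the range checks reflect the documented limits, and whether the library itself validates them is not stated here.

```python
LOG2_REGISTER_COUNT_BITS = 5


def pack_parameters_byte(register_width, register_count_log2):
    """Top 3 bits: register_width - 1; bottom 5 bits: register_count_log2."""
    if not 1 <= register_width <= 8:
        raise ValueError("register_width must be in [1, 8]")
    if not 0 <= register_count_log2 <= 31:
        raise ValueError("register_count_log2 must be in [0, 31]")
    return ((register_width - 1) << LOG2_REGISTER_COUNT_BITS) | register_count_log2
```

A register width of 5 with 2^11 registers packs to `0b100_01011`, i.e. `0x8B`.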
classmethod pack_version_byte(schema_version, type_ordinal)
Generates a byte that encodes the schema version and the type ordinal of the HLL.
The top nibble is the schema version and the bottom nibble is the type ordinal.
Parameters:
- schema_version (int) – the schema version to encode.
- type_ordinal (int) – the type ordinal of the HLL to encode.
Returns: the packed version byte
Return type: byte
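The version byte is two nibbles side by side. This illustrative sketch assumes bytes are ints 0-255 and rejects values that do not fit in four bits; it does not assign meanings to particular type ordinals.

```python
NIBBLE_BITS = 4
NIBBLE_MASK = 15


def pack_version_byte(schema_version, type_ordinal):
    """Schema version in the top nibble, type ordinal in the bottom nibble."""
    if not (0 <= schema_version <= NIBBLE_MASK and 0 <= type_ordinal <= NIBBLE_MASK):
        raise ValueError("both values must fit in four bits")
    return (schema_version << NIBBLE_BITS) | type_ordinal
```

Schema version 1 with type ordinal 4 packs to `0x14`.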
classmethod register_count_log2(parameters_byte)
Extracts log2(register_count) from the parameters byte of a serialized HLLType.FULL HLL.
Parameters: parameters_byte (byte) – the parameters byte of the serialized HLL
Returns: log2(register_count) of the serialized HLL
Return type: int
classmethod register_width(parameters_byte)
Extracts the register width from the parameters byte of a serialized HLLType.FULL HLL.
Parameters: parameters_byte (byte) – the parameters byte of the serialized HLL
Returns: the register width of the serialized HLL
Return type: int
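Both extractors undo the parameters-byte packing: the low five bits give log2(register_count), and the top three bits give register_width - 1. In this illustrative sketch the mask is applied after the shift, so it also returns the right answer if a byte is held as a signed value (-128..127), as Java bytes are; whether python_hll ever stores signed bytes is an assumption not covered here.

```python
LOG2_REGISTER_COUNT_BITS = 5
LOG2_REGISTER_COUNT_MASK = 31
REGISTER_WIDTH_MASK = 7


def register_count_log2(parameters_byte):
    """Bottom 5 bits hold log2(register_count)."""
    return parameters_byte & LOG2_REGISTER_COUNT_MASK


def register_width(parameters_byte):
    """Top 3 bits hold register_width - 1, so add the 1 back."""
    return ((parameters_byte >> LOG2_REGISTER_COUNT_BITS) & REGISTER_WIDTH_MASK) + 1
```

For `0x8B` (or its signed form, -117) the results are a register width of 5 and log2(register_count) of 11.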
classmethod schema_version(version_byte)
Extracts the schema version from the version byte of a serialized HLL.
Parameters: version_byte (byte) – the version byte of the serialized HLL
Returns: the schema version of the serialized HLL
Return type: int
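Extracting the schema version reverses the version-byte packing by shifting the top nibble down and masking it. The sketch is illustrative; `type_ordinal` is a hypothetical helper for the bottom nibble, which this section does not document as a separate method.

```python
NIBBLE_BITS = 4
NIBBLE_MASK = 15


def schema_version(version_byte):
    """The schema version lives in the top nibble of the version byte."""
    return (version_byte >> NIBBLE_BITS) & NIBBLE_MASK


def type_ordinal(version_byte):
    """Hypothetical helper: the type ordinal lives in the bottom nibble."""
    return version_byte & NIBBLE_MASK
```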