
hll_union function

Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above

This function uses the HyperLogLog algorithm to combine two sketches into a single sketch.

Queries can use the resulting buffers to compute approximate unique counts as long integers with the hll_sketch_estimate function.
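As a minimal sketch of that workflow, the query below builds one sketch per source, merges them with hll_union, and then estimates the combined distinct count. The CTE names (web, mobile), the user_id column, and the inline VALUES are hypothetical stand-ins for real tables, not part of the function's API.

```sql
-- Hypothetical inline data standing in for two separate sources.
WITH web AS (
  SELECT hll_sketch_agg(user_id) AS sketch
    FROM VALUES (1), (2), (3) AS t(user_id)
),
mobile AS (
  SELECT hll_sketch_agg(user_id) AS sketch
    FROM VALUES (3), (4), (5) AS t(user_id)
)
-- Each CTE returns exactly one row, so the cross join yields one row.
SELECT hll_sketch_estimate(hll_union(web.sketch, mobile.sketch)) AS approx_users
  FROM web CROSS JOIN mobile;
```

The value 3 appears in both inputs, so the true distinct count across both sources is 5. The estimate should be close to that rather than to the sum of the two per-source counts.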

The implementation uses the Apache DataSketches library. See HLL for more information.

Syntax

hll_union ( expr1, expr2 [, allowDifferentLgConfigK ] )

Arguments

  • exprN: A BINARY expression holding a sketch generated by hll_sketch_agg.
  • allowDifferentLgConfigK: An optional BOOLEAN expression controlling whether to allow merging two sketches with different lgConfigK values. The default value is false.

Returns

A BINARY buffer containing the HyperLogLog sketch computed as a result of combining the input expressions.

When the allowDifferentLgConfigK parameter is true, the result sketch uses the smaller of the two provided lgConfigK values.
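As an illustration, the following sketch merges two sketches built with different lgConfigK values by passing true as the third argument. The lgConfigK values 4 and 21 are chosen only to make the mismatch obvious. Based on the rule above, the merged sketch should use lgConfigK 4, which trades accuracy for a smaller buffer.

```sql
-- Merge sketches with mismatched lgConfigK values (4 and 21).
-- The third argument opts in to the merge; without it the call fails.
SELECT hll_sketch_estimate(
         hll_union(
           hll_sketch_agg(col1, 4),
           hll_sketch_agg(col2, 21),
           true)) AS approx_distinct
  FROM VALUES
    (1, 4),
    (1, 4),
    (2, 5),
    (2, 5),
    (3, 6) AS tab(col1, col2);
```

The inputs contain six distinct values across both columns, so the estimate should be close to 6.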

Examples

SQL
> SELECT hll_sketch_estimate(
    hll_union(
      hll_sketch_agg(col1),
      hll_sketch_agg(col2)))
    FROM VALUES
      (1, 4),
      (1, 4),
      (2, 5),
      (2, 5),
      (3, 6) AS tab(col1, col2);
  6

> SELECT hll_sketch_estimate(
    hll_union(
      hll_sketch_agg(col1, 4),
      hll_sketch_agg(col2, 21)))
    FROM VALUES
      (1, 4),
      (1, 4),
      (2, 5),
      (2, 5),
      (3, 6) AS tab(col1, col2);
  error