pyspark.sql.functions.hll_union
pyspark.sql.functions.hll_union(col1: ColumnOrName, col2: ColumnOrName, allowDifferentLgConfigK: Optional[bool] = None) → pyspark.sql.column.Column

Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if the sketches have different lgConfigK values and allowDifferentLgConfigK is unset or set to false.
New in version 3.5.0.
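Parameters
col1 : Column or str
    Column holding the binary representation of the first HllSketch.
col2 : Column or str
    Column holding the binary representation of the second HllSketch.
allowDifferentLgConfigK : bool, optional
    Allow sketches with different lgConfigK values to be merged (defaults to false).
Returns
Column
    The binary representation of the merged HllSketch.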
Examples
>>> from pyspark.sql.functions import hll_sketch_agg, hll_sketch_estimate, hll_union
>>> df = spark.createDataFrame([(1,4),(2,5),(2,5),(3,6)], "struct<v1:int,v2:int>")
>>> df = df.agg(hll_sketch_agg("v1").alias("sketch1"), hll_sketch_agg("v2").alias("sketch2"))
>>> df = df.withColumn("distinct_cnt", hll_sketch_estimate(hll_union("sketch1", "sketch2")))
>>> df.drop("sketch1", "sketch2").show()
+------------+
|distinct_cnt|
+------------+
|           6|
+------------+