pyspark.sql.functions.hll_sketch_estimate

pyspark.sql.functions.hll_sketch_estimate(col)

Returns the estimated number of unique values, given the binary representation of an Apache DataSketches HllSketch (for example, one produced by hll_sketch_agg).

New in version 3.5.0.

Parameters
col : Column or column name
Returns
Column

The estimated number of unique values for the HllSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([1,2,2,3], "INT")
>>> df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value"))).show()
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+