pyspark.sql.functions.tuple_intersection_agg_double

pyspark.sql.functions.tuple_intersection_agg_double(col, mode=None)

Aggregate function: returns the compact binary representation of the DataSketches TupleSketch that is the intersection of all double TupleSketch objects in the input column.

New in version 4.2.0.

Parameters

col (Column or column name): The column containing binary TupleSketch representations.
mode (Column or str, optional): The summary mode: "sum" (default), "min", "max", or "alwaysone".

Returns

Column: The binary representation of the intersected TupleSketch.

See also

pyspark.sql.functions.tuple_sketch_agg_double()
pyspark.sql.functions.tuple_intersection_double()

Examples

>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([(1, 10.0), (2, 20.0), (3, 30.0)], ["key", "value"])
>>> df1 = df1.agg(sf.tuple_sketch_agg_double("key", "value").alias("sketch"))
>>> df2 = spark.createDataFrame([(2, 40.0), (3, 50.0), (4, 60.0)], ["key", "value"])
>>> df2 = df2.agg(sf.tuple_sketch_agg_double("key", "value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.tuple_sketch_estimate_double(sf.tuple_intersection_agg_double("sketch"))).show()
+------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_intersection_agg_double(sketch, sum))|
+------------------------------------------------------------------------+
|                                                                     2.0|
+------------------------------------------------------------------------+
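
The effect of the mode parameter can be illustrated with a small conceptual model in plain Python. This is only a sketch under stated assumptions, not how Spark or DataSketches implement the function: each sketch is modeled as an exact dict of key to double summary (real TupleSketches are approximate and stored as compact binary), and the helper names `combine_summaries` and `intersect_sketches` are hypothetical. The assumed semantics are that a key survives only if it appears in every input sketch, and that its summaries are combined according to the mode.

```python
from functools import reduce

def combine_summaries(a, b, mode="sum"):
    """Combine two double summaries for a key present in both sketches.

    Hypothetical helper; the mode names mirror the documented options.
    """
    if mode == "sum":
        return a + b
    if mode == "min":
        return min(a, b)
    if mode == "max":
        return max(a, b)
    if mode == "alwaysone":
        return 1.0
    raise ValueError(f"unsupported mode: {mode!r}")

def intersect_sketches(sketches, mode="sum"):
    """Intersect a non-empty list of dict-based stand-ins for sketches."""
    def intersect_pair(left, right):
        # Keep only keys found in both inputs and merge their summaries.
        shared = left.keys() & right.keys()
        return {k: combine_summaries(left[k], right[k], mode) for k in shared}
    return reduce(intersect_pair, sketches)

# Same keys and values as the Spark example above.
s1 = {1: 10.0, 2: 20.0, 3: 30.0}
s2 = {2: 40.0, 3: 50.0, 4: 60.0}

result = intersect_sketches([s1, s2], mode="sum")
print(len(result))             # 2 distinct keys are shared (keys 2 and 3)
print(sorted(result.items()))  # [(2, 60.0), (3, 80.0)]
```

In this model the number of retained keys is 2, which agrees with the estimate of 2.0 in the Spark example; changing the mode alters only the summaries attached to the shared keys, not which keys are retained.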