pyspark.sql.functions.tuple_intersection_agg_double
- pyspark.sql.functions.tuple_intersection_agg_double(col, mode=None)
Aggregate function: returns the compact binary representation of the DataSketches TupleSketch that is the intersection of the double TupleSketch objects in the input column.
New in version 4.2.0.
- Parameters
- Returns
Column
The binary representation of the intersected TupleSketch.
See also
Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([(1, 10.0), (2, 20.0), (3, 30.0)], ["key", "value"])
>>> df1 = df1.agg(sf.tuple_sketch_agg_double("key", "value").alias("sketch"))
>>> df2 = spark.createDataFrame([(2, 40.0), (3, 50.0), (4, 60.0)], ["key", "value"])
>>> df2 = df2.agg(sf.tuple_sketch_agg_double("key", "value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.tuple_sketch_estimate_double(sf.tuple_intersection_agg_double("sketch"))).show()
+------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_intersection_agg_double(sketch, sum))|
+------------------------------------------------------------------------+
|                                                                     2.0|
+------------------------------------------------------------------------+
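To see why the estimate above is 2.0, the following is a minimal conceptual sketch in plain Python, not the DataSketches implementation. It assumes each sketch can be modeled as an exact dictionary from key to double summary, and that summaries of matching keys are added together, as the `sum` shown in the output column name suggests. The helper `intersect_exact` is hypothetical and is not part of PySpark.

```python
from functools import reduce

# Hypothetical helper (not a PySpark API): models each "sketch" as an exact
# dict of key -> double summary. Real TupleSketches are approximate.
def intersect_exact(sketches, combine=lambda a, b: a + b):
    """Keep only keys present in every input and combine their summaries."""
    def pair(left, right):
        return {k: combine(left[k], right[k]) for k in left.keys() & right.keys()}
    return reduce(pair, sketches)

# Same data as the two DataFrames in the example above.
sketch1 = {1: 10.0, 2: 20.0, 3: 30.0}
sketch2 = {2: 40.0, 3: 50.0, 4: 60.0}

common = intersect_exact([sketch1, sketch2])
print(len(common))  # count of distinct keys shared by both inputs
```

Only keys 2 and 3 appear in both inputs, which matches the distinct-key estimate of 2.0 reported by the doctest.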