pyspark.sql.functions.cume_dist
pyspark.sql.functions.cume_dist() → pyspark.sql.column.Column

Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows at or below the current row.
New in version 1.6.0.
Changed in version 3.4.0: Supports Spark Connect.
- Returns
  Column: the column for calculating cumulative distribution.
Examples
>>> from pyspark.sql import Window, types
>>> from pyspark.sql.functions import cume_dist
>>> df = spark.createDataFrame([1, 2, 3, 3, 4], types.IntegerType())
>>> w = Window.orderBy("value")
>>> df.withColumn("cd", cume_dist().over(w)).show()
+-----+---+
|value| cd|
+-----+---+
|    1|0.2|
|    2|0.4|
|    3|0.8|
|    3|0.8|
|    4|1.0|
+-----+---+
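For comparison, here is a minimal sketch of cume_dist over a partitioned window, assuming the same active spark session as above; the grp column and the sample rows are hypothetical and not part of the original example. With partitionBy, the cumulative distribution is computed independently within each partition.

>>> from pyspark.sql import Window
>>> from pyspark.sql.functions import cume_dist
>>> # Hypothetical data: two groups, ordered by value within each group.
>>> df2 = spark.createDataFrame(
...     [("a", 1), ("a", 2), ("a", 2), ("b", 10), ("b", 20)],
...     ["grp", "value"],
... )
>>> w2 = Window.partitionBy("grp").orderBy("value")
>>> df2.withColumn("cd", cume_dist().over(w2)).orderBy("grp", "value").show()

Within group "a", the row with value 1 gets cd = 1/3 and both rows with value 2 get cd = 1.0 (tied rows are peers and share the same cumulative distribution); within group "b", the two rows get cd = 0.5 and 1.0.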