pyspark.sql.functions.mode#

pyspark.sql.functions.mode(col, deterministic=False)[source]#

Returns the most frequent value in a group.

New in version 3.4.0.

Changed in version 4.0.0: Supports deterministic argument.

Parameters
col : Column or str

target column to compute on.

deterministic : bool, optional

if there are multiple equally-frequent results, return the lowest of them; otherwise any one of them may be returned (defaults to False).

Returns
Column

the most frequent value in a group.

Notes

Supports Spark Connect.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
|  Java|      2012|
|dotNET|      2012|
+------+----------+

When multiple values share the greatest frequency, any one of them may be returned if deterministic is false or unspecified, while the lowest value is returned if deterministic is true.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
>>> df.select(sf.mode("col", False)).show()  # doctest: +SKIP
+---------+
|mode(col)|
+---------+
|        0|
+---------+
>>> df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
|                                    -10|
+---------------------------------------+