pyspark.sql.DataFrame.sort

DataFrame.sort(*cols, **kwargs)

Returns a new DataFrame sorted by the specified column(s).

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : int, str, list, or Column, optional

List of Column objects, column names, or column ordinals to sort by.

Changed in version 4.0.0: Supports column ordinal.

Returns
DataFrame

Sorted DataFrame.

Other Parameters
ascending : bool or list, optional, default True

Whether to sort ascending (True) or descending (False). Pass a list of booleans to set the order for each column separately; the length of the list must equal the number of columns in cols.

Notes

Column ordinals start from 1, unlike the 0-based __getitem__(). A negative ordinal sorts by the column at that position in descending order; for example, -1 sorts by the first column in descending order.
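The ordinal rule can be modelled in plain Python. The sketch below is an illustration only, not Spark's implementation: `resolve_ordinal` is a hypothetical helper, and rejecting 0 and out-of-range values is an assumption of the sketch.

```python
def resolve_ordinal(columns, ordinal):
    """Hypothetical helper: map a 1-based, signed ordinal to (column_name, ascending)."""
    # Assumption of this sketch: 0 and out-of-range ordinals are rejected.
    if ordinal == 0 or abs(ordinal) > len(columns):
        raise IndexError(
            f"ordinal {ordinal} is out of range for {len(columns)} columns"
        )
    # The magnitude selects the column (1-based); the sign selects the direction.
    return columns[abs(ordinal) - 1], ordinal > 0


columns = ["age", "name"]
first = resolve_ordinal(columns, 1)       # ("age", True): first column, ascending
last_desc = resolve_ordinal(columns, -2)  # ("name", False): second column, descending
```

Under this model, `df.sort(-1)` on a DataFrame with columns `age` and `name` corresponds to sorting by `age` in descending order, which matches the examples in this page.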

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([
...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])

Sort the DataFrame in ascending order.

>>> df.sort(sf.asc("age")).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> df.sort(1).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+

Sort the DataFrame in descending order.

>>> df.sort(df.age.desc()).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
>>> df.orderBy(df.age.desc()).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
>>> df.sort("age", ascending=False).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
>>> df.sort(-1).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+

Sort by multiple columns.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([
...     (2, "Alice"), (2, "Bob"), (5, "Bob")], schema=["age", "name"])
>>> df.orderBy(sf.desc("age"), "name").show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
|  2|  Bob|
+---+-----+
>>> df.orderBy(-1, "name").show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
|  2|  Bob|
+---+-----+
>>> df.orderBy(-1, 2).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
|  2|  Bob|
+---+-----+

Sort by multiple columns, setting the order of each column with the ascending parameter.

>>> df.orderBy(["age", "name"], ascending=[False, False]).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|  Bob|
|  2|Alice|
+---+-----+
>>> df.orderBy([1, "name"], ascending=[False, False]).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|  Bob|
|  2|Alice|
+---+-----+
>>> df.orderBy([1, 2], ascending=[False, False]).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|  Bob|
|  2|Alice|
+---+-----+