pyspark.sql.functions.hash

pyspark.sql.functions.hash(*cols)

Calculates the hash code of the given columns and returns the result as an int column.

New in version 2.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : Column or column name

one or more columns to compute on.

Returns
Column

hash value as an int column.

Examples

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
>>> df.select('*', sf.hash('c1')).show()
+---+---+----------+
| c1| c2|  hash(c1)|
+---+---+----------+
|ABC|DEF|-757602832|
+---+---+----------+
>>> df.select('*', sf.hash('c1', df.c2)).show()
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+
>>> df.select('*', sf.hash('*')).show()
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+