pyspark.sql.functions.hash
- pyspark.sql.functions.hash(*cols)
Calculates the hash code of the given columns and returns the result as an int column.
New in version 2.0.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - cols : Column or column name
    One or more columns to compute the hash on.
- Returns
Column
Hash value as an int column.
Examples
>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
>>> df.select('*', sf.hash('c1')).show()
+---+---+----------+
| c1| c2|  hash(c1)|
+---+---+----------+
|ABC|DEF|-757602832|
+---+---+----------+

>>> df.select('*', sf.hash('c1', df.c2)).show()
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+

>>> df.select('*', sf.hash('*')).show()
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+