pyspark.sql.functions.zip_with

pyspark.sql.functions.zip_with(left: ColumnOrName, right: ColumnOrName, f: Callable[[pyspark.sql.column.Column, pyspark.sql.column.Column], pyspark.sql.column.Column]) → pyspark.sql.column.Column[source]

Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.

New in version 3.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
leftColumn or str

name of the first column or expression

rightColumn or str

name of the second column or expression

ffunction

a binary function (x1: Column, x2: Column) -> Column... Can use methods of Column, functions defined in pyspark.sql.functions and Scala UserDefinedFunctions. Python UserDefinedFunctions are not supported (SPARK-27052).

Returns
Column

array of calculated values derived by applying given function to each pair of arguments.

Examples

>>> df = spark.createDataFrame([(1, [1, 3, 5, 8], [0, 2, 4, 6])], ("id", "xs", "ys"))
>>> df.select(zip_with("xs", "ys", lambda x, y: x ** y).alias("powers")).show(truncate=False)
+---------------------------+
|powers                     |
+---------------------------+
|[1.0, 9.0, 625.0, 262144.0]|
+---------------------------+
>>> df = spark.createDataFrame([(1, ["foo", "bar"], [1, 2, 3])], ("id", "xs", "ys"))
>>> df.select(zip_with("xs", "ys", lambda x, y: concat_ws("_", x, y)).alias("xs_ys")).show()
+-----------------+
|            xs_ys|
+-----------------+
|[foo_1, bar_2, 3]|
+-----------------+