pyspark.sql.functions.arrays_zip

pyspark.sql.functions.arrays_zip(*cols)

Array function: Returns a merged array of structs in which the N-th struct contains the N-th values of all input arrays. If one of the arrays is shorter than the others, the corresponding struct fields are null for the missing elements.
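To make the padding rule concrete, the sketch below models the semantics for a single row in plain Python. It is not Spark's implementation: `arrays_zip_model` is a hypothetical helper, Python tuples stand in for Spark structs, and `None` stands in for SQL NULL. `itertools.zip_longest` with `fillvalue=None` pads every input to the longest one, which matches the behavior described above.

```python
from itertools import zip_longest
from typing import Any, List, Sequence, Tuple


def arrays_zip_model(*arrays: Sequence[Any]) -> List[Tuple[Any, ...]]:
    """Hypothetical plain-Python model of arrays_zip for one row.

    The N-th tuple holds the N-th element of every input array; shorter
    arrays are padded with None, mirroring the null struct fields in Spark.
    """
    return list(zip_longest(*arrays, fillvalue=None))


print(arrays_zip_model([1, 2, 3], ["a", "b", "c"]))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(arrays_zip_model([1, 2], ["a", "b", "c"]))     # [(1, 'a'), (2, 'b'), (None, 'c')]
```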

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : Column or str

Columns of arrays to be merged.

Returns
Column

Merged array of structs; the N-th struct holds the N-th element of every input array.

Examples

Example 1: Zipping two arrays of the same length

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3], ['a', 'b', 'c'])], ['nums', 'letters'])
>>> df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+-------------------------+
|arrays_zip(nums, letters)|
+-------------------------+
|[{1, a}, {2, b}, {3, c}] |
+-------------------------+

Example 2: Zipping arrays of different lengths

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2], ['a', 'b', 'c'])], ['nums', 'letters'])
>>> df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+---------------------------+
|arrays_zip(nums, letters)  |
+---------------------------+
|[{1, a}, {2, b}, {NULL, c}]|
+---------------------------+
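Note that the result is as long as the longest input, not the shortest. This differs from Python's built-in `zip`, which truncates. The plain-Python sketch below (hypothetical helper `arrays_zip_model`, `None` standing in for NULL) contrasts the two on the inputs from Example 2.

```python
from itertools import zip_longest
from typing import Any, List, Sequence, Tuple


def arrays_zip_model(*arrays: Sequence[Any]) -> List[Tuple[Any, ...]]:
    # Hypothetical helper: pad every array to the longest one with None.
    return list(zip_longest(*arrays, fillvalue=None))


nums, letters = [1, 2], ["a", "b", "c"]
zipped = arrays_zip_model(nums, letters)

# Padding keeps all three positions; built-in zip would stop after two.
print(len(zipped), len(list(zip(nums, letters))))  # 3 2
print(zipped[-1])                                  # (None, 'c')
```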

Example 3: Zipping more than two arrays

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [([1, 2], ['a', 'b'], [True, False])], ['nums', 'letters', 'bools'])
>>> df.select(sf.arrays_zip(df.nums, df.letters, df.bools)).show(truncate=False)
+--------------------------------+
|arrays_zip(nums, letters, bools)|
+--------------------------------+
|[{1, a, true}, {2, b, false}]   |
+--------------------------------+
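With three or more inputs, each struct simply gains one field per array. The sketch below extends the plain-Python model to any number of named arrays, using dicts in place of structs so that individual fields can be looked up by name. `arrays_zip_named` is hypothetical, and keying the fields by the input names is an assumption made for illustration; inspect the schema of the actual result (for example with `printSchema()`) before relying on specific struct field names.

```python
from itertools import zip_longest
from typing import Any, Dict, List, Sequence


def arrays_zip_named(**arrays: Sequence[Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: zip any number of arrays into per-position dicts.

    Each dict plays the role of one struct. Its keys are the keyword names,
    which this sketch ASSUMES correspond to the struct field names.
    """
    names = list(arrays)  # keyword order is preserved (Python 3.7+)
    return [
        dict(zip(names, values))
        for values in zip_longest(*arrays.values(), fillvalue=None)
    ]


rows = arrays_zip_named(nums=[1, 2], letters=["a", "b"], bools=[True, False])
print(rows)
print([r["bools"] for r in rows])  # [True, False]
```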

Example 4: Zipping arrays with null values

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, None], ['a', None, 'c'])], ['nums', 'letters'])
>>> df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+------------------------------+
|arrays_zip(nums, letters)     |
+------------------------------+
|[{1, a}, {2, NULL}, {NULL, c}]|
+------------------------------+
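Examples 2 and 4 both render a missing value as NULL, so after zipping, a null introduced by padding cannot be told apart from a null that was already present in an input array. The plain-Python model below (hypothetical helper `arrays_zip_model`, `None` standing in for NULL) shows the two cases producing identical results. If the distinction matters, one option is to compare the input array lengths (for example with `sf.size`) before zipping.

```python
from itertools import zip_longest
from typing import Any, List, Sequence, Tuple


def arrays_zip_model(*arrays: Sequence[Any]) -> List[Tuple[Any, ...]]:
    # Hypothetical helper: pad every array to the longest one with None.
    return list(zip_longest(*arrays, fillvalue=None))


# A null produced by padding the shorter array ...
padded = arrays_zip_model([1, 2], ["a", "b", "c"])
# ... is identical to a null element that was present in the input.
explicit = arrays_zip_model([1, 2, None], ["a", "b", "c"])

print(padded == explicit)  # True
```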