pyspark.sql.functions.arrays_zip
- pyspark.sql.functions.arrays_zip(*cols)
Array function: Returns a merged array of structs in which the N-th struct contains the N-th values of all input arrays. If one of the arrays is shorter than the others, the missing elements in the resulting structs are filled with null.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
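The padding rule described above can be modeled in plain Python, which may help when reasoning about results without a Spark session. This is only a sketch of the semantics, not how Spark implements the function: `arrays_zip_model` is a hypothetical helper name, tuples stand in for structs, and `None` stands in for null.

```python
from itertools import zip_longest


def arrays_zip_model(*arrays):
    """Plain-Python model of the zipping rule (hypothetical helper, not a PySpark API)."""
    # zip_longest pads the shorter inputs with None, mirroring the null
    # padding described for arrays of different lengths.
    return list(zip_longest(*arrays, fillvalue=None))


print(arrays_zip_model([1, 2], ["a", "b", "c"]))
# [(1, 'a'), (2, 'b'), (None, 'c')]
```

The result has as many entries as the longest input, which matches the shape shown in Example 2 below.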
Examples
Example 1: Zipping two arrays of the same length
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3], ['a', 'b', 'c'])], ['nums', 'letters'])
>>> df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+-------------------------+
|arrays_zip(nums, letters)|
+-------------------------+
|[{1, a}, {2, b}, {3, c}] |
+-------------------------+
Example 2: Zipping arrays of different lengths
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2], ['a', 'b', 'c'])], ['nums', 'letters'])
>>> df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+---------------------------+
|arrays_zip(nums, letters)  |
+---------------------------+
|[{1, a}, {2, b}, {NULL, c}]|
+---------------------------+
Example 3: Zipping more than two arrays
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...     [([1, 2], ['a', 'b'], [True, False])], ['nums', 'letters', 'bools'])
>>> df.select(sf.arrays_zip(df.nums, df.letters, df.bools)).show(truncate=False)
+--------------------------------+
|arrays_zip(nums, letters, bools)|
+--------------------------------+
|[{1, a, true}, {2, b, false}]   |
+--------------------------------+
Example 4: Zipping arrays with null values
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, None], ['a', None, 'c'])], ['nums', 'letters'])
>>> df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+------------------------------+
|arrays_zip(nums, letters)     |
+------------------------------+
|[{1, a}, {2, NULL}, {NULL, c}]|
+------------------------------+