pyspark.sql.DataFrame.foreachPartition#

DataFrame.foreachPartition(f)[source]#

Applies the f function to each partition of this DataFrame.

This a shorthand for df.rdd.foreachPartition().

New in version 1.3.0.

Changed in version 4.0.0: Supports Spark Connect.

Parameters
ffunction

A function that accepts one parameter which will receive each partition to process.

Examples

>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> def func(itr):
...     for person in itr:
...         print(person.name)
...
>>> df.foreachPartition(func)