pyspark.RDD.cleanShuffleDependencies

RDD.cleanShuffleDependencies(blocking=False)

Removes an RDD’s shuffles and its non-persisted ancestors.

When running without a shuffle service, cleaning up shuffle files enables downscaling. If you use the RDD after this call, you should checkpoint and materialize it first.

New in version 3.3.0.

Parameters
blocking : bool, optional, default False
    whether to block on shuffle cleanup tasks

Notes

This API is a developer API.
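
Examples

The advice above about checkpointing and materializing the RDD before reusing it can be sketched as follows. This is a minimal illustration, not an official example: the helper name `checkpoint_then_clean`, the sample data, and the checkpoint directory argument are assumptions made for this sketch. It expects an already running SparkContext to be passed in as `sc`.

```python
def checkpoint_then_clean(sc, checkpoint_dir):
    """Checkpoint a shuffled RDD, then remove its shuffle files.

    `sc` is an existing SparkContext. `checkpoint_dir` is a hypothetical,
    writable path (on a cluster it should live on reliable storage).
    """
    sc.setCheckpointDir(checkpoint_dir)

    # reduceByKey introduces a shuffle dependency.
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
    totals = pairs.reduceByKey(lambda x, y: x + y)

    # Mark the RDD for checkpointing, then run an action so the
    # checkpoint is actually written (materialized).
    totals.checkpoint()
    totals.count()

    # Remove the shuffle files; blocking=True waits for the cleanup tasks.
    totals.cleanShuffleDependencies(blocking=True)

    # The RDD can still be used because it is read back from the checkpoint.
    return sorted(totals.collect())
```

A call might look like `checkpoint_then_clean(spark.sparkContext, "/tmp/spark-checkpoints")`, where `spark` is an existing SparkSession and the path is only a placeholder. Passing `blocking=True` is a choice made here so that the cleanup has finished before the RDD is read again; the default is `False`.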