pyspark.sql.DataFrame.checkpoint
- DataFrame.checkpoint(eager=True)
Returns a checkpointed version of this DataFrame. Checkpointing can be used to truncate the logical plan of this DataFrame, which is especially useful in iterative algorithms where the plan may grow exponentially. The checkpointed data is saved to files inside the checkpoint directory set with SparkContext.setCheckpointDir() or the spark.checkpoint.dir configuration.
New in version 2.1.0.
Changed in version 4.0.0: Supports Spark Connect.
- Parameters
eager : bool, optional, default True
    Whether to checkpoint this DataFrame immediately.
- Returns
DataFrame
Checkpointed DataFrame.
Notes
This API is experimental.
Examples
>>> df = spark.createDataFrame([
...     (14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df.checkpoint(False)
DataFrame[age: bigint, name: string]