pyspark.TaskContext¶
-
class
pyspark.
TaskContext
[source]¶ Contextual information about a task which can be read or mutated during execution. To access the TaskContext for a running task, use:
TaskContext.get()
.New in version 2.2.0.
Examples
>>> from pyspark import TaskContext
Get a task context instance from
RDD
.>>> spark.sparkContext.setLocalProperty("key1", "value") >>> taskcontext = spark.sparkContext.parallelize([1]).map(lambda _: TaskContext.get()).first() >>> isinstance(taskcontext.attemptNumber(), int) True >>> isinstance(taskcontext.partitionId(), int) True >>> isinstance(taskcontext.stageId(), int) True >>> isinstance(taskcontext.taskAttemptId(), int) True >>> taskcontext.getLocalProperty("key1") 'value' >>> isinstance(taskcontext.cpus(), int) True
Get a task context instance from a dataframe via Python UDF.
>>> from pyspark.sql import Row >>> from pyspark.sql.functions import udf >>> @udf("STRUCT<anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>") ... def taskcontext_as_row(): ... taskcontext = TaskContext.get() ... return Row( ... anum=taskcontext.attemptNumber(), ... partid=taskcontext.partitionId(), ... stageid=taskcontext.stageId(), ... taskaid=taskcontext.taskAttemptId(), ... prop=taskcontext.getLocalProperty("key2"), ... cpus=taskcontext.cpus()) ... >>> spark.sparkContext.setLocalProperty("key2", "value") >>> [(anum, partid, stageid, taskaid, prop, cpus)] = ( ... spark.range(1).select(taskcontext_as_row()).first() ... ) >>> isinstance(anum, int) True >>> isinstance(partid, int) True >>> isinstance(stageid, int) True >>> isinstance(taskaid, int) True >>> prop 'value' >>> isinstance(cpus, int) True
Get a task context instance from a dataframe via Pandas UDF.
>>> import pandas as pd >>> from pyspark.sql.functions import pandas_udf >>> @pandas_udf("STRUCT<" ... "anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>") ... def taskcontext_as_row(_): ... taskcontext = TaskContext.get() ... return pd.DataFrame({ ... "anum": [taskcontext.attemptNumber()], ... "partid": [taskcontext.partitionId()], ... "stageid": [taskcontext.stageId()], ... "taskaid": [taskcontext.taskAttemptId()], ... "prop": [taskcontext.getLocalProperty("key3")], ... "cpus": [taskcontext.cpus()] ... }) ... >>> spark.sparkContext.setLocalProperty("key3", "value") >>> [(anum, partid, stageid, taskaid, prop, cpus)] = ( ... spark.range(1).select(taskcontext_as_row("id")).first() ... ) >>> isinstance(anum, int) True >>> isinstance(partid, int) True >>> isinstance(stageid, int) True >>> isinstance(taskaid, int) True >>> prop 'value' >>> isinstance(cpus, int) True
Methods
How many times this task has been attempted.
cpus
()CPUs allocated to the task.
get
()Return the currently active
TaskContext
.getLocalProperty
(key)Get a local property set upstream in the driver, or None if it is missing.
The ID of the RDD partition that is computed by this task.
Resources allocated to the task.
stageId
()The ID of the stage that this task belong to.
An ID that is unique to this task attempt (within the same
SparkContext
, no two task attempts will share the same attempt ID).