execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task |
16 mb |
MemorySize |
The average volume of data each task instance is expected to process when jobmanager.scheduler is set to AdaptiveBatch. Note that when data skew occurs or the decided parallelism reaches execution.batch.adaptive.auto-parallelism.max-parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. |
execution.batch.adaptive.auto-parallelism.default-source-parallelism |
(none) |
Integer |
The default parallelism of source vertices, or the upper bound of source parallelism to set adaptively, if jobmanager.scheduler has been set to AdaptiveBatch. Note that execution.batch.adaptive.auto-parallelism.max-parallelism will be used if this option is not configured. If execution.batch.adaptive.auto-parallelism.max-parallelism is not set either, then the default parallelism set via parallelism.default will be used instead. |
execution.batch.adaptive.auto-parallelism.enabled |
true |
Boolean |
If true, Flink will automatically decide the parallelism of operators in batch jobs. |
execution.batch.adaptive.auto-parallelism.max-parallelism |
128 |
Integer |
The upper bound of allowed parallelism to set adaptively if jobmanager.scheduler has been set to AdaptiveBatch. |
execution.batch.adaptive.auto-parallelism.min-parallelism |
1 |
Integer |
The lower bound of allowed parallelism to set adaptively if jobmanager.scheduler has been set to AdaptiveBatch. |
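The auto-parallelism options above interact in a way that a small calculation can make concrete. The following is a minimal sketch, not Flink's actual implementation: the helper names `decide_parallelism` and `source_parallelism_bound` are hypothetical, and the sketch assumes the parallelism is simply the data volume divided by avg-data-volume-per-task, clamped to the min and max bounds. The second helper mirrors the fallback order described for default-source-parallelism.

```python
import math
from typing import Optional

MB = 1024 * 1024


def decide_parallelism(total_bytes: int,
                       avg_bytes_per_task: int = 16 * MB,
                       min_parallelism: int = 1,
                       max_parallelism: int = 128) -> int:
    """Hypothetical helper: enough tasks so that each one processes roughly
    avg_bytes_per_task, clamped to [min_parallelism, max_parallelism]."""
    wanted = math.ceil(total_bytes / avg_bytes_per_task)
    return max(min_parallelism, min(max_parallelism, wanted))


def source_parallelism_bound(default_source: Optional[int],
                             max_parallelism: Optional[int],
                             parallelism_default: int) -> int:
    """Hypothetical helper: fallback chain for source vertices.
    None stands for 'not configured'."""
    if default_source is not None:
        return default_source
    if max_parallelism is not None:
        return max_parallelism
    return parallelism_default


print(decide_parallelism(1024 * MB))       # 64 tasks of about 16 MB each
print(decide_parallelism(10 * 1024 * MB))  # 128 (clamped), about 80 MB per task
print(source_parallelism_bound(None, None, 4))  # 4, taken from parallelism.default
```

The second call illustrates the note on avg-data-volume-per-task: once the upper bound is reached, each task processes far more than the configured 16 MB.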
execution.batch.job-recovery.enabled |
false |
Boolean |
A flag to enable or disable job recovery. If enabled, batch jobs can resume with previously generated intermediate results after the job master restarts due to failures, thereby preserving the progress. |
execution.batch.job-recovery.previous-worker.recovery.timeout |
30 s |
Duration |
The timeout for a new job master to wait for the previous worker to reconnect. A reconnected worker will transmit the details of its produced intermediate results to the new job master, enabling the job master to reuse these results. |
execution.batch.job-recovery.snapshot.min-pause |
3 min |
Duration |
The minimal pause between snapshots taken by operator coordinator or other components. It is used to avoid performance degradation due to excessive snapshot frequency. |
execution.batch.speculative.block-slow-node-duration |
1 min |
Duration |
Controls how long a detected slow node should be blocked for. |
execution.batch.speculative.enabled |
false |
Boolean |
Controls whether to enable speculative execution. |
execution.batch.speculative.max-concurrent-executions |
2 |
Integer |
Controls the maximum number of execution attempts of each operator that can execute concurrently, including the original one and speculative ones. |
job-event.store.write-buffer.flush-interval |
1 s |
Duration |
The flush interval of JobEventStore write buffers. Buffer contents will be flushed to the external file system periodically at this interval. |
job-event.store.write-buffer.size |
1 mb |
MemorySize |
The size of the write buffer of JobEventStore. The content will be flushed to the external file system once the buffer is full. |
jobmanager.adaptive-scheduler.min-parallelism-increase |
1 |
Integer |
Configure the minimum increase in parallelism for a job to scale up. |
jobmanager.adaptive-scheduler.resource-stabilization-timeout |
10 s |
Duration |
The resource stabilization timeout defines the time the JobManager will wait if fewer than the desired but sufficient resources are available. The timeout starts once sufficient resources for running the job are available. Once this timeout has passed, the job will start executing with the available resources. If scheduler-mode is configured to REACTIVE, this configuration value will default to 0, so that jobs start immediately with the available resources. |
jobmanager.adaptive-scheduler.resource-wait-timeout |
5 min |
Duration |
The maximum time the JobManager will wait to acquire all required resources after a job submission or restart. Once elapsed, it will try to run the job with a lower parallelism, or fail if the minimum amount of resources could not be acquired. Increasing this value will make the cluster more resilient against temporary resource shortages (e.g., there is more time for a failed TaskManager to be restarted). Setting a negative duration will disable the resource timeout: the JobManager will wait indefinitely for resources to appear. If scheduler-mode is configured to REACTIVE, this configuration value will default to a negative value to disable the resource timeout. |
jobmanager.adaptive-scheduler.scaling-interval.max |
(none) |
Duration |
Determines the maximum interval after which a scaling operation is forced even if jobmanager.adaptive-scheduler.min-parallelism-increase isn't met. The scaling operation will be ignored when the resources haven't changed. This option is disabled by default. |
jobmanager.adaptive-scheduler.scaling-interval.min |
30 s |
Duration |
Determines the minimum time between scaling operations. |
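The three scale-up options above (min-parallelism-increase, scaling-interval.min, scaling-interval.max) can be read as a single decision rule. The sketch below is an illustration under stated assumptions, not the scheduler's actual code: `should_rescale` is a hypothetical function, times are plain seconds, and only scale-up is modeled.

```python
from typing import Optional


def should_rescale(current_parallelism: int,
                   available_parallelism: int,
                   seconds_since_last_rescale: float,
                   min_parallelism_increase: int = 1,
                   scaling_interval_min: float = 30.0,
                   scaling_interval_max: Optional[float] = None) -> bool:
    """Hypothetical scale-up rule derived from the option descriptions."""
    # Nothing to do when resources have not grown (scale-down is out of scope here).
    if available_parallelism <= current_parallelism:
        return False
    # Never rescale more often than scaling-interval.min allows.
    if seconds_since_last_rescale < scaling_interval_min:
        return False
    # A large enough increase triggers a rescale right away.
    if available_parallelism - current_parallelism >= min_parallelism_increase:
        return True
    # Otherwise rescale only if scaling-interval.max is set and has elapsed.
    return (scaling_interval_max is not None
            and seconds_since_last_rescale >= scaling_interval_max)


# Increase of 1 is below the required 2, and no max interval is set.
print(should_rescale(4, 5, 60, min_parallelism_increase=2))  # False
# Same situation, but the max interval (45 s) has elapsed, so scaling is forced.
print(should_rescale(4, 5, 60, min_parallelism_increase=2,
                     scaling_interval_max=45))               # True
```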
jobmanager.partition.hybrid.partition-data-consume-constraint |
(none) |
Enum |
Controls the constraint under which hybrid partition data can be consumed. Note that this option is allowed only when jobmanager.scheduler has been set to AdaptiveBatch.
Possible values:
- "ALL_PRODUCERS_FINISHED": hybrid partition data can be consumed only when all producers are finished.
- "ONLY_FINISHED_PRODUCERS": hybrid partition data can be consumed when its producer is finished.
- "UNFINISHED_PRODUCERS": hybrid partition data can be consumed even if its producer is unfinished.
|
jobmanager.scheduler |
Default |
Enum |
Determines which scheduler implementation is used to schedule tasks. If this option is not explicitly set, batch jobs will use the 'AdaptiveBatch' scheduler as the default, while streaming jobs will default to the 'Default' scheduler.
Possible values:
- "Default": Default scheduler
- "Adaptive": Adaptive scheduler. More details can be found here.
- "AdaptiveBatch": Adaptive batch scheduler. More details can be found here.
|
scheduler-mode |
(none) |
Enum |
Determines the mode of the scheduler. Note that scheduler-mode=REACTIVE is only supported by standalone application deployments, not by active resource managers (YARN, Kubernetes) or session clusters.
Possible values:
- "REACTIVE"
|
slot.idle.timeout |
50 s |
Duration |
The timeout for an idle slot in Slot Pool. |
slot.request.timeout |
5 min |
Duration |
The timeout for requesting a slot from Slot Pool. |
slotmanager.max-total-resource.cpu |
(none) |
Double |
Maximum number of CPU cores the Flink cluster allocates for slots. Resources for JobManager and TaskManager framework are excluded. If not configured, it will be derived from 'slotmanager.number-of-slots.max'. |
slotmanager.max-total-resource.memory |
(none) |
MemorySize |
Maximum memory size the Flink cluster allocates for slots. Resources for JobManager and TaskManager framework are excluded. If not configured, it will be derived from 'slotmanager.number-of-slots.max'. |
slotmanager.min-total-resource.cpu |
(none) |
Double |
Minimum number of CPU cores the Flink cluster allocates for slots. Resources for JobManager and TaskManager framework are excluded. If not configured, it will be derived from 'slotmanager.number-of-slots.min'. |
slotmanager.min-total-resource.memory |
(none) |
MemorySize |
Minimum memory size the Flink cluster allocates for slots. Resources for JobManager and TaskManager framework are excluded. If not configured, it will be derived from 'slotmanager.number-of-slots.min'. |
slotmanager.number-of-slots.max |
infinite |
Integer |
Defines the maximum number of slots that the Flink cluster allocates. This configuration option is meant for limiting the resource consumption for batch workloads. It is not recommended to configure this option for streaming workloads, which may fail if there are not enough slots. Note that this configuration option does not take effect for standalone clusters, where the number of allocated slots is not controlled by Flink. |
slotmanager.number-of-slots.min |
0 |
Integer |
Defines the minimum number of slots that the Flink cluster allocates. This configuration option is meant for the cluster to initialize a certain number of workers on a best-effort basis at startup. This can be used to speed up the job startup process. Note that this configuration option does not take effect for standalone clusters, where the number of allocated slots is not controlled by Flink. |
slow-task-detector.check-interval |
1 s |
Duration |
The interval at which to check for slow tasks. |
slow-task-detector.execution-time.baseline-lower-bound |
1 min |
Duration |
The lower bound of the slow task detection baseline. |
slow-task-detector.execution-time.baseline-multiplier |
1.5 |
Double |
The multiplier to calculate the slow task detection baseline. Given that the parallelism is N and the ratio is R, define T as the median of the first N*R finished tasks' execution time. The baseline will be T*M, where M is the multiplier of the baseline. Note that the execution time will be weighted with the task's input bytes to ensure the accuracy of the detection if data skew occurs. |
slow-task-detector.execution-time.baseline-ratio |
0.75 |
Double |
The finished execution ratio threshold to calculate the slow task detection baseline. Given that the parallelism is N and the ratio is R, define T as the median of the first N*R finished tasks' execution time. The baseline will be T*M, where M is the multiplier of the baseline. Note that the execution time will be weighted with the task's input bytes to ensure the accuracy of the detection if data skew occurs. |
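The baseline formula described for the slow-task-detector options can be worked through with concrete numbers. This is a simplified sketch with assumptions: `slow_task_baseline` is a hypothetical helper, N*R is rounded up, the lower bound is applied as a floor on T*M, and the weighting by input bytes mentioned above is left out.

```python
import math
import statistics
from typing import List, Optional


def slow_task_baseline(finished_times_s: List[float],
                       parallelism: int,
                       ratio: float = 0.75,
                       multiplier: float = 1.5,
                       lower_bound_s: float = 60.0) -> Optional[float]:
    """Hypothetical helper: baseline in seconds, or None if too few tasks finished.

    finished_times_s holds execution times in the order the tasks finished.
    """
    needed = math.ceil(parallelism * ratio)  # N*R, rounded up (assumption)
    if len(finished_times_s) < needed:
        return None                          # baseline not available yet
    t = statistics.median(finished_times_s[:needed])  # T
    return max(lower_bound_s, t * multiplier)         # T*M, floored at the lower bound


# N = 8, R = 0.75: the first 6 finished tasks define the baseline.
print(slow_task_baseline([40, 42, 44, 46, 50, 52], parallelism=8))  # 67.5
# Short tasks: T*M = 22.5 s is below the 1 min lower bound.
print(slow_task_baseline([10, 12, 14, 16, 18, 20], parallelism=8))  # 60.0
# Only 5 of the required 6 tasks have finished.
print(slow_task_baseline([40, 42, 44, 46, 50], parallelism=8))      # None
```

Under these assumptions, a task still running after the baseline (67.5 s in the first case) would be treated as slow.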
taskmanager.load-balance.mode |
NONE |
Enum |
Mode for the load-balance allocation strategy across all available TaskManagers.
- The SLOTS mode tries to spread out the slots evenly across all available TaskManagers.
- The NONE mode is the default mode without any specified strategy.
Possible values:
- "NONE"
- "SLOTS"
|