execution.batch-shuffle-mode |
ALL_EXCHANGES_BLOCKING |
Enum |
Defines how data is exchanged between tasks in batch 'execution.runtime-mode' if the shuffling behavior has not been set explicitly for an individual exchange. With pipelined exchanges, upstream and downstream tasks run simultaneously. In order to achieve lower latency, a result record is immediately sent to and processed by the downstream task. Thus, the receiver back-pressures the sender. The streaming mode always uses this exchange. With blocking exchanges, upstream and downstream tasks run in stages. Records are persisted to some storage between stages. Downstream tasks then fetch these records after the upstream tasks finished. Such an exchange reduces the resources required to execute the job as it does not need to run upstream and downstream tasks simultaneously. With hybrid exchanges (experimental), downstream tasks can run anytime as long as upstream tasks start running. When given sufficient resources, it can reduce the overall job execution time by running tasks simultaneously. Otherwise, it also allows jobs to be executed with very little resources. It adapts to custom preferences between persisting less data and restarting less tasks on failures, by providing different spilling strategies.
Possible values:- "ALL_EXCHANGES_PIPELINED": Upstream and downstream tasks run simultaneously. This leads to lower latency and more evenly distributed (but higher) resource usage across tasks.
- "ALL_EXCHANGES_BLOCKING": Upstream and downstream tasks run subsequently. This reduces the resource usage as downstream tasks are started after upstream tasks finished.
- "ALL_EXCHANGES_HYBRID_FULL": Downstream can start running anytime, as long as the upstream has started. This adapts the resource usage to whatever is available. This type will spill all data to disk to support re-consume.
- "ALL_EXCHANGES_HYBRID_SELECTIVE": Downstream can start running anytime, as long as the upstream has started. This adapts the resource usage to whatever is available. This type will selective spilling data to reduce disk writes as much as possible.
|
execution.buffer-timeout.enabled |
true |
Boolean |
If disabled, the config execution.buffer-timeout.interval will not take effect and the flushing will be triggered only when the output buffer is full thus maximizing throughput |
execution.buffer-timeout.interval |
100 ms |
Duration |
The maximum time frequency (milliseconds) for the flushing of the output buffers. By default the output buffers flush frequently to provide low latency and to aid smooth developer experience. Setting the parameter can result in three logical modes:- A positive value triggers flushing periodically by that interval
- 0 triggers flushing after every record thus minimizing latency
- If the config execution.buffer-timeout.enabled is false, trigger flushing only when the output buffer is full thus maximizing throughput
|
execution.checkpointing.snapshot-compression |
false |
Boolean |
Tells if we should use compression for the state snapshot data or not |
execution.runtime-mode |
STREAMING |
Enum |
Runtime execution mode of DataStream programs. Among other things, this controls task scheduling, network shuffle behavior, and time semantics.
Possible values:- "STREAMING"
- "BATCH"
- "AUTOMATIC"
|
execution.sort-keyed-partition.memory |
128 mb |
MemorySize |
Sets the managed memory size for sort partition operator on KeyedPartitionWindowedStream.The memory size is only a weight hint. Thus, it will affect the operator's memory weight within a task, but the actual memory used depends on the running environment. |
execution.sort-partition.memory |
128 mb |
MemorySize |
Sets the managed memory size for sort partition operator in NonKeyedPartitionWindowedStream.The memory size is only a weight hint. Thus, it will affect the operator's memory weight within a task, but the actual memory used depends on the running environment. |