Your Flink job has data skew when a subset of subtasks in any operator receives a disproportionate share of the records. This can overload some task managers while the rest remain idle, which makes processing inefficient and can cause backpressure and related problems.
Data skew is calculated using the Coefficient of Variation (CV) statistic, which is the standard deviation divided by the mean. The data skew percentage shown here is calculated from the numRecordsIn metric (the cumulative number of records received) across the subtasks of each operator. The data skew percentage under the Overview tab is a live value calculated from the numRecordsInPerSecond metric (the current input rate), so it can differ from the percentage shown here.
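
As a rough illustration of the CV calculation, the sketch below computes a skew percentage from a list of per-subtask numRecordsIn values. The function name `skew_percentage` is hypothetical, and the use of the population standard deviation, the cap at 100, and the handling of an empty or all-zero input are assumptions made for this example. They are not confirmed details of how the Flink dashboard computes the value.

```python
from statistics import mean, pstdev


def skew_percentage(records_in):
    """Return a skew percentage for one operator.

    records_in holds one numRecordsIn value per subtask.
    Assumptions (not confirmed by the text above): population
    standard deviation, result capped at 100, and 0 returned
    when there is no input to compare.
    """
    if not records_in:
        return 0.0
    avg = mean(records_in)
    if avg == 0:
        # No records received yet, so there is nothing to be skewed.
        return 0.0
    # Coefficient of Variation: standard deviation divided by the mean.
    cv = pstdev(records_in) / avg
    return min(cv * 100.0, 100.0)


if __name__ == "__main__":
    # Evenly distributed input: every subtask received the same count.
    print(skew_percentage([100, 100, 100, 100]))
    # One subtask received all the records.
    print(skew_percentage([0, 0, 0, 400]))
```

An operator whose subtasks all received the same number of records has a standard deviation of zero and therefore a skew of 0. The more the counts concentrate on a few subtasks, the larger the CV becomes.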