Commit 88fa5f1

[FLINK-24876][docs] Remove metrics limitation of Adaptive Scheduler
1 parent 942bc87 commit 88fa5f1

4 files changed: 0 additions, 18 deletions
docs/content.zh/docs/deployment/elastic_scaling.md

Lines changed: 0 additions & 1 deletion
@@ -146,7 +146,6 @@ Adaptive 调度器可以通过[所有在名字包含 `adaptive-scheduler` 的配
 - **不支持[本地恢复]({{< ref "docs/ops/state/large_state_tuning">}}#task-local-recovery)**:本地恢复是将 Task 调度到状态尽可能的被重用的机器上的功能。不支持这个功能意味着 Adaptive 调度器需要每次从 Checkpoint 的存储中下载整个 State。
 - **不支持部分故障恢复**: 部分故障恢复意味着调度器可以只重启失败 Job 其中某一部分(在 Flink 的内部结构中被称之为 Region)而不是重启整个 Job。这个限制只会影响那些独立并行(Embarrassingly Parallel)Job的恢复时长,默认的调度器可以重启失败的部分,然而 Adaptive 将需要重启整个 Job。
 - **与 Flink Web UI 的集成受限**: Adaptive 调度器会在 Job 的生命周期中改变它的并行度。Web UI 上只显示 Job 当前的并行度。
-- **Job 的指标受限**: 除了 `numRestarts` 外,`Job` 作用域下所有的 [可用性]({{< ref "docs/ops/metrics" >}}#availability) 和 [Checkpoint]({{< ref "docs/ops/metrics" >}}#checkpointing) 指标都不准确。
 - **空闲 Slot**: 如果 Slot 共享组的最大并行度不相等,提供给 Adaptive 调度器所使用的的 Slot 可能不会被使用。
 - 扩缩容事件会触发 Job 和 Task 重启,Task 重试的次数也会增加。

docs/content.zh/docs/ops/metrics.md

Lines changed: 0 additions & 8 deletions
@@ -1051,10 +1051,6 @@ Metrics related to data exchange between task executors using netty network comm
 
 ### Availability
 
-{{< hint warning >}}
-If [Reactive Mode]({{< ref "docs/deployment/elastic_scaling" >}}#reactive-mode) is enabled then these metrics, except `numRestarts`, do not work correctly.
-{{< /hint >}}
-
 <table class="table table-bordered">
 <thead>
 <tr>
@@ -1103,10 +1099,6 @@ If [Reactive Mode]({{< ref "docs/deployment/elastic_scaling" >}}#reactive-mode)
 {
 ### Checkpointing
 
-{{< hint warning >}}
-If [Reactive Mode]({{< ref "docs/deployment/elastic_scaling" >}}#reactive-mode) is enabled then the checkpointing metrics with the `Job` scope do not work correctly.
-{{< /hint >}}
-
 Note that for failed checkpoints, metrics are updated on a best efforts basis and may be not accurate.
 <table class="table table-bordered">
 <thead>

docs/content/docs/deployment/elastic_scaling.md

Lines changed: 0 additions & 1 deletion
@@ -148,7 +148,6 @@ The behavior of Adaptive Scheduler is configured by [all configuration options c
 - **No support for [local recovery]({{< ref "docs/ops/state/large_state_tuning">}}#task-local-recovery)**: Local recovery is a feature that schedules tasks to machines so that the state on that machine gets re-used if possible. The lack of this feature means that Adaptive Scheduler will always need to download the entire state from the checkpoint storage.
 - **No support for partial failover**: Partial failover means that the scheduler is able to restart parts ("regions" in Flink's internals) of a failed job, instead of the entire job. This limitation impacts only recovery time of embarrassingly parallel jobs: Flink's default scheduler can restart failed parts, while Adaptive Scheduler will restart the entire job.
 - **Limited integration with Flink's Web UI**: Adaptive Scheduler allows that a job's parallelism can change over its lifetime. The web UI only shows the current parallelism the job.
-- **Limited Job metrics**: With the exception of `numRestarts` all [availability]({{< ref "docs/ops/metrics" >}}#availability) and [checkpointing]({{< ref "docs/ops/metrics" >}}#checkpointing) metrics with the `Job` scope are not working correctly.
 - **Unused slots**: If the max parallelism for slot sharing groups is not equal, slots offered to Adaptive Scheduler might be unused.
 - Scaling events trigger job and task restarts, which will increase the number of Task attempts.

docs/content/docs/ops/metrics.md

Lines changed: 0 additions & 8 deletions
@@ -1051,10 +1051,6 @@ Metrics related to data exchange between task executors using netty network comm
 
 ### Availability
 
-{{< hint warning >}}
-If [Reactive Mode]({{< ref "docs/deployment/elastic_scaling" >}}#reactive-mode) is enabled then these metrics, except `numRestarts`, do not work correctly.
-{{< /hint >}}
-
 <table class="table table-bordered">
 <thead>
 <tr>
@@ -1102,10 +1098,6 @@ If [Reactive Mode]({{< ref "docs/deployment/elastic_scaling" >}}#reactive-mode)
 
 ### Checkpointing
 
-{{< hint warning >}}
-If [Reactive Mode]({{< ref "docs/deployment/elastic_scaling" >}}#reactive-mode) is enabled then checkpointing metrics with the `Job` scope do not work correctly.
-{{< /hint >}}
-
 Note that for failed checkpoints, metrics are updated on a best efforts basis and may be not accurate.
 <table class="table table-bordered">
 <thead>
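
Note for readers of this commit: the removed warnings concerned `Job`-scoped availability and checkpointing metrics under Adaptive Scheduler / Reactive Mode. One way to spot-check such metrics on a running job is to query the JobManager REST API. The sketch below is not part of the commit. It assumes the REST endpoint `/jobs/<jobid>/metrics` with a comma-separated `get` query parameter and a JSON response of `{"id", "value"}` objects. The base URL, job ID, and the `uptime` metric name are placeholders for illustration, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def job_metrics_url(base_url, job_id, metric_names):
    """Build the URL for querying Job-scoped metrics (assumed REST layout)."""
    # Keep commas unescaped so the metric list stays readable in the URL.
    query = urlencode({"get": ",".join(metric_names)}, safe=",")
    return f"{base_url.rstrip('/')}/jobs/{quote(job_id)}/metrics?{query}"


def parse_metrics(body):
    """Turn the assumed response shape [{"id": ..., "value": ...}] into a dict."""
    return {entry["id"]: entry["value"] for entry in json.loads(body)}


# Placeholder host and job ID; substitute the values of a real cluster.
url = job_metrics_url("http://localhost:8081/", "abc123", ["numRestarts", "uptime"])

# Example response body in the assumed shape (values arrive as strings).
sample = '[{"id": "numRestarts", "value": "2"}, {"id": "uptime", "value": "60000"}]'
metrics = parse_metrics(sample)
```

Fetching `url` with any HTTP client and passing the response body to `parse_metrics` would give a dictionary that can be compared before and after a rescaling event.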
