_posts/2022-04-05-buffer-debloat.md
15 additions & 11 deletions
@@ -9,31 +9,35 @@ excerpt: Apache Flink adjust buffer size automatically in order to keep a balanc
---
## What is this article about?
-One of the most important features of Flink is providing a streaming experience with maximum possible throughput and minimum possible overhead.
+One of the most important features of Flink is providing a streaming experience with maximum possible throughput and minimum possible memory overhead.
What does it actually mean? Let’s take a look at an ideal scenario:
-Suppose we have the job and environment that provide the one and constant processing time among all subtasks and zero delays(network, processing) and no connection issues.
+Suppose we have a job and an environment that provide a constant time for processing and sending data.
In this case, the job processing looks like this:
-While the downstream is processing record1, record2 is sent via the network, and record3 is processed by the upstream.
-As soon as the downstream has processed record1, record2 is ready to be processed, and so on.
-As we see here, the subtasks are always busy which guarantees us the maximum throughput and at the same time,
-It also doesn't use a lot of extra memory since it takes the record for processing as soon as it arrived.
+As the picture shows, there is no delay between processing two neighboring records since processing and sending times are equal.
+This allows the system to maintain the maximum possible throughput.
-Unfortunately, it is obviously not reachable conditions in real life since all operators have different processing times(due to different logic, different record size, etc.),
+Unfortunately, it is obviously not reachable conditions in real life since all operators have different processing times(due to different logic, different record size, etc.),
and there are also different types of issues with the environment(network, server, software) that can lead to unpredictable delays.
-As result, it is not trivial to support the maximum possible throughput. It is why Flink implements different approaches to level out different types of instabilities.
-This article explains what is the buffer debloating feature and how it can help to reach the optimal balance between throughput and overhead.
But before we look at details let’s remember how Flink transfers the data between subtasks.
+As a result, it is not trivial to sustain the maximum possible throughput due to different types of delays.
+This article explains how Flink uses different approaches to level out different types of instabilities and minimize operator idleness in order to keep the highest possible throughput.
## Network stack
A detailed explanation of the network stack can be found in the earlier [Flink's Network Stack](https://flink.apache.org/2019/06/05/flink-network-stack.html) blog post.
-Here we just recall a couple of important things.
+Here we just recall a couple of important things that explain how the idleness of operators is minimized.