Commit 4fb6076

Problem explanation changed
1 parent 9750d85 commit 4fb6076

3 files changed (+15 -11 lines changed)
_posts/2022-04-05-buffer-debloat.md

Lines changed: 15 additions & 11 deletions
@@ -9,31 +9,35 @@ excerpt: Apache Flink adjust buffer size automatically in order to keep a balanc
 ---
 
 ## What is this article about?
-One of the most important features of Flink is providing a streaming experience with maximum possible throughput and minimum possible overhead.
+One of the most important features of Flink is providing a streaming experience with maximum possible throughput and minimum possible memory overhead.
 What does it actually mean? Let’s take a look at an ideal scenario:
 
-Suppose we have the job and environment that provide the one and constant processing time among all subtasks and zero delays(network, processing) and no connection issues.
+Suppose we have the job and environment that provide the constant time for processing and sending data.
 In this case, the job processing looks like this:
 
-While the downstream is processing record1, record2 is sent via the network, and record3 is processed by the upstream.
-As soon as the downstream has processed record1, record2 is ready to be processed, and so on.
-As we see here, the subtasks are always busy which guarantees us the maximum throughput and at the same time,
-It also doesn't use a lot of extra memory since it takes the record for processing as soon as it arrived.
+<div class="row front-graphic">
+<img src="{{ site.baseurl }}/img/blog/2022-03-28-buffer-debloat/ideal_case.gif"/>
+</div>
+
+As the picture shows, there is no delay between processing the two neighbor records since processing and sending times are equal.
+This allows system to keep maximum possible throughput.
 
-Unfortunately, it is obviously not reachable conditions in real life since all operators have different processing times(due to different logic, different record size, etc.),
+Unfortunately, it is obviously not reachable conditions in real life since all operators have different processing times(due to different logic, different record size, etc.),
 and there are also different types of issues with the environment(network, server, software) that can lead to unpredictable delays.
-As result, it is not trivial to support the maximum possible throughput. It is why Flink implements different approaches to level out different types of instabilities.
 
-This article explains what is the buffer debloating feature and how it can help to reach the optimal balance between throughput and overhead.
+<div class="row front-graphic">
+<img src="{{ site.baseurl }}/img/blog/2022-03-28-buffer-debloat/simple_problem.gif"/>
+</div>
 
-But before we look at details let’s remember how Flink transfers the data between subtasks.
+As result, it is not trivial to support the maximum possible throughput due to different types of delays.
 
+This article explains how Flink use different approaches to level out different types of instabilities and minimize idleness of operator in order to keep the highest possible throughput.
 
 ## Network stack
 
 A detailed explanation of the network stack can be found in the earlier [Flink's Network Stack](https://flink.apache.org/2019/06/05/flink-network-stack.html) block post.
 
-Here we just recall a couple of important things.
+Here we just recall a couple of important things which explains how they minimize idleness of operators.
 
 ### Network buffer
