Commit 63c72b9

[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure
Author: Tathagata Das <[email protected]>

Closes #8656 from tdas/SPARK-10492 and squashes the following commits:

986cdd6 [Tathagata Das] Added information on backpressure

(cherry picked from commit 52b24a6)
Signed-off-by: Tathagata Das <[email protected]>
1 parent 7fd4674 commit 63c72b9

File tree

2 files changed: +25 lines, -1 line


docs/configuration.md

Lines changed: 13 additions & 0 deletions

@@ -1437,6 +1437,19 @@ Apart from these, the following properties are also available, and may be useful
 #### Spark Streaming
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.streaming.backpressure.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enables or disables Spark Streaming's internal backpressure mechanism (since 1.5).
+    This enables Spark Streaming to control the receiving rate based on the current
+    batch scheduling delays and processing times, so that the system receives data only
+    as fast as it can process it. Internally, this dynamically sets the maximum
+    receiving rate of the receivers. This rate is upper bounded by the values of
+    `spark.streaming.receiver.maxRate` and `spark.streaming.kafka.maxRatePerPartition`
+    if they are set (see below).
+  </td>
+</tr>
 <tr>
   <td><code>spark.streaming.blockInterval</code></td>
   <td>200ms</td>
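
For context, here is a minimal Scala sketch of how the new property might be set from application code; the app name and batch interval are illustrative assumptions, not part of this patch:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical application setup; only the property name and its
    // default (false) come from the documentation added above.
    val conf = new SparkConf()
      .setAppName("BackpressureExample") // illustrative name
      // Override the default of false so Spark Streaming adapts the
      // receiving rate to observed scheduling delays and processing times.
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(2)) // 2s batch interval is an assumption

The same property can also be supplied at submit time via spark-submit's `--conf` flag, so it can be toggled without rebuilding the application.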

docs/streaming-programming-guide.md

Lines changed: 12 additions & 1 deletion

@@ -1807,7 +1807,7 @@ To run a Spark Streaming application, you need to have the following.
 + *Mesos* - [Marathon](https://github.com/mesosphere/marathon) has been used to achieve this
   with Mesos.
 
-- *[Since Spark 1.2] Configuring write ahead logs* - Since Spark 1.2,
+- *Configuring write ahead logs* - Since Spark 1.2,
   we have introduced _write ahead logs_ for achieving strong
   fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into
   a write ahead log in the configuration checkpoint directory. This prevents data loss on driver
@@ -1822,6 +1822,17 @@ To run a Spark Streaming application, you need to have the following.
   stored in a replicated storage system. This can be done by setting the storage level for the
   input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
 
+- *Setting the max receiving rate* - If the cluster resources are not large enough for the streaming
+  application to process data as fast as it is being received, the receivers can be rate limited
+  by setting a maximum rate limit in terms of records / sec.
+  See the [configuration parameters](configuration.html#spark-streaming)
+  `spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
+  for the Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
+  eliminates the need to set this rate limit, as Spark Streaming automatically figures out the
+  rate limits and dynamically adjusts them if the processing conditions change. This backpressure
+  can be enabled by setting the [configuration parameter](configuration.html#spark-streaming)
+  `spark.streaming.backpressure.enabled` to `true`.
+
 ### Upgrading Application Code
 {:.no_toc}
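To make the interaction between the static limits and backpressure concrete, here is a hedged Scala sketch; the property names are those cited in the diff above, while the values, app name, and batch interval are illustrative assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("RateLimitedStream") // illustrative name
      // Static cap for receiver-based sources: at most 1000 records/sec per receiver.
      .set("spark.streaming.receiver.maxRate", "1000")
      // Static cap for the Direct Kafka approach: at most 1000 records/sec per Kafka partition.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // With backpressure enabled, the two caps above act as upper bounds on the
      // dynamically chosen rate rather than as the effective rate itself.
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(1)) // 1s batch interval is an assumption

Leaving the two max-rate properties unset removes the upper bound and lets the backpressure mechanism choose the rate on its own, matching the behavior described in the added configuration.md entry.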