Commit c2ce1be
[SPARK-18475] Be able to increase parallelism in StructuredStreaming Kafka source
## What changes were proposed in this pull request?
This PR adds the `numPartitions` configuration to the Structured Streaming Kafka source. Setting it higher than the number of `TopicPartitions` being consumed lets Spark run multiple tasks that read from the same `TopicPartition`, which helps users handle skewed partitions.
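To illustrate the idea of several tasks reading from one `TopicPartition`, here is a minimal sketch of how a batch's offset ranges might be divided. It is not the code from this PR (the diff is not shown here): `OffsetRange` and `split_ranges` are hypothetical names, and the proportional-split policy is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for the slice of a TopicPartition read by one task."""
    topic: str
    partition: int
    start: int  # inclusive offset
    end: int    # exclusive offset

    @property
    def size(self) -> int:
        return self.end - self.start


def split_ranges(ranges: List[OffsetRange], num_partitions: int) -> List[OffsetRange]:
    """Split ranges into roughly num_partitions pieces, proportional to range size.

    Assumed policy: every TopicPartition keeps at least one task, so the result
    never has fewer entries than the input.
    """
    total = sum(r.size for r in ranges)
    if num_partitions <= len(ranges) or total == 0:
        return list(ranges)  # nothing to gain from splitting

    result = []
    for r in ranges:
        # Rounded proportional share of the requested tasks, at least one.
        parts = max(1, (r.size * num_partitions + total // 2) // total)
        # Never create more pieces than there are offsets to read.
        parts = min(parts, max(1, r.size))
        step, extra = divmod(r.size, parts)
        start = r.start
        for i in range(parts):
            # Spread the remainder over the first `extra` pieces.
            end = start + step + (1 if i < extra else 0)
            result.append(OffsetRange(r.topic, r.partition, start, end))
            start = end
    return result


# A skewed batch: partition 0 holds 900 new offsets, partition 1 only 100.
batch = [OffsetRange("events", 0, 0, 900), OffsetRange("events", 1, 0, 100)]
tasks = split_ranges(batch, num_partitions=10)
```

With this input, partition 0 is split into nine ranges of 100 offsets each and partition 1 stays as a single range, so the skewed partition no longer bottlenecks on one task. Because the shares are rounded, the number of pieces is only approximately `num_partitions` in general.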
While the number of `TopicPartitions` can change from batch to batch (for example, when topics are deleted or created), ETL use cases generally consume a static set of `TopicPartitions`, and in those cases this configuration has been very useful.
If the `TopicPartitions` are dynamic, the parallelism is always `max(topicPartitions.length, numPartitions)`.
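The parallelism rule can be written out as a small worked example. This is only a sketch: `effective_parallelism` is a hypothetical helper, and the behavior when `numPartitions` is unset (one task per `TopicPartition`) is an assumption rather than something stated in this commit message.

```python
from typing import Optional, Sequence, Tuple

# A TopicPartition is modelled here as a (topic, partition) pair.
TopicPartition = Tuple[str, int]


def effective_parallelism(
    topic_partitions: Sequence[TopicPartition],
    num_partitions: Optional[int],
) -> int:
    """Return max(len(topic_partitions), num_partitions).

    Assumption: when num_partitions is not set, each TopicPartition gets one task.
    """
    if num_partitions is None:
        return len(topic_partitions)
    return max(len(topic_partitions), num_partitions)


two = [("events", 0), ("events", 1)]
twelve = [("events", p) for p in range(12)]

few_partitions = effective_parallelism(two, 10)      # numPartitions wins
many_partitions = effective_parallelism(twelve, 10)  # TopicPartition count wins
```

So if topics are created and the batch grows from 2 to 12 `TopicPartitions` while `numPartitions` stays at 10, the parallelism moves from 10 to 12 rather than being capped at the configured value.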
## How was this patch tested?
Unit tests. I also used this on production data, where it helped in handling peak loads and skewed partitions.
Author: Burak Yavuz <[email protected]>
Closes apache#166 from brkyvz/kafka-par-split.
1 parent f8bf2b0, commit c2ce1be
5 files changed (+307, -44) under external/kafka-0-10-sql/src:
- main/scala/org/apache/spark/sql/kafka010
- test/scala/org/apache/spark/sql/kafka010

Lines changed: 8 additions & 3 deletions
Lines changed: 93 additions & 31 deletions
Lines changed: 1 addition & 1 deletion
Lines changed: 37 additions & 6 deletions