[SPARK-23454][SS][DOCS] Added trigger information to the Structured Streaming programming guide #20631
Conversation
@zsxwing can you take a look?

Test build #87515 has finished for PR 20631 at commit
.trigger(continuous='1 second')
.start()
{% endhighlight %}
R examples:
# Default trigger (runs micro-batch as soon as it can)
write.stream(df, "console")
# ProcessingTime trigger with two-second micro-batch interval
write.stream(df, "console", trigger.processingTime = "2 seconds")
# One-time trigger
write.stream(df, "console", trigger.once = TRUE)
# Continuous trigger is not yet supported
Thank you!!
Can you add support for continuous trigger in R APIs?
Test build #87542 has finished for PR 20631 at commit

Yes! That's the plan
- However, the guarantee is strict only in one direction. Data delayed by more than 2 hours is not guaranteed to be dropped; it may or may not get aggregated. More delayed is the data, less likely is the engine going to process it.
However, the guarantee is strict only in one direction. Data delayed by more than 2 hours is not guaranteed to be dropped
This might contradict an earlier statement, from "Handling Late Data and Watermarking", that says
"In other words, late data within the threshold will be aggregated, but data later than the threshold will be dropped"
good catch. let me fix it.
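For readers following the thread, here is a minimal Scala sketch of the watermark setup being discussed, assuming a streaming DataFrame df with an eventTime column (the names are illustrative, not taken from the PR):

import org.apache.spark.sql.functions.{col, window}

// With a 2-hour watermark, data less than 2 hours late (relative to the
// latest event time seen so far) is guaranteed to be aggregated; data more
// than 2 hours late may or may not be dropped, per the corrected wording.
val windowedCounts = df
  .withWatermark("eventTime", "2 hours")
  .groupBy(window(col("eventTime"), "10 minutes"))
  .count()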
Test build #87561 has finished for PR 20631 at commit
zsxwing left a comment
LGTM except some nits
<td>
The query will execute *only one* micro-batch to process all the available data and then
stop on its own. This is useful in scenarios you want to periodically spin up a cluster,
process everything that is available since the last period, and then the shutdown the
nit: then the shutdown
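For context, a minimal Scala sketch of the one-time trigger that paragraph describes, assuming a streaming DataFrame df (not part of the PR diff):

import org.apache.spark.sql.streaming.Trigger

// One-time trigger: process everything available in a single micro-batch,
// then stop on its own (the "spin up, catch up, shut down" pattern).
val query = df.writeStream
  .format("console")
  .trigger(Trigger.Once())
  .start()

query.awaitTermination() // returns once the single micro-batch completes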
.format("console")
.start()
// ProcessingTime trigger with two-second micro-batch interval
nit: two-seconds
// Continuous trigger with one-second checkpointing interval
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous())
Trigger.Continuous() -> Trigger.Continuous("1 second")
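With that suggestion applied, the snippet would read roughly as follows (a sketch; df is the guide's example streaming DataFrame):

import org.apache.spark.sql.streaming.Trigger

// Continuous trigger with an explicit one-second checkpoint interval,
// matching the comment above the call.
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()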
.format("console")
.start();
// ProcessingTime trigger with two-second micro-batch interval
ditto
// Continuous trigger with one-second checkpointing interval
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous())
ditto
.format("console") \
.start()
# ProcessingTime trigger with two-second micro-batch interval
ditto
LGTM

Test build #87570 has finished for PR 20631 at commit
[SPARK-23454][SS][DOCS] Added trigger information to the Structured Streaming programming guide

## What changes were proposed in this pull request?

- Added clear information about triggers
- Made the semantic guarantees of watermarks clearer for streaming aggregations and stream-stream joins.

Author: Tathagata Das <[email protected]>

Closes #20631 from tdas/SPARK-23454.

(cherry picked from commit 601d653)
Signed-off-by: Tathagata Das <[email protected]>