Conversation

@tdas (Contributor) commented Feb 17, 2018

What changes were proposed in this pull request?

  • Added clear information about triggers
  • Made the semantic guarantees of watermarks clearer for streaming aggregations and stream-stream joins.

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

@tdas (Contributor, Author) commented Feb 17, 2018

@zsxwing can you take a look?

@SparkQA commented Feb 17, 2018

Test build #87515 has finished for PR 20631 at commit 31bf653.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.trigger(continuous='1 second')
.start()

{% endhighlight %}
A member commented:

R examples:

# Default trigger (runs micro-batch as soon as it can)
write.stream(df, "console")

# ProcessingTime trigger with two-second micro-batch interval
write.stream(df, "console", trigger.processingTime = "2 seconds")

# One-time trigger
write.stream(df, "console", trigger.once = TRUE)

# Continuous trigger is not yet supported

The author replied:

Thank you!!
Can you add support for continuous trigger in R APIs?

@SparkQA commented Feb 19, 2018

Test build #87542 has finished for PR 20631 at commit 4086237.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member) commented Feb 19, 2018 via email


- However, the guarantee is strict only in one direction. Data delayed by more than 2 hours is
not guaranteed to be dropped; it may or may not get aggregated. The more delayed the data is,
the less likely the engine is to process it.
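
To make the quoted semantics concrete, here is a minimal Scala sketch of a 2-hour watermark on a windowed aggregation; the rate source and console sink are stand-ins and not part of this PR:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.master("local[2]").appName("WatermarkSketch").getOrCreate()
import spark.implicits._

// Stand-in event stream: the rate source's `timestamp` column plays the role of event time.
val events = spark.readStream.format("rate").load().withColumnRenamed("timestamp", "eventTime")

// With a 2-hour watermark, data up to 2 hours late is guaranteed to be aggregated;
// data later than that may or may not be aggregated, and the engine may drop it.
val counts = events
  .withWatermark("eventTime", "2 hours")
  .groupBy(window($"eventTime", "10 minutes"))
  .count()

counts.writeStream.outputMode("update").format("console").start()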
A contributor commented:

However, the guarantee is strict only in one direction. Data delayed by more than 2 hours is not guaranteed to be dropped

This might contradict an earlier statement, from "Handling Late Data and Watermarking", that says

"In other words, late data within the threshold will be aggregated, but data later than the threshold will be dropped"

The author replied:

good catch. let me fix it.

@SparkQA commented Feb 20, 2018

Test build #87561 has finished for PR 20631 at commit 4f13e40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing (Member) left a review:

LGTM except some nits

<td>
The query will execute *only one* micro-batch to process all the available data and then
stop on its own. This is useful in scenarios you want to periodically spin up a cluster,
process everything that is available since the last period, and then the shutdown the

nit: then the shutdown
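
For reference, the one-time trigger described in that passage can be sketched in Scala as follows; the rate source, parquet sink, and paths are hypothetical stand-ins:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.master("local[2]").appName("OnceTriggerSketch").getOrCreate()
val df = spark.readStream.format("rate").load()

// One-time trigger: process everything available in a single micro-batch, then stop.
// The checkpoint location lets the next run resume where this one left off.
val query = df.writeStream
  .format("parquet")
  .option("path", "/tmp/once-output")
  .option("checkpointLocation", "/tmp/once-checkpoint")
  .trigger(Trigger.Once())
  .start()
query.awaitTermination() // returns once the single micro-batch has completed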

.format("console")
.start()

// ProcessingTime trigger with two-second micro-batch interval

nit: two-seconds
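
For completeness, a self-contained Scala sketch of the processing-time trigger under review; the rate source is a stand-in:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.master("local[2]").appName("ProcessingTimeSketch").getOrCreate()
val df = spark.readStream.format("rate").load()

// ProcessingTime trigger: start a new micro-batch every two seconds,
// provided the previous micro-batch has finished.
df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()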

// Continuous trigger with one-second checkpointing interval
df.writeStream
.format("console")
.trigger(Trigger.Continuous())

Trigger.Continuous() -> Trigger.Continuous("1 second")
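
With the nit applied, the snippet becomes a runnable Scala sketch; the rate source is a stand-in that supports continuous processing:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.master("local[2]").appName("ContinuousTriggerSketch").getOrCreate()
val df = spark.readStream.format("rate").load()

// Continuous trigger with a one-second checkpointing interval,
// i.e. Trigger.Continuous("1 second") rather than the argument-less call quoted above.
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()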

.format("console")
.start();

// ProcessingTime trigger with two-second micro-batch interval

ditto

// Continuous trigger with one-second checkpointing interval
df.writeStream
.format("console")
.trigger(Trigger.Continuous())

ditto

.format("console") \
.start()

# ProcessingTime trigger with two-second micro-batch interval

ditto

@zsxwing (Member) commented Feb 20, 2018

LGTM

@SparkQA commented Feb 20, 2018

Test build #87570 has finished for PR 20631 at commit 6ad07d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Feb 21, 2018
…treaming programming guide

## What changes were proposed in this pull request?

- Added clear information about triggers
- Made the semantic guarantees of watermarks clearer for streaming aggregations and stream-stream joins.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Tathagata Das <[email protected]>

Closes #20631 from tdas/SPARK-23454.

(cherry picked from commit 601d653)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit closed this in 601d653 on Feb 21, 2018
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…treaming programming guide