[SPARK-18671][SS][TEST] Added tests to ensure stability of all Structured Streaming log formats #16128
Conversation
Test build #69605 has finished for PR 16128 at commit
```scala
OffsetSeq(
  offsets = Seq(Some(SerializedOffset(
    """
      |{"kafka-topic":{"23":0,"8":1,"17":1,"11":1,"20":0,"2":6,"5":2,"14":0,"4":4,"13":1,
```
This needs to be revised.
We should add a test for the Kafka source to check that we can convert JSON to `KafkaSourceOffset`. This doesn't check that.
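For illustration, a minimal sketch of the kind of round-trip check being asked for. The `KafkaSourceOffset` companion overloads taking `(topic, partition, offset)` tuples and a `SerializedOffset` are assumptions here, not confirmed by this thread:

```scala
import org.apache.spark.sql.execution.streaming.SerializedOffset
import org.apache.spark.sql.kafka010.KafkaSourceOffset

// Sketch only: verify that the JSON emitted for a KafkaSourceOffset can be
// parsed back into an equal KafkaSourceOffset, i.e. json -> offset conversion.
test("round-trip KafkaSourceOffset through json") {
  val original = KafkaSourceOffset(("kafka-topic", 0, 1L), ("kafka-topic", 1, 2L))
  // SerializedOffset wraps the raw json string as read back from the offset log.
  val deserialized = KafkaSourceOffset(SerializedOffset(original.json))
  assert(deserialized === original)
}
```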
Test build #69609 has finished for PR 16128 at commit
zsxwing left a comment:
LGTM overall. Just some nits.
```scala
    }
  }
  val partitions = partitionOffsets.keySet.toSeq.sorted  // sort for more determinism
  partitions.foreach { tp =>
```
nit: You could use `partitionOffsets.toSeq.sortBy(_._1).foreach { case (tp, off) => }` to simplify the code.
I want to sort by topic and partition together, so that partitions are ordered when the JSON is generated (currently they are not, which makes it hard to read).
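To make the idea concrete, here is a self-contained sketch of that approach. This is not Spark's actual `JsonUtils`; the object name, the use of `ListMap`, and json4s for serialization are all assumptions made for the example:

```scala
import scala.collection.immutable.ListMap
import org.apache.kafka.common.TopicPartition
import org.json4s.NoTypeHints
import org.json4s.jackson.Serialization

object OffsetJsonSketch {
  private implicit val formats = Serialization.formats(NoTypeHints)

  /** Serialize partition offsets as JSON with deterministically ordered keys. */
  def partitionOffsetsJson(partitionOffsets: Map[TopicPartition, Long]): String = {
    // Sort by topic and partition together so repeated runs emit identical,
    // human-readable JSON regardless of the input map's iteration order.
    val sorted = partitionOffsets.toSeq.sortBy { case (tp, _) => (tp.topic, tp.partition) }
    // ListMap preserves insertion order, so sorted keys stay sorted in the output.
    val grouped = sorted.foldLeft(ListMap.empty[String, ListMap[Int, Long]]) {
      case (acc, (tp, off)) =>
        val parts = acc.getOrElse(tp.topic, ListMap.empty[Int, Long])
        acc.updated(tp.topic, parts.updated(tp.partition, off))
    }
    Serialization.write(grouped)  // e.g. {"kafka-topic":{"0":1,"1":2}}
  }
}
```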
| test("read Spark 2.1.0 log format") { | ||
| val offset = readFromResource("kafka-source-offset-version-2.1.0.txt") |
nit: maybe there's no need to read the JSON from a file, since we never write it into a single file.
Yeah, but it's good to have it in a separate file in the same place as the other formats. It will be easier to track all the things that need compatibility guarantees.
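For reference, a hypothetical sketch of what such a `readFromResource` helper could look like (the resource path layout is an assumption; the idea is simply to load a golden file committed under test resources):

```scala
import scala.io.Source
import org.apache.spark.sql.execution.streaming.SerializedOffset

// Sketch only: load a committed golden offset file from the test classpath and
// wrap its contents so the parsing code sees exactly what Spark 2.1.0 wrote.
private def readFromResource(file: String): SerializedOffset = {
  val input = getClass.getResource(s"/$file")
  val source = Source.fromURL(input)
  try SerializedOffset(source.mkString.trim) finally source.close()
}
```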
| test("FileStreamSource offset - read Spark 2.1.0 log format") { | ||
| val offset = readOffsetFromResource("file-source-offset-version-2.1.0.txt") |
nit: maybe there's no need to read the JSON from a file, since we never write it into a single file.
same comment as above.
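Correspondingly, a hedged sketch of what the file-source check could assert, assuming `FileStreamSourceOffset` can be built from a `SerializedOffset` holding the logged JSON (the `{"logOffset":N}` shape and the value 345 are illustrative assumptions):

```scala
import org.apache.spark.sql.execution.streaming.{FileStreamSourceOffset, SerializedOffset}

// Sketch only: a file source offset is a single json object with a logOffset
// field; make sure that exact shape written by Spark 2.1.0 still parses.
test("FileStreamSourceOffset - parse Spark 2.1.0 json") {
  val offset = FileStreamSourceOffset(SerializedOffset("""{"logOffset":345}"""))
  assert(offset.logOffset === 345)
}
```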
Test build #69705 has finished for PR 16128 at commit

LGTM pending tests.

Test build #3468 has finished for PR 16128 at commit

Test build #69714 has started for PR 16128 at commit

retest this please

Test build #69740 has finished for PR 16128 at commit

LGTM. Thanks. Merging to master and 2.1.
…tructured Streaming log formats

Author: Tathagata Das <[email protected]>

Closes #16128 from tdas/SPARK-18671.

(cherry picked from commit 1ef6b29)
Signed-off-by: Shixiong Zhu <[email protected]>
What changes were proposed in this pull request?
To be able to restart StreamingQueries across Spark versions, we have already made the logs (offset log, file source log, file sink log) use JSON. We should add tests with actual JSON files in Spark so that any incompatible change in reading the logs is immediately caught. This PR adds tests for FileStreamSourceLog, FileStreamSinkLog, and OffsetSeqLog.
How was this patch tested?
New unit tests.
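As a hedged illustration of the pattern these tests follow, here is a sketch of a golden-file check for the offset log. The resource directory name and helper are modeled on the snippets quoted above; `spark` is assumed to be the test suite's SparkSession, and `OffsetSeqLog`/`getLatest` returning the batch id and `OffsetSeq` are assumptions about the metadata-log API:

```scala
import org.apache.spark.sql.execution.streaming.{OffsetSeq, OffsetSeqLog}

// Sketch only: open a checked-in offset log written by Spark 2.1.0 and read
// its latest batch, failing loudly if the on-disk format no longer parses.
private def readFromResource(dir: String): (Long, OffsetSeq) = {
  val input = getClass.getResource(s"/structured-streaming/$dir")
  val log = new OffsetSeqLog(spark, input.toString)  // spark: the suite's SparkSession
  log.getLatest().get
}

test("OffsetSeqLog - read Spark 2.1.0 log format") {
  val (batchId, offsetSeq) = readFromResource("offset-log-version-2.1.0")
  assert(batchId === 0)
  assert(offsetSeq.offsets.nonEmpty)
}
```

Because the fixture file is committed to the repository rather than regenerated, any change that breaks reading old logs fails this test immediately, which is exactly the compatibility guarantee the PR description asks for.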