[SPARK-18671][SS][TEST] Added tests to ensure stability of all Structured Streaming log formats #16128
Conversation
Test build #69605 has finished for PR 16128 at commit
```scala
OffsetSeq(
  offsets = Seq(Some(SerializedOffset(
    """
      |{"kafka-topic":{"23":0,"8":1,"17":1,"11":1,"20":0,"2":6,"5":2,"14":0,"4":4,"13":1,
```
This needs to be revised.
We should add a test for the Kafka source to check that we can convert JSON to `KafkaSourceOffset`. This doesn't check that.
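For illustration, a minimal sketch of the kind of round-trip check being asked for. The `KafkaSourceOffset` companion overloads taking `(topic, partition, offset)` tuples and a `SerializedOffset` are assumptions here, not confirmed by this thread:

```scala
import org.apache.spark.sql.execution.streaming.SerializedOffset
import org.apache.spark.sql.kafka010.KafkaSourceOffset

// Sketch only: verify that the JSON emitted for a KafkaSourceOffset can be
// parsed back into an equal KafkaSourceOffset, i.e. json -> offset conversion.
test("round-trip KafkaSourceOffset through json") {
  val original = KafkaSourceOffset(("kafka-topic", 0, 1L), ("kafka-topic", 1, 2L))
  // SerializedOffset wraps the raw json string as read back from the offset log.
  val deserialized = KafkaSourceOffset(SerializedOffset(original.json))
  assert(deserialized === original)
}
```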
Test build #69609 has finished for PR 16128 at commit
zsxwing left a comment:
LGTM overall. Just some nits.
```scala
    }
  }
  val partitions = partitionOffsets.keySet.toSeq.sorted  // sort for more determinism
  partitions.foreach { tp =>
```
nit: You could use `partitionOffsets.toSeq.sortBy(_._1).foreach { case (tp, off) => }` to simplify the code.
I want to sort by topic and partition together, so that partitions are ordered when the JSON is generated (currently they are not, which makes it hard to read).
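To make the idea concrete, here is a self-contained sketch of that approach. This is not Spark's actual `JsonUtils`; the object name, the use of `ListMap`, and json4s for serialization are all assumptions made for the example:

```scala
import scala.collection.immutable.ListMap
import org.apache.kafka.common.TopicPartition
import org.json4s.NoTypeHints
import org.json4s.jackson.Serialization

object OffsetJsonSketch {
  private implicit val formats = Serialization.formats(NoTypeHints)

  /** Serialize partition offsets as JSON with deterministically ordered keys. */
  def partitionOffsetsJson(partitionOffsets: Map[TopicPartition, Long]): String = {
    // Sort by topic and partition together so repeated runs emit identical,
    // human-readable JSON regardless of the input map's iteration order.
    val sorted = partitionOffsets.toSeq.sortBy { case (tp, _) => (tp.topic, tp.partition) }
    // ListMap preserves insertion order, so sorted keys stay sorted in the output.
    val grouped = sorted.foldLeft(ListMap.empty[String, ListMap[Int, Long]]) {
      case (acc, (tp, off)) =>
        val parts = acc.getOrElse(tp.topic, ListMap.empty[Int, Long])
        acc.updated(tp.topic, parts.updated(tp.partition, off))
    }
    Serialization.write(grouped)  // e.g. {"kafka-topic":{"0":1,"1":2}}
  }
}
```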
| test("read Spark 2.1.0 log format") { | ||
| val offset = readFromResource("kafka-source-offset-version-2.1.0.txt") |
nit: maybe there's no need to read the JSON from a file, since we never write it into a single file.
Yeah, but it's good to have it in a separate file in the same place as the other formats. It will be easier to track all the things that need compatibility guarantees.
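For reference, a hypothetical sketch of what such a `readFromResource` helper could look like (the resource path layout is an assumption; the idea is simply to load a golden file committed under test resources):

```scala
import scala.io.Source
import org.apache.spark.sql.execution.streaming.SerializedOffset

// Sketch only: load a committed golden offset file from the test classpath and
// wrap its contents so the parsing code sees exactly what Spark 2.1.0 wrote.
private def readFromResource(file: String): SerializedOffset = {
  val input = getClass.getResource(s"/$file")
  val source = Source.fromURL(input)
  try SerializedOffset(source.mkString.trim) finally source.close()
}
```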
| test("FileStreamSource offset - read Spark 2.1.0 log format") { | ||
| val offset = readOffsetFromResource("file-source-offset-version-2.1.0.txt") |
nit: maybe there's no need to read the JSON from a file, since we never write it into a single file.
same comment as above.
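Correspondingly, a hedged sketch of what the file-source check could assert, assuming `FileStreamSourceOffset` can be built from a `SerializedOffset` holding the logged JSON (the `{"logOffset":N}` shape and the value 345 are illustrative assumptions):

```scala
import org.apache.spark.sql.execution.streaming.{FileStreamSourceOffset, SerializedOffset}

// Sketch only: a file source offset is a single json object with a logOffset
// field; make sure that exact shape written by Spark 2.1.0 still parses.
test("FileStreamSourceOffset - parse Spark 2.1.0 json") {
  val offset = FileStreamSourceOffset(SerializedOffset("""{"logOffset":345}"""))
  assert(offset.logOffset === 345)
}
```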
Test build #69705 has finished for PR 16128 at commit

LGTM pending tests.

Test build #3468 has finished for PR 16128 at commit

Test build #69714 has started for PR 16128 at commit

retest this please

Test build #69740 has finished for PR 16128 at commit

LGTM. Thanks. Merging to master and 2.1.
…tructured Streaming log formats

Author: Tathagata Das <[email protected]>

Closes #16128 from tdas/SPARK-18671.

(cherry picked from commit 1ef6b29)
Signed-off-by: Shixiong Zhu <[email protected]>
What changes were proposed in this pull request?
To be able to restart StreamingQueries across Spark versions, we have already made the logs (offset log, file source log, file sink log) use JSON. We should add tests with actual JSON files in Spark so that any incompatible change in reading the logs is immediately caught. This PR adds tests for FileStreamSourceLog, FileStreamSinkLog, and OffsetSeqLog.
How was this patch tested?
New unit tests.
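As a hedged illustration of the pattern these tests follow, here is a sketch of a golden-file check for the offset log. The resource directory name and helper are modeled on the snippets quoted above; `spark` is assumed to be the test suite's SparkSession, and `OffsetSeqLog`/`getLatest` returning the batch id and `OffsetSeq` are assumptions about the metadata-log API:

```scala
import org.apache.spark.sql.execution.streaming.{OffsetSeq, OffsetSeqLog}

// Sketch only: open a checked-in offset log written by Spark 2.1.0 and read
// its latest batch, failing loudly if the on-disk format no longer parses.
private def readFromResource(dir: String): (Long, OffsetSeq) = {
  val input = getClass.getResource(s"/structured-streaming/$dir")
  val log = new OffsetSeqLog(spark, input.toString)  // spark: the suite's SparkSession
  log.getLatest().get
}

test("OffsetSeqLog - read Spark 2.1.0 log format") {
  val (batchId, offsetSeq) = readFromResource("offset-log-version-2.1.0")
  assert(batchId === 0)
  assert(offsetSeq.offsets.nonEmpty)
}
```

Because the fixture file is committed to the repository rather than regenerated, any change that breaks reading old logs fails this test immediately, which is exactly the compatibility guarantee the PR description asks for.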