
Conversation

@tdas
Contributor

@tdas tdas commented Dec 3, 2016

What changes were proposed in this pull request?

To be able to restart StreamingQueries across Spark versions, we have already made the logs (offset log, file source log, file sink log) use JSON. We should add tests with actual JSON files checked into Spark so that any incompatible change in reading the logs is caught immediately. This PR adds tests for FileStreamSourceLog, FileStreamSinkLog, and OffsetSeqLog.

How was this patch tested?

New unit tests.
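Since the point of the PR is that these logs are plain JSON, the round-trip concern can be sketched in Python as a language-neutral illustration (the helper names here are made up, not Spark's API):

```python
import json

# Hypothetical sketch: serializing per-source offsets to JSON and reading
# them back. Fixed test files pin the written form, so any incompatible
# change to either direction of this round trip fails a test immediately.
def serialize_offsets(offsets):
    """Render a map of source name -> {partition -> offset} as JSON."""
    return json.dumps(offsets, sort_keys=True)

def deserialize_offsets(text):
    """Parse a previously written offset log entry."""
    return json.loads(text)

entry = {"kafka-topic": {"0": 23, "1": 45}}
restored = deserialize_offsets(serialize_offsets(entry))
assert restored == entry
```

A round-trip test alone is not enough, though: it passes even if both sides change together, which is exactly why the PR checks fixed JSON files into the test resources.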

@SparkQA

SparkQA commented Dec 3, 2016

Test build #69605 has finished for PR 16128 at commit 49e940b.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

OffsetSeq(
offsets = Seq(Some(SerializedOffset(
"""
|{"kafka-topic":{"23":0,"8":1,"17":1,"11":1,"20":0,"2":6,"5":2,"14":0,"4":4,"13":1,
Contributor Author

This needs to be revised.

Member

We should add a test for Kafka source to check if we can convert json to KafkaSourceOffset. This doesn't check that.
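A Kafka-source check of the kind suggested here would parse the topic → partition → offset JSON into offsets keyed by (topic, partition). A minimal Python sketch of that conversion (illustrative only; `kafka_offset_from_json` is not Spark's actual API):

```python
import json

# The Kafka source offset JSON is a map of topic -> {partition -> offset},
# with partition numbers as string keys. This mimics, in Python, flattening
# it into a (topic, partition) -> offset map, the shape a compatibility
# test for the Kafka source would need to verify.
def kafka_offset_from_json(text):
    parsed = json.loads(text)
    return {
        (topic, int(partition)): offset
        for topic, partitions in parsed.items()
        for partition, offset in partitions.items()
    }

sample = '{"kafka-topic":{"23":0,"8":1,"17":1}}'
offsets = kafka_offset_from_json(sample)
assert offsets[("kafka-topic", 23)] == 0
```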

@tdas tdas changed the title [SPARK-18671][SS] Added tests to ensure stability of that all Structured Streaming log formats [WIP][SPARK-18671][SS] Added tests to ensure stability of that all Structured Streaming log formats Dec 3, 2016
@SparkQA

SparkQA commented Dec 3, 2016

Test build #69609 has finished for PR 16128 at commit d9be1c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas tdas changed the title [WIP][SPARK-18671][SS] Added tests to ensure stability of that all Structured Streaming log formats [WIP][SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats Dec 6, 2016
@tdas tdas changed the title [WIP][SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats Dec 6, 2016
Member

@zsxwing zsxwing left a comment


LGTM overall. Just some nits.

}
}
val partitions = partitionOffsets.keySet.toSeq.sorted // sort for more determinism
partitions.foreach { tp =>
Member


nit: You can use partitionOffsets.toSeq.sortBy(_._1).foreach { case (tp, off) => to simplify the code.

Contributor Author


I want to sort by topic and partition together, so that partitions are ordered when the JSON is generated (currently they are not, which makes it hard to read).
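The ordering concern above can be sketched in Python (a stand-in for the Scala code; the function name is hypothetical): sorting by (topic, partition) before emitting JSON makes the generated log deterministic and readable.

```python
import json

# Build the topic -> {partition -> offset} JSON in a fixed order: sorting
# the (topic, partition) keys up front means topics appear sorted and each
# topic's partitions appear in numeric order, so the emitted string is
# stable across runs (dicts preserve insertion order in Python 3.7+).
def partition_offsets_to_json(offsets):
    """offsets: dict mapping (topic, partition) -> offset."""
    result = {}
    for (topic, partition), offset in sorted(offsets.items()):
        result.setdefault(topic, {})[str(partition)] = offset
    return json.dumps(result)

out = partition_offsets_to_json({("t", 11): 1, ("t", 2): 6})
assert out == '{"t": {"2": 6, "11": 1}}'
```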

}

test("read Spark 2.1.0 log format") {
val offset = readFromResource("kafka-source-offset-version-2.1.0.txt")
Member


nit: maybe there is no need to read the JSON from a file, since we never write it into a single file.

Contributor Author


Yeah, but it's good to have it in a separate file in the same place as the other formats. It will be easier to track all the things that need compatibility guarantees.
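The pattern being discussed — checking a file written by an old version into the test resources and asserting that the current parser still reads it — can be sketched as follows (Python, with an invented file name and payload, not Spark's actual resources):

```python
import json
import tempfile
from pathlib import Path

# Compatibility-test sketch: a log file written by an older version lives
# in the test resources; the test asserts today's parser still reads it.
def read_offset_from_resource(path):
    """Parse a previously written offset file."""
    return json.loads(Path(path).read_text())

# Usage sketch: a temp dir stands in for the checked-in resource directory.
with tempfile.TemporaryDirectory() as d:
    resource = Path(d) / "source-offset-version-x.y.z.txt"
    resource.write_text('{"logOffset": 345}')
    assert read_offset_from_resource(resource) == {"logOffset": 345}
```

Keeping one such file per log format in a single resource directory, as the author suggests, gives one place to see everything that carries a compatibility guarantee.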

}

test("FileStreamSource offset - read Spark 2.1.0 log format") {
val offset = readOffsetFromResource("file-source-offset-version-2.1.0.txt")
Member


nit: maybe there is no need to read the JSON from a file, since we never write it into a single file.

Contributor Author


Same comment as above.

@SparkQA

SparkQA commented Dec 6, 2016

Test build #69705 has finished for PR 16128 at commit 8d4ca5e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Dec 6, 2016

LGTM pending tests.

@SparkQA

SparkQA commented Dec 6, 2016

Test build #3468 has finished for PR 16128 at commit 8d4ca5e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 6, 2016

Test build #69714 has started for PR 16128 at commit 26a86d6.

@zsxwing
Member

zsxwing commented Dec 6, 2016

retest this please

@SparkQA

SparkQA commented Dec 6, 2016

Test build #69740 has finished for PR 16128 at commit 26a86d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Dec 6, 2016

LGTM. Thanks. Merging to master and 2.1.

asfgit pushed a commit that referenced this pull request Dec 6, 2016
…tructured Streaming log formats


Author: Tathagata Das <[email protected]>

Closes #16128 from tdas/SPARK-18671.

(cherry picked from commit 1ef6b29)
Signed-off-by: Shixiong Zhu <[email protected]>
@asfgit asfgit closed this in 1ef6b29 Dec 6, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
…tructured Streaming log formats


Author: Tathagata Das <[email protected]>

Closes apache#16128 from tdas/SPARK-18671.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…tructured Streaming log formats


Author: Tathagata Das <[email protected]>

Closes apache#16128 from tdas/SPARK-18671.