[SPARK-22781][SS] Support creating streaming dataset with ORC files #19975

dongjoon-hyun · 2017-12-14T04:18:25Z

What changes were proposed in this pull request?

Like Parquet, ORC can be used with Apache Spark Structured Streaming. This PR adds `orc()` to `DataStreamReader` (Scala/Python) so that users can create a streaming Dataset from ORC files as easily as from the other file formats. It also adds test coverage for the ORC data source and updates the documentation.

BEFORE

scala> spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
<console>:24: error: value orc is not a member of org.apache.spark.sql.streaming.DataStreamReader
       spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()

AFTER

scala> spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
res0: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@678b3746

scala>
-------------------------------------------
Batch: 0
-------------------------------------------
+---+
|  a|
+---+
|  1|
+---+
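The AFTER session above relies on an existing `/tmp/orc_ss` directory. Below is a minimal, self-contained sketch of the same flow. It assumes Spark 2.3 or later with `spark-sql` on the classpath. The object name `OrcStreamingSketch`, the temporary input directory and the `orc_rows` memory table are hypothetical names chosen for illustration. The sketch seeds one ORC file with the batch writer, reads it back with the new `readStream...orc(...)` call, and collects the result through the memory sink. The commented `format("orc").load(...)` line is the format-string form that already worked before this PR and is expected, not verified here, to behave the same.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object OrcStreamingSketch {
  // Streams a directory of ORC files through DataStreamReader.orc() and
  // returns every value of column `a` that the query has processed.
  def runSketch(): Seq[Int] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("orc-streaming-sketch")
      // Use the built-in ORC implementation so spark-hive is not required.
      .config("spark.sql.orc.impl", "native")
      .getOrCreate()
    import spark.implicits._

    try {
      // Hypothetical stand-in for /tmp/orc_ss; the subdirectory does not exist yet.
      val inputDir = Files.createTempDirectory("orc_ss").resolve("input").toString

      // Seed the directory with a single ORC file using the batch API.
      Seq(1).toDF("a").write.orc(inputDir)

      // Streaming file sources need a user-specified schema by default.
      val stream = spark.readStream.schema("a int").orc(inputDir)
      // Pre-existing, format-string equivalent:
      //   spark.readStream.schema("a int").format("orc").load(inputDir)

      // The memory sink exposes results as an in-memory table named "orc_rows".
      val query = stream.writeStream
        .format("memory")
        .queryName("orc_rows")
        .start()
      try {
        query.processAllAvailable() // block until the seeded file is consumed
        spark.table("orc_rows").as[Int].collect().toSeq
      } finally {
        query.stop()
      }
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(runSketch())
}
```

The memory sink and `processAllAvailable()` are used here only to make the result inspectable in a test; the console sink from the AFTER session works the same way for interactive use.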

How was this patch tested?

Passed the newly added test cases.


SparkQA · 2017-12-14T07:03:46Z

Test build #84892 has finished for PR 19975 at commit 66475b7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-14T07:16:09Z

Test build #84893 has finished for PR 19975 at commit 1ab78ed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-14T11:04:55Z

Test build #84901 has finished for PR 19975 at commit 29094cc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-12-14T15:56:50Z

Hi, @tdas and @zsxwing.
Could you review this PR?

dongjoon-hyun · 2017-12-15T02:56:45Z

Also, @brkyvz, could you review this PR?

dongjoon-hyun · 2017-12-18T20:34:07Z

Hi, @tdas, @zsxwing, @brkyvz.
Could you give me some advice on how to proceed with this PR?

brkyvz · 2017-12-18T20:49:34Z

This LGTM. @zsxwing Any other comments?

dongjoon-hyun · 2017-12-18T20:55:07Z

Thank you so much, @brkyvz !

zsxwing · 2017-12-18T22:52:54Z

LGTM. Let's trigger a new build since it's 5 days old now.

retest this please.

dongjoon-hyun · 2017-12-18T23:41:28Z

Thank you so much, @zsxwing !

dongjoon-hyun · 2017-12-18T23:41:37Z

Retest this please.

SparkQA · 2017-12-19T01:21:58Z

Test build #85078 has finished for PR 19975 at commit 29094cc.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-12-19T07:04:54Z

Retest this please

SparkQA · 2017-12-19T08:05:01Z

Test build #85098 has finished for PR 19975 at commit 29094cc.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-12-19T16:52:02Z

Retest this please

SparkQA · 2017-12-19T19:35:35Z

Test build #85117 has finished for PR 19975 at commit 29094cc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-12-19T21:14:35Z

Hi, @brkyvz and @zsxwing. It finally passed Jenkins again.

dongjoon-hyun · 2017-12-20T05:57:46Z

Gentle ping! :)

zsxwing · 2017-12-20T07:49:17Z

Thanks! Merging to master!

dongjoon-hyun · 2017-12-20T15:34:15Z

Thank you so much, @zsxwing and @brkyvz

dongjoon-hyun added 3 commits December 13, 2017 20:15

[SPARK-22781][SS] Support creating streaming dataset with ORC file format

66475b7

Add a ORC test case into FileStreamSinkSuite, too.

1ab78ed

Add a supported option.

d595aeb

Add python api, too.

29094cc

asfgit closed this in 9962390 Dec 20, 2017

dongjoon-hyun deleted the SPARK-22781 branch December 20, 2017 15:34

4 participants

dongjoon-hyun commented Dec 14, 2017 •

edited

Loading

dongjoon-hyun commented Dec 19, 2017 •

edited

Loading