[SPARK-22781][SS] Support creating streaming dataset with ORC files #19975
Conversation
Test build #84892 has finished for PR 19975 at commit
Test build #84893 has finished for PR 19975 at commit
Test build #84901 has finished for PR 19975 at commit
Also, @brkyvz . Could you review this PR?
This LGTM. @zsxwing Any other comments?
Thank you so much, @brkyvz !
LGTM. Let's trigger a new build since it's 5 days old now. retest this please.
Thank you so much, @zsxwing !
Retest this please.
Test build #85078 has finished for PR 19975 at commit
Retest this please
Test build #85098 has finished for PR 19975 at commit
Retest this please
Test build #85117 has finished for PR 19975 at commit
Gentle ping! :)
Thanks! Merging to master!
What changes were proposed in this pull request?
Like Parquet, users can use ORC with Apache Spark Structured Streaming. This PR adds orc() to DataStreamReader (Scala/Python) so that a streaming Dataset can be created from ORC files as easily as from the other file formats. It also adds test coverage for the ORC data source and updates the documentation.
BEFORE
AFTER
How was this patch tested?
Pass the newly added test cases.
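For readers who want to see what the orc() reader described under the proposed changes looks like from PySpark, here is a minimal sketch. It is not the snippet from the PR description. The helper names orc_stream and run_console_demo, the path /tmp/orc_ss, and the schema "a INT" are all hypothetical, and the sketch assumes a Spark version that includes this change.

```python
def orc_stream(spark, path, schema):
    """Return a streaming DataFrame that reads ORC files appearing under `path`.

    `spark` is assumed to be a pyspark.sql.SparkSession. File-based streaming
    sources need an explicit schema by default, so one is always passed here.
    """
    # schema() accepts a StructType or a DDL string such as "a INT".
    return spark.readStream.schema(schema).orc(path)


def run_console_demo(spark, path="/tmp/orc_ss", schema="a INT"):
    """Print micro-batches to the console for a few seconds (hypothetical demo).

    The directory at `path` is assumed to exist already.
    """
    query = orc_stream(spark, path, schema).writeStream.format("console").start()
    query.awaitTermination(10)  # timeout in seconds
    query.stop()
```

With PySpark installed, calling run_console_demo(SparkSession.builder.getOrCreate()) should print each new batch of rows as ORC files are written into the directory.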