[SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values #36672
fetch latest changes from master
Synced the latest changes from master; this PR no longer depends on any other unmerged PRs.
cc @sadikovi too, FYI
Can one of the admins verify this patch?
...n/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
Also, can we add a test to check that the DEFAULT values work? Thanks.
@sadikovi Sure, this is done in
sadikovi
left a comment
LGTM. Thanks for updating the test.
The test log is a bit messy, so I am just copying and pasting the error I saw:
@HyukjinKwon the CI passes now :)
@gengliangwang @HyukjinKwon @cloud-fan can someone please merge this in (or leave more review comments if desired for another pass)?
gengliangwang
left a comment
Thanks for the work!
### What changes were proposed in this pull request?

Support vectorized Orc scans when the table schema has associated DEFAULT column values.

(Note, this PR depends on #36672 which adds the same for Parquet files.)

Example:

```
create table t(i int) using orc;
insert into t values(42);
alter table t add column s string default concat('abc', 'def');
select * from t;
> 42, 'abcdef'
```

### Why are the changes needed?

This change makes it easier to build, query, and maintain tables backed by Orc data.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

This PR includes new test coverage.

Closes #36675 from dtenedor/default-orc-vectorized.

Authored-by: Daniel Tenedorio <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
### What changes were proposed in this pull request?

Support vectorized Parquet scans when the table schema has associated DEFAULT column values.

Example:
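
The example block for this description did not survive on this page. As a sketch only, the statements below are adapted from the ORC example in the follow-up commit message quoted earlier in this thread, with `using parquet` substituted for `using orc` and the quoting of the second string literal balanced. That this matches the original Parquet example is an assumption, not something this page confirms.

```sql
-- Assumption: adapted from the ORC follow-up example; only the data source differs.
create table t(i int) using parquet;

-- This row is written before column s exists, so the Parquet file has no data for s.
insert into t values(42);

-- Add a column with a DEFAULT expression; previously written files are not rewritten.
alter table t add column s string default concat('abc', 'def');

-- The scan is expected to fill in the default for rows whose files lack column s.
select * from t;
-- > 42, 'abcdef'
```

The point of the example is the last step: the row was inserted before `s` was added, so the value returned for `s` has to come from the column's DEFAULT rather than from the file, and this PR makes that work on the vectorized read path as well.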

### Why are the changes needed?

This change makes it easier to build, query, and maintain tables backed by Parquet data.

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

This PR includes new test coverage.