-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-9652][CORE] Added method for Avro file input #7971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Test build #41984 has finished for PR 7971 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the withTempDir method defined in OrcPartitionDiscoverySuite. It's causing compilation error since an override is missing there.
|
Test build #42000 has finished for PR 7971 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to also check the actual contents of read Avro records.
While trying to add such a test case, I found that string fields in Avro files are always materialized as Avro Utf8, even if I called
GenericData.setStringType(schema, GenericData.StringType.String)Can we make this behavior configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably also want to register Utf8 to Kryo by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add Utf8 to the Kryo serializer list of classes to register.
|
Overall this PR looks pretty good. We might want to add build time Avro Java source generation for both SBT and Maven to remove those generated Java files (together with those added for testing parquet-avro compatibility). But this can be addressed in separate PRs. |
|
Test build #42431 has finished for PR 7971 at commit
|
|
Test build #42499 has finished for PR 7971 at commit
|
|
Test build #42536 has finished for PR 7971 at commit
|
|
Test build #42664 has finished for PR 7971 at commit
|
|
This LGTM now. But still need a Spark Core maintainer to have a look at this. Thanks for working on this! |
|
cc @pwendell |
|
I'm tempted to not merge this - since it is something that can easily be done by the user, and it does introduce strong avro dependency in the API itself. |
|
@JDrit How about we close this PR (and the jira) for now? We can revisit it later if users have a high demand on this. Thanks! |
This wraps around the configuration code needed to read Generic, Specific, and Reflect Avro records. This allows a easy way to read Avro records without requiring the user to configure it themselves.