Conversation

@JDrit
Contributor

@JDrit JDrit commented Aug 5, 2015

This wraps the configuration code needed to read Generic, Specific, and Reflect Avro records. It gives users an easy way to read Avro records without having to set up the configuration themselves.

@marmbrus
Contributor

marmbrus commented Sep 3, 2015

ok to test

@SparkQA

SparkQA commented Sep 3, 2015

Test build #41984 has finished for PR 7971 at commit 9218669.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Please remove the withTempDir method defined in OrcPartitionDiscoverySuite. It's causing a compilation error since an override is missing there.

@SparkQA

SparkQA commented Sep 4, 2015

Test build #42000 has finished for PR 7971 at commit bc8f2be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

It would be nice to also check the actual contents of the Avro records that are read.

While trying to add such a test case, I found that string fields in Avro files are always materialized as Avro Utf8, even if I called

GenericData.setStringType(schema, GenericData.StringType.String)

Can we make this behavior configurable?
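
To illustrate why this matters for a content-checking test, here is a minimal sketch using only the Java standard library. It assumes nothing about the PR's code: `StringBuilder` stands in for Avro's `Utf8` (both implement `CharSequence` and neither is equal to a `String` under `equals`), and the `TextCompare` class and `sameText` helper are hypothetical names introduced for this example.

```java
// Hypothetical helper, not part of this PR or of Avro.
class TextCompare {
    // Compares a CharSequence-typed field value to an expected String by
    // character content, so the concrete CharSequence class does not matter.
    static boolean sameText(CharSequence actual, String expected) {
        if (actual == null || expected == null) {
            return actual == null && expected == null;
        }
        return expected.contentEquals(actual);
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for a non-String CharSequence
        // such as Avro's Utf8.
        CharSequence field = new StringBuilder("alice");

        // equals() compares against a different class, so this is false.
        System.out.println(field.equals("alice"));

        // Comparing by content gives the result a test would expect.
        System.out.println(sameText(field, "alice"));
    }
}
```

A test that asserts on record contents could normalize string fields this way (or call `toString()` on them) until the string type is configurable.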

Contributor

We probably also want to register Utf8 with Kryo by default.

Contributor Author

I will add Utf8 to the Kryo serializer list of classes to register.

@liancheng
Contributor

Overall this PR looks pretty good. We might want to add build-time Avro Java source generation for both SBT and Maven so that we can remove those generated Java files (together with those added for testing parquet-avro compatibility). But this can be addressed in separate PRs.

@SparkQA

SparkQA commented Sep 14, 2015

Test build #42431 has finished for PR 7971 at commit 25494f2.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 15, 2015

Test build #42499 has finished for PR 7971 at commit 386b8db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 16, 2015

Test build #42536 has finished for PR 7971 at commit 8ba8e72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 18, 2015

Test build #42664 has finished for PR 7971 at commit 65617fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

This LGTM now, but we still need a Spark Core maintainer to take a look at it. Thanks for working on this!

@liancheng
Contributor

cc @pwendell

@rxin
Contributor

rxin commented Sep 22, 2015

I'm tempted not to merge this, since it is something users can easily do themselves, and it introduces a strong Avro dependency in the API itself.

@yhuai
Contributor

yhuai commented Jan 13, 2016

@JDrit How about we close this PR (and the JIRA) for now? We can revisit it later if there is strong user demand for it. Thanks!

@asfgit asfgit closed this in 085f510 Feb 4, 2016