Conversation

@zjffdu (Contributor) commented Nov 12, 2015

…tion when saving

@SparkQA commented Nov 12, 2015

Test build #45721 has finished for PR 9657 at commit 09d39e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 12, 2015

Test build #45728 has finished for PR 9657 at commit 5269f4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu (Contributor, Author) commented Dec 1, 2015

@Lewuathe Could you help review this? This is a dependency issue for refactoring CsvRelation to extend HadoopFsRelation. CsvRelation currently supports writing to a compressed format, while HadoopFsRelation doesn't support that.

Inline review comment (Contributor):

Third-party modules should be put above org.apache.spark.* modules.
See: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports
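For illustration, a minimal sketch of the grouping the style guide asks for; the specific imports here are hypothetical, chosen only to show the ordering:

```scala
// Import ordering per the Spark code style guide:
// 1. java and scala standard libraries first
import java.io.File

import scala.collection.mutable.ArrayBuffer

// 2. then third-party modules such as Hadoop
import org.apache.hadoop.io.compress.CompressionCodec

// 3. and org.apache.spark.* modules last
import org.apache.spark.sql.DataFrame
```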

@Lewuathe (Contributor) commented Dec 2, 2015

@zjffdu Do you intend to remove the compression codec from CsvRelation after changing HadoopFsRelation and inheriting from it? Since spark-csv is an external library, it is not reasonable to change Spark according to spark-csv's requirements.

But anyway, a compression codec option in HadoopFsRelation seems reasonable to me on its own.

Inline review comment (Contributor):

Same as the DataFrameWriter import.

@zjffdu (Contributor, Author) commented Dec 2, 2015

@Lewuathe Yes, I'd like CsvRelation to extend HadoopFsRelation after this change as well. Since CsvRelation currently supports compression, I want to keep that compatibility.
Although CsvRelation is an external library, CSV is a very commonly used format. If it also extends HadoopFsRelation, then other third-party libraries can use HadoopFsRelation to support many formats. Actually, SPARK-10038 is a case that needs this.

Inline review comment (Contributor):

Was this written for debugging? We can remove the `show` call.

Inline reply (Contributor, Author):

Right, I will correct it.

@zjffdu (Contributor, Author) commented Dec 2, 2015

@Lewuathe Thanks for the review. I pushed another commit to address the comments. I also changed the compression feature's version to 1.6.0.

@SparkQA commented Dec 2, 2015

Test build #47034 has finished for PR 9657 at commit 896440a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2015

Test build #47036 has finished for PR 9657 at commit db31b8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu (Contributor, Author) commented Dec 3, 2015

Never mind, I changed it back to 1.7.0 since 1.6 is at RC1.

@SparkQA commented Dec 3, 2015

Test build #47118 has finished for PR 9657 at commit 657ba5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Inline review comment (Member):

The amount of code change should be small, so we do not need this additional interface.
If we add an interface for each additional option, the number of interfaces blows up.

Inline reply (Contributor, Author):

Agreed that we should not add an interface for every configuration, but since compression is a very common property, I feel it would be better to keep this interface. We also expose the compression API in RDD.saveAsXX, so I think it would be better to be consistent here in DataFrame.
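For reference, the RDD-level API mentioned here takes a Hadoop codec class directly via an optional parameter; a minimal sketch (the path and codec choice are illustrative):

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.SparkContext

// The RDD API already exposes compression through saveAsTextFile's
// optional codec argument, which the author argues the DataFrame
// writer should mirror for consistency.
def saveCompressed(sc: SparkContext): Unit = {
  val rdd = sc.parallelize(Seq("a", "b", "c"))
  rdd.saveAsTextFile("/tmp/out-gzip", classOf[GzipCodec])
}
```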

Inline reply (Member):

okay

@maropu (Member) commented Jan 29, 2016

@zjffdu Any updates? If you are still working on this, please check my comments.

@zjffdu (Contributor, Author) commented Jan 29, 2016

Will update this PR

@zjffdu (Contributor, Author) commented Jan 29, 2016

@maropu Thanks for the review. I updated the PR to address part of your comments; please check my replies inline.

@SparkQA commented Jan 29, 2016

Test build #50345 has finished for PR 9657 at commit ea70b40.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 29, 2016

Test build #50347 has finished for PR 9657 at commit a8ed421.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Inline review comment (Member):

Not 1.7.0, but 2.0.0.

@maropu (Member) commented Jan 29, 2016

@rxin @liancheng Could you review this?

Inline review comment (Contributor):

Can this just be a normal option?

Inline review comment (Contributor):

Also, we shouldn't depend on Hadoop APIs in options, which are a user-facing API. Nobody outside the Hadoop world knows how to use the CompressionCodec API.
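The direction suggested here, a plain string option rather than a Hadoop CompressionCodec class, resembles what Spark ultimately exposed; a hedged sketch of that user-facing form, assuming the Spark 2.x writer API:

```scala
import org.apache.spark.sql.SparkSession

// Users pass a short codec name as an ordinary writer option;
// they never touch the Hadoop CompressionCodec API directly.
def writeCompressed(spark: SparkSession): Unit = {
  val df = spark.range(10).toDF("id")
  df.write
    .option("compression", "gzip") // short name, not a Hadoop class
    .csv("/tmp/out-csv")
}
```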

Inline reply (Member):

Agreed.

@SparkQA commented Jan 29, 2016

Test build #50364 has finished for PR 9657 at commit 3645dbc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented Jan 30, 2016

@zjffdu ping

@maropu (Member) commented Feb 2, 2016

@zjffdu ping

@zjffdu (Contributor, Author) commented Feb 2, 2016

Sorry for the late response; I will update the patch tomorrow.

@maropu (Member) commented Feb 9, 2016

@zjffdu ping

@maropu (Member) commented Feb 18, 2016

@zjffdu If you have no time to take this, is it okay if I rework it?

@maropu (Member) commented Feb 29, 2016

This is resolved by #11384, so could you close this?

@zjffdu zjffdu closed this Feb 29, 2016