Skip to content

Conversation

@xguo27
Copy link
Contributor

@xguo27 xguo27 commented Dec 29, 2015

No description provided.

@xguo27
Copy link
Contributor Author

xguo27 commented Dec 30, 2015

@marmbrus Thanks Michael for your feedback!

Looks like the 'value' is to give the single string column a arbitrary name. Current implementation strips schema information when creating TextRelation (after verifying the schema is single field with string type). It is fine during read, but fails during write.

Would you mind taking another look at my updated change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should be updated too.

@viirya
Copy link
Member

viirya commented Dec 30, 2015

Can you add some tests for this?

@xguo27
Copy link
Contributor Author

xguo27 commented Dec 30, 2015

Thanks @viirya ! I have updated the comment and added unit test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you just change the existing test case to rename the dataframe column and leave the following as a comment there?

SPARK-12562 verify write.text() can handle column name beyond value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin I thought about it, but was not sure if it was a good idea to change the existing testcase. In the existing test, should I add a second dataframe with column renamed, or just replace the original dataframe with column renaming?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just replace the original one to something weird, like "adwrasdf"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be failed because the later method verifyFrame will check if the df read in has schema like new StructType().add("value", StringType). You could update verifyFrame to check if it has only one StringType column.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After write.text(), the local text file actually does not carry the schema name like JSON does. When reading back the text file and then call verifyFrame, it will always have value as the column name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. it is right.

@xguo27
Copy link
Contributor Author

xguo27 commented Jan 4, 2016

@marmbrus Can we trigger a test for this?

@SparkQA
Copy link

SparkQA commented Jan 4, 2016

Test build #2309 has finished for PR 10515 at commit d01ecac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 4, 2016

Thanks - I've merged this.

asfgit pushed a commit that referenced this pull request Jan 4, 2016
…ame to be called value

Author: Xiu Guo <[email protected]>

Closes #10515 from xguo27/SPARK-12562.

(cherry picked from commit 84f8492)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 84f8492 Jan 4, 2016
@xguo27 xguo27 deleted the SPARK-12562 branch January 4, 2016 04:56
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make sure that textSchema is a struct type that has only one string field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan DefaultSource.scala is the only place that creates a TextRelation, and it verifies that the schema is size 1 and of type string before creating a TextRelation. So I think it is fine not to verify again here. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, then it's fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants