[SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to be called value #10515
@marmbrus Thanks, Michael, for your feedback! It looks like "value" is just an arbitrary default name for the single string column. The current implementation strips the schema information when creating TextRelation (after verifying that the schema is a single field of string type). That is fine for reads, but writes fail when the column has any other name. Would you mind taking another look at my updated change?
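A minimal sketch of the idea, assuming simplified stand-ins for Spark's schema classes (the `StructType`, `StructField`, `StringType` and `TextRelationModel` below are hypothetical, not Spark's own types and not the merged code): keep the schema that was already verified when one is supplied, and fall back to a single `value` column only when none is.

```scala
// Hypothetical, simplified stand-ins for Spark's schema classes (not the Spark API).
sealed trait DataType
case object StringType extends DataType

final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField])

// Hypothetical model of a text relation that carries the caller's schema
// instead of replacing it with a fixed "value" column.
final class TextRelationModel(textSchema: Option[StructType]) {
  // Use the supplied single-string-column schema, whatever the column is named;
  // only when no schema is given (e.g. plain reads) default to "value".
  def dataSchema: StructType =
    textSchema.getOrElse(StructType(Seq(StructField("value", StringType))))
}
```

With a schema such as `StructType(Seq(StructField("adwrasdf", StringType)))`, `dataSchema` keeps the name `adwrasdf`, so a writer that looks up columns by this schema no longer depends on the literal name `value`.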
The comment should be updated too.
Can you add some tests for this?
Thanks @viirya! I have updated the comment and added a unit test.
Why don't you just change the existing test case to rename the DataFrame column, and leave the following as a comment there?
SPARK-12562 verify write.text() can handle column name beyond value
@rxin I thought about it, but was not sure whether it was a good idea to change the existing test case. In the existing test, should I add a second DataFrame with the column renamed, or just rename the column of the original DataFrame?
Just rename the column in the original one to something weird, like "adwrasdf".
This will fail, because verifyFrame, which is called later, checks that the DataFrame read back has the schema new StructType().add("value", StringType). You could update verifyFrame to check only that it has a single StringType column.
After write.text(), the local text file does not carry the column name the way a JSON file does. When the text file is read back and verifyFrame is called, the column is always named value.
Yes, that's right.
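The point settled in this thread can be modeled in a few lines. This is a hypothetical sketch (`TextRoundTripModel` is not part of Spark and is not how Spark's text source is implemented): plain text output keeps only the row values, so whatever the column was called before the write, reading it back yields a single column named `value`. That is why verifyFrame can keep checking for `value` after the test renames the column.

```scala
// Hypothetical model of a plain-text sink and source (not Spark's implementation).
object TextRoundTripModel {
  // Writing keeps only the string values, one per line; `columnName` is accepted
  // but a plain text file has nowhere to store it, so it is simply dropped.
  def write(columnName: String, rows: Seq[String]): String =
    rows.mkString("\n")

  // Reading has no stored name to recover, so the single column is always
  // reported as "value", paired with the lines of the file.
  def read(contents: String): (String, Seq[String]) =
    ("value", contents.split("\n").toSeq)
}
```

For example, writing rows `a` and `b` under the name `adwrasdf` and reading the result back gives the column name `value` with the same two rows.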
@marmbrus Can we trigger a test for this?
Test build #2309 has finished for PR 10515 at commit
Thanks, I've merged this.
…ame to be called value
Author: Xiu Guo
Closes #10515 from xguo27/SPARK-12562.
(cherry picked from commit 84f8492)
Signed-off-by: Reynold Xin
Should we make sure that textSchema is a struct type that has only one string field?
@cloud-fan DefaultSource.scala is the only place that creates a TextRelation, and it verifies that the schema has exactly one field, of type string, before creating it. So I think it is fine not to verify again here. What do you think?
Oh, then it's fine.
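One way to make the invariant discussed here hold by construction, rather than re-checking it inside the relation, is a private constructor plus a validating factory. This is a sketch under stated assumptions: the schema classes and `VerifiedTextRelation` are hypothetical stand-ins, not Spark's API, and the error messages are illustrative only.

```scala
// Hypothetical, simplified stand-ins for Spark's schema classes (not the Spark API).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField])

// The constructor is private, so callers can only obtain an instance through
// `fromSchema`, which performs the single-string-column check exactly once.
final class VerifiedTextRelation private (val dataSchema: StructType)

object VerifiedTextRelation {
  def fromSchema(schema: StructType): Either[String, VerifiedTextRelation] =
    schema.fields match {
      // Exactly one field of string type: accept it, keeping its name.
      case Seq(StructField(_, StringType)) => Right(new VerifiedTextRelation(schema))
      // Exactly one field, but not a string.
      case Seq(StructField(_, other)) => Left(s"expected a string column, got $other")
      // Zero or several fields.
      case fields => Left(s"expected exactly one column, got ${fields.size}")
    }
}
```

With this shape there is a single creation path, so the relation itself can rely on the check without repeating it, which matches the argument above that one verified creation site is enough.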