[SPARK-48241][SQL] CSV parsing failure with char/varchar type columns #46537
Hi @ulysses-you, could you help review?
- case a: AttributeReference => a
+ case a: AttributeReference =>
+   // Keep the metadata in given schema.
+   a.copy(metadata = field.metadata)(exprId = a.exprId, qualifier = a.qualifier)
a.withMetadata(field.metadata)
ulysses-you
left a comment
lgtm if tests pass, cc @yaooqinn @cloud-fan
cloud-fan
left a comment
good catch!
thanks, merging to master/
it has conflicts with 3.5, can you create a new backport PR?
Created a backport PR in #46565.
What changes were proposed in this pull request?
Selecting from a CSV table that contains char and varchar columns fails with the following error:
Why are the changes needed?
For char and varchar types, Spark will convert them to `StringType` in `CharVarcharUtils.replaceCharVarcharWithStringInSchema` and record `__CHAR_VARCHAR_TYPE_STRING` in the metadata.
The reason for the above error is that the `StringType` columns in the `dataSchema` and `requiredSchema` of `UnivocityParser` are not consistent. The `StringType` in the `dataSchema` has metadata, while the metadata in the `requiredSchema` is empty. We need to retain the metadata when resolving schema.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Add a new test case in `CSVSuite`.
Was this patch authored or co-authored using generative AI tooling?
No.
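For readers who want to see the schema mismatch described under "Why are the changes needed?" in isolation, here is a minimal sketch. It is a toy model, not Spark code: `Field`, `SchemaMetadataSketch`, and `retainMetadata` are hypothetical names invented for illustration, and the metadata values are placeholders. It only shows that two otherwise identical string fields stop comparing equal once one of them carries `__CHAR_VARCHAR_TYPE_STRING` metadata, and that copying the metadata across restores consistency.

```scala
// Toy stand-in for a schema field; NOT Spark's StructField.
final case class Field(
    name: String,
    dataType: String,
    metadata: Map[String, String] = Map.empty)

object SchemaMetadataSketch {
  // Hypothetical helper: rebuild the required schema so that each field
  // keeps the metadata of the same-named field in the data schema.
  def retainMetadata(dataSchema: Seq[Field], requiredSchema: Seq[Field]): Seq[Field] = {
    val byName = dataSchema.map(f => f.name -> f).toMap
    requiredSchema.map { f =>
      byName.get(f.name).fold(f)(src => f.copy(metadata = src.metadata))
    }
  }

  def main(args: Array[String]): Unit = {
    // The data schema records the original char/varchar type in metadata
    // (values here are illustrative placeholders).
    val dataSchema = Seq(
      Field("id", "string", Map("__CHAR_VARCHAR_TYPE_STRING" -> "char(3)")),
      Field("name", "string", Map("__CHAR_VARCHAR_TYPE_STRING" -> "varchar(10)")))

    // The required schema has the same columns, but its metadata was dropped.
    val requiredSchema = dataSchema.map(_.copy(metadata = Map.empty))

    println(dataSchema == requiredSchema)                             // false
    println(dataSchema == retainMetadata(dataSchema, requiredSchema)) // true
  }
}
```

The actual fix in this PR keeps the metadata on the resolved attribute; the sketch only mirrors that idea on plain case classes.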