[SPARK-48241][SQL][3.5] CSV parsing failure with char/varchar type columns
### What changes were proposed in this pull request?
Selecting from a CSV table that contains char or varchar columns fails with the error below. Given this table:
```
spark-sql (default)> show create table test_csv;
CREATE TABLE default.test_csv (
id INT,
name CHAR(10))
USING csv
```
```
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
```
### Why are the changes needed?
Spark converts char and varchar columns to `StringType` in `CharVarcharUtils.replaceCharVarcharWithStringInSchema` and records the original type under `__CHAR_VARCHAR_TYPE_STRING` in the field metadata.
The error occurs because the `StringType` columns in the `dataSchema` and `requiredSchema` passed to `UnivocityParser` are not consistent: the column in the `dataSchema` carries this metadata, while the metadata in the `requiredSchema` is empty. The two schemas look identical in the error message because the printed form does not include metadata. We need to retain the metadata when resolving the schema.
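The mismatch can be illustrated with a minimal, self-contained sketch. `Field` and `Schema` here are hypothetical stand-ins for Spark's `StructField` and `StructType`, not the real classes, and the sketch assumes the parser's check is a field-by-field subset comparison in which metadata counts toward equality. The metadata value `char(10)` is likewise only illustrative.

```scala
// Hypothetical, simplified stand-ins for Spark's StructField / StructType.
final case class Field(
    name: String,
    dataType: String,
    metadata: Map[String, String] = Map.empty)

final case class Schema(fields: Seq[Field]) {
  // Like the schema strings in the error message, this rendering omits metadata.
  def catalogString: String =
    fields.map(f => s"${f.name}:${f.dataType}").mkString("struct<", ",", ">")
}

object CharVarcharMetadataDemo {
  // Illustrative metadata entry; the key name comes from the description above.
  val charMeta: Map[String, String] =
    Map("__CHAR_VARCHAR_TYPE_STRING" -> "char(10)")

  // dataSchema keeps the char metadata on the `name` column.
  val dataSchema: Schema =
    Schema(Seq(Field("id", "int"), Field("name", "string", charMeta)))

  // requiredSchema was resolved without the metadata.
  val requiredSchema: Schema =
    Schema(Seq(Field("id", "int"), Field("name", "string")))

  // Assumed shape of the check: every required field must equal a data field.
  def isSubset(required: Schema, data: Schema): Boolean =
    required.fields.toSet.subsetOf(data.fields.toSet)

  // Retaining metadata: take each required field from dataSchema when present.
  val fixedRequiredSchema: Schema = Schema(requiredSchema.fields.map { f =>
    dataSchema.fields.find(_.name == f.name).getOrElse(f)
  })

  def main(args: Array[String]): Unit = {
    // Both schemas print the same, as in the error message.
    println(dataSchema.catalogString == requiredSchema.catalogString) // true
    // The subset check still fails because the metadata differs.
    println(isSubset(requiredSchema, dataSchema)) // false
    // With the metadata retained, the check passes.
    println(isSubset(fixedRequiredSchema, dataSchema)) // true
  }
}
```

In this model the two schemas render to the same string yet fail the subset check, which matches the confusing error message above; carrying the metadata over to the required fields makes the check pass.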
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added a new test case in `CSVSuite`.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #46565 from liujiayi771/branch-3.5-SPARK-48241.
Authored-by: joey.ljy <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>