Conversation

@LuciferYang (Contributor) commented Dec 25, 2023

What changes were proposed in this pull request?

This PR aims to change the test inputs in ParquetTypeWideningSuite to valid int values, in order to fix the tests in ParquetTypeWideningSuite that fail when SPARK_ANSI_SQL_MODE is set to true.

Why are the changes needed?

Fix the daily test failures when SPARK_ANSI_SQL_MODE is set to true. A minimal reproduction of the underlying ANSI cast behaviour is sketched after the stack trace below.

[info] - unsupported parquet conversion IntegerType -> TimestampType *** FAILED *** (68 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 261.0 failed 1 times, most recent failure: Lost task 1.0 in stage 261.0 (TID 523) (localhost executor driver): org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value '1.23' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22018
[info] == DataFrame ==
[info] "cast" was called from
[info] org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite.writeParquetFiles(ParquetTypeWideningSuite.scala:113)
[info] 
[info] 	at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
[info] 	at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
[info] 	at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
[info] 	at org.apache.spark.sql.catalyst.util.UTF8StringUtils.toIntExact(UTF8StringUtils.scala)
[info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[info] 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] 	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
[info] 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:388)
[info] 	at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:101)
[info] 	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
[info] 	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
[info] 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info] 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
[info] 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[info] 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] 	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] 	at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info] 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:628)
[info] 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
[info] 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:631)
[info] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] 	at java.base/java.lang.Thread.run(Thread.java:840)
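
As referenced above, the failure is the standard ANSI cast behaviour rather than anything specific to this suite: under ANSI mode, casting the string '1.23' to INT throws instead of truncating. Below is a minimal, hypothetical standalone reproduction (not the suite's code; it only assumes a local SparkSession):

```scala
// Hypothetical standalone reproduction of the failure mode; not the suite's code.
import org.apache.spark.sql.SparkSession

object AnsiCastRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ansi-cast-repro")
      .getOrCreate()

    // Legacy (non-ANSI) behaviour: CAST('1.23' AS INT) silently truncates to 1.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST('1.23' AS INT)").show()

    // ANSI behaviour: the same cast throws SparkNumberFormatException
    // [CAST_INVALID_INPUT], which is what aborted the test while it was
    // writing its Parquet input files.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    try {
      spark.sql("SELECT CAST('1.23' AS INT)").show()
    } catch {
      case e: Exception => println(s"ANSI cast failed as expected: ${e.getMessage}")
    }

    spark.stop()
  }
}
```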

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manual check
SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite"

Before

[info] Run completed in 27 seconds, 432 milliseconds.
[info] Total number of tests run: 34
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]         org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite

After

[info] Run completed in 28 seconds, 880 milliseconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang changed the title [SPARK-40876][SQL][TESTS] Skip the test inputs in ParquetTypeWideningSuite that violate the data type conversion rules under ANSI mode [SPARK-40876][SQL][TESTS][FOLLOWUP] Skip the test inputs in ParquetTypeWideningSuite that violate the data type conversion rules under ANSI mode Dec 25, 2023
The github-actions bot added the SQL label Dec 25, 2023
@LuciferYang (Contributor, Author) commented:

It seems that only the test cases constructed to throw errors when read use input data that fails type conversion when ANSI mode is set to true. In this PR, as a quick fix, I have skipped these three test inputs.

@cloud-fan @zhengruifeng FYI

A contributor commented:

shall we just use valid int values?

@LuciferYang (Contributor, Author) replied:

should be ok

@LuciferYang (Contributor, Author) commented:

d95098c changed the inputs to use 1 and 10.
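
For illustration only, a hypothetical sketch of the intent of the change (the real diff lives in commit d95098c and the suite itself; the names and the exact "before" values below are assumptions):

```scala
// Hypothetical sketch of the intent of the change; not the suite's actual diff.
object InputSketch {
  // Before: a fractional string such as "1.23" fails CAST to INT under ANSI
  // mode while the test data is being written, aborting the write before the
  // unsupported Parquet conversion under test is ever exercised.
  val inputsBefore: Seq[String] = Seq("1.23", "10")

  // After (commit d95098c): valid int values "1" and "10", so the write
  // succeeds and the test can assert on the unsupported
  // IntegerType -> TimestampType conversion itself.
  val inputsAfter: Seq[String] = Seq("1", "10")
}
```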

@LuciferYang LuciferYang changed the title [SPARK-40876][SQL][TESTS][FOLLOWUP] Skip the test inputs in ParquetTypeWideningSuite that violate the data type conversion rules under ANSI mode [SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in ParquetTypeWideningSuite when SPARK_ANSI_SQL_MODE is set to true Dec 25, 2023
@dongjoon-hyun (Member) left a comment:

+1, LGTM. Thank you, @LuciferYang and @cloud-fan .
Merged to master.

@LuciferYang (Contributor, Author) commented:

Thanks @dongjoon-hyun and @cloud-fan ~
