Conversation

@LuciferYang (Contributor) commented Dec 25, 2023

What changes were proposed in this pull request?

This PR aims to change the test inputs in ParquetTypeWideningSuite to valid int values, in order to fix the tests in ParquetTypeWideningSuite that fail when SPARK_ANSI_SQL_MODE is set to true.

Why are the changes needed?

Fix the daily test failures when SPARK_ANSI_SQL_MODE is set to true. A minimal reproduction of the underlying ANSI cast behaviour is sketched after the stack trace below.

[info] - unsupported parquet conversion IntegerType -> TimestampType *** FAILED *** (68 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 261.0 failed 1 times, most recent failure: Lost task 1.0 in stage 261.0 (TID 523) (localhost executor driver): org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value '1.23' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22018
[info] == DataFrame ==
[info] "cast" was called from
[info] org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite.writeParquetFiles(ParquetTypeWideningSuite.scala:113)
[info] 
[info] 	at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
[info] 	at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
[info] 	at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
[info] 	at org.apache.spark.sql.catalyst.util.UTF8StringUtils.toIntExact(UTF8StringUtils.scala)
[info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[info] 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] 	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
[info] 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:388)
[info] 	at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:101)
[info] 	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
[info] 	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
[info] 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info] 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
[info] 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[info] 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] 	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] 	at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info] 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:628)
[info] 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
[info] 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:631)
[info] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] 	at java.base/java.lang.Thread.run(Thread.java:840)
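
As referenced above, the failure is the standard ANSI cast behaviour rather than anything specific to this suite: under ANSI mode, casting the string '1.23' to INT throws instead of truncating. Below is a minimal, hypothetical standalone reproduction (not the suite's code; it only assumes a local SparkSession):

```scala
// Hypothetical standalone reproduction of the failure mode; not the suite's code.
import org.apache.spark.sql.SparkSession

object AnsiCastRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ansi-cast-repro")
      .getOrCreate()

    // Legacy (non-ANSI) behaviour: CAST('1.23' AS INT) silently truncates to 1.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST('1.23' AS INT)").show()

    // ANSI behaviour: the same cast throws SparkNumberFormatException
    // [CAST_INVALID_INPUT], which is what aborted the test while it was
    // writing its Parquet input files.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    try {
      spark.sql("SELECT CAST('1.23' AS INT)").show()
    } catch {
      case e: Exception => println(s"ANSI cast failed as expected: ${e.getMessage}")
    }

    spark.stop()
  }
}
```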

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manual check
SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite"

Before

[info] Run completed in 27 seconds, 432 milliseconds.
[info] Total number of tests run: 34
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]         org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite

After

[info] Run completed in 28 seconds, 880 milliseconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang changed the title [SPARK-40876][SQL][TESTS] Skip the test inputs in ParquetTypeWideningSuite that violate the data type conversion rules under ANSI mode [SPARK-40876][SQL][TESTS][FOLLOWUP] Skip the test inputs in ParquetTypeWideningSuite that violate the data type conversion rules under ANSI mode Dec 25, 2023
The github-actions bot added the SQL label Dec 25, 2023
@LuciferYang (Contributor, Author) commented:

It seems that only the test cases constructed to throw errors when read use input data that fails type conversion when ANSI mode is set to true. In this PR, as a quick fix, I have skipped these three test inputs.

@cloud-fan @zhengruifeng FYI

A contributor commented:

shall we just use valid int values?

@LuciferYang (Contributor, Author) replied:

should be ok

@LuciferYang (Contributor, Author) commented:

d95098c changed the inputs to use 1 and 10.
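
For illustration only, a hypothetical sketch of the intent of the change (the real diff lives in commit d95098c and the suite itself; the names and the exact "before" values below are assumptions):

```scala
// Hypothetical sketch of the intent of the change; not the suite's actual diff.
object InputSketch {
  // Before: a fractional string such as "1.23" fails CAST to INT under ANSI
  // mode while the test data is being written, aborting the write before the
  // unsupported Parquet conversion under test is ever exercised.
  val inputsBefore: Seq[String] = Seq("1.23", "10")

  // After (commit d95098c): valid int values "1" and "10", so the write
  // succeeds and the test can assert on the unsupported
  // IntegerType -> TimestampType conversion itself.
  val inputsAfter: Seq[String] = Seq("1", "10")
}
```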

@LuciferYang LuciferYang changed the title [SPARK-40876][SQL][TESTS][FOLLOWUP] Skip the test inputs in ParquetTypeWideningSuite that violate the data type conversion rules under ANSI mode [SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in ParquetTypeWideningSuite when SPARK_ANSI_SQL_MODE is set to true Dec 25, 2023
@dongjoon-hyun (Member) left a comment:

+1, LGTM. Thank you, @LuciferYang and @cloud-fan .
Merged to master.

@LuciferYang (Contributor, Author) commented:

Thanks @dongjoon-hyun and @cloud-fan ~
