Skip to content

Conversation

@heyihong
Copy link
Contributor

@heyihong heyihong commented Sep 1, 2025

What changes were proposed in this pull request?

This PR refactors the LiteralExpressionProtoConverter to use CatalystTypeConverters for consistent type conversion, eliminating code duplication and improving maintainability.

Key changes:

  1. Simplified LiteralExpressionProtoConverter.toCatalystExpression(): Replaced the large switch statement (86 lines) with a clean 3-line implementation that leverages existing conversion utilities
  2. Added TIME type support: Added missing TIME literal type conversion in LiteralValueProtoConverter.toScalaValue()

Why are the changes needed?

  1. Type conversion issues: Some complex nested data structures (e.g., arrays of case classes) failed to convert to Catalyst's internal representation when using expressions.Literal.create(...).
  2. Inconsistent behaviors: Differences in behavior between Spark Connect and classic Spark for the same data types (e.g., Decimal).

Does this PR introduce any user-facing change?

Yes - Users can now successfully use typedLit with an array of case classes. Previously, attempting to use typedlit(Array(CaseClass(1, "a"))) would fail (please see the code piece below for details), but now it works correctly and converts case classes to proper struct representations.

import org.apache.spark.sql.functions.typedlit
case class CaseClass(a: Int, b: String)
spark.sql("select 1").select(typedlit(Array(CaseClass(1, "a")))).collect()

// Below is the error message:
"""
org.apache.spark.SparkIllegalArgumentException: requirement failed: Literal must have a corresponding value to array<struct<a:int,b:string>>, but class GenericArrayData found.
  scala.Predef$.require(Predef.scala:337)
  org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:306)
  org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:456)
  org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:206)
  org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverter$.toCatalystExpression(LiteralExpressionProtoConverter.scala:103)
"""

Besides, some catalyst values (e.g., Decimal 89.97620 -> 89.976200000000000000) have changed. Please see the changes in explain-results/ for details.

import org.apache.spark.sql.functions.typedlit

spark.sql("select 1").select(typedlit(BigDecimal(8997620, 5)),typedlit(Array(BigDecimal(8997620, 5), BigDecimal(8997621, 5)))).explain()
// Current explain() output:
"""
Project [89.97620 AS 89.97620#819, [89.97620,89.97621] AS ARRAY(89.97620BD, 89.97621BD)#820]
"""
// Expected explain() output (i.e., same as the classic mode):
"""
Project [89.976200000000000000 AS 89.976200000000000000#132, [89.976200000000000000,89.976210000000000000] AS ARRAY(89.976200000000000000BD, 89.976210000000000000BD)#133]
"""

How was this patch tested?

build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"
build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.4.5

@heyihong
Copy link
Contributor Author

heyihong commented Sep 1, 2025

@heyihong heyihong changed the title [SPARK-53438] Fix Product type handling in convertToCatalyst [SPARK-53438][CONNECT][SQL] Fix Product type handling in convertToCatalyst Sep 1, 2025
@heyihong heyihong changed the title [SPARK-53438][CONNECT][SQL] Fix Product type handling in convertToCatalyst [SPARK-53438][CONNECT][SQL] Fix Tuple type handling in convertToCatalyst Sep 1, 2025
@heyihong heyihong changed the title [SPARK-53438][CONNECT][SQL] Fix Tuple type handling in convertToCatalyst [SPARK-53438][CONNECT][SQL] Fix Product type handling in convertToCatalyst Sep 1, 2025
@heyihong heyihong changed the title [SPARK-53438][CONNECT][SQL] Fix Product type handling in convertToCatalyst [WIP][SPARK-53438][CONNECT][SQL] Fix Product type handling in convertToCatalyst Sep 1, 2025
@heyihong heyihong changed the title [WIP][SPARK-53438][CONNECT][SQL] Fix Product type handling in convertToCatalyst [WIP][SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter Sep 1, 2025
@heyihong heyihong changed the title [WIP][SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter [SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter Sep 1, 2025
@heyihong heyihong requested a review from cloud-fan September 3, 2025 15:28
@heyihong heyihong force-pushed the SPARK-53438 branch 2 times, most recently from f7f82ee to fd3f9d0 Compare September 3, 2025 15:39
@heyihong heyihong changed the title [SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter [WIP][SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter Sep 3, 2025
@heyihong heyihong changed the title [WIP][SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter [SPARK-53438][CONNECT][SQL] Use CatalystConverter in LiteralExpressionProtoConverter Sep 3, 2025
@heyihong heyihong requested a review from cloud-fan September 4, 2025 11:28
@heyihong heyihong force-pushed the SPARK-53438 branch 2 times, most recently from 45b5e95 to 7ed8146 Compare September 17, 2025 14:39
@heyihong
Copy link
Contributor Author

@cloud-fan @hvanhovell @zhengruifeng Please take another look

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 87a71fa Sep 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants