
Commit 3456d4f

chenhao-db authored and cloud-fan committed
[SPARK-47681][FOLLOWUP] Fix schema_of_variant(decimal)
### What changes were proposed in this pull request?

PR #46338 found that `schema_of_variant` sometimes could not correctly handle variant decimals and included a fix. However, that fix is incomplete, and `schema_of_variant` can still fail on some inputs. The reason is that `VariantUtil.getDecimal` calls `stripTrailingZeros`. For an input decimal `10.00`, the resulting scale is -1 and the unscaled value is 1, but a negative decimal scale is not allowed by Spark. The correct approach is to use the `BigDecimal` to construct a `Decimal` and read its precision and scale, as is already done in `VariantGet`.

This PR also includes a minor change to `VariantGet`, where a duplicated expression was computed twice.

### Why are the changes needed?

They are bug fixes and are required to process decimals correctly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

More unit tests. Some of them would fail without the change in this PR (e.g., `check("10.00", "DECIMAL(2,0)")`). Others wouldn't fail, but they still improve test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46549 from chenhao-db/fix_decimal_schema.

Authored-by: Chenhao Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
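The `stripTrailingZeros` pitfall described above can be reproduced directly with `java.math.BigDecimal`, the class that `VariantUtil.getDecimal` returns. A minimal sketch (standalone, outside Spark):

```java
import java.math.BigDecimal;

public class StripTrailingZerosPitfall {
    public static void main(String[] args) {
        // stripTrailingZeros turns 10.00 into 1E+1:
        // unscaled value 1, scale -1, precision 1.
        BigDecimal d = new BigDecimal("10.00").stripTrailingZeros();
        System.out.println(d.unscaledValue()); // 1
        System.out.println(d.scale());         // -1
        System.out.println(d.precision());     // 1
        // A negative scale is not a valid Spark DecimalType scale, so
        // deriving DECIMAL(precision, scale) from this value fails.
    }
}
```

This is why reading precision and scale off the raw `stripTrailingZeros` result is unsafe, and why the fix constructs a `Decimal` from the `BigDecimal` first.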
1 parent 42f2132

File tree: 2 files changed (+13, -4 lines)

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala

Lines changed: 3 additions & 4 deletions
@@ -341,7 +341,7 @@ case object VariantGet {
       case Type.DOUBLE => Literal(v.getDouble, DoubleType)
       case Type.DECIMAL =>
         val d = Decimal(v.getDecimal)
-        Literal(Decimal(v.getDecimal), DecimalType(d.precision, d.scale))
+        Literal(d, DecimalType(d.precision, d.scale))
       case Type.DATE => Literal(v.getLong.toInt, DateType)
       case Type.TIMESTAMP => Literal(v.getLong, TimestampType)
       case Type.TIMESTAMP_NTZ => Literal(v.getLong, TimestampNTZType)
@@ -682,9 +682,8 @@ object SchemaOfVariant {
       case Type.STRING => SQLConf.get.defaultStringType
       case Type.DOUBLE => DoubleType
       case Type.DECIMAL =>
-        val d = v.getDecimal
-        // Spark doesn't allow `DecimalType` to have `precision < scale`.
-        DecimalType(d.precision().max(d.scale()), d.scale())
+        val d = Decimal(v.getDecimal)
+        DecimalType(d.precision, d.scale)
       case Type.DATE => DateType
       case Type.TIMESTAMP => TimestampType
       case Type.TIMESTAMP_NTZ => TimestampNTZType

sql/core/src/test/scala/org/apache/spark/sql/VariantEndToEndSuite.scala

Lines changed: 10 additions & 0 deletions
@@ -160,6 +160,16 @@ class VariantEndToEndSuite extends QueryTest with SharedSparkSession {
     check("1", "BIGINT")
     check("1.0", "DECIMAL(1,0)")
     check("0.01", "DECIMAL(2,2)")
+    check("1.00", "DECIMAL(1,0)")
+    check("10.00", "DECIMAL(2,0)")
+    check("10.10", "DECIMAL(3,1)")
+    check("0.0", "DECIMAL(1,0)")
+    check("-0.0", "DECIMAL(1,0)")
+    check("2147483647.999", "DECIMAL(13,3)")
+    check("9223372036854775808", "DECIMAL(19,0)")
+    check("-9223372036854775808.0", "DECIMAL(19,0)")
+    check("9999999999999999999.9999999999999999999", "DECIMAL(38,19)")
+    check("9999999999999999999.99999999999999999999", "DOUBLE")
     check("1E0", "DOUBLE")
     check("true", "BOOLEAN")
     check("\"2000-01-01\"", "STRING")
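The expected decimal types in the new test cases can be cross-checked outside Spark. The sketch below uses a hypothetical `schemaOf` helper that mirrors the approach of the fix: strip trailing zeros, re-expand a negative scale to zero, and require `precision >= scale`. It is an illustration of the normalization, not Spark's actual `Decimal` implementation.

```java
import java.math.BigDecimal;

public class DecimalSchemaCheck {
    // Hypothetical helper: derive a DECIMAL(precision, scale) string for a
    // decimal literal, normalizing the way the fixed code effectively does.
    static String schemaOf(String literal) {
        BigDecimal d = new BigDecimal(literal).stripTrailingZeros();
        if (d.scale() < 0) {
            // e.g. "10.00" strips to 1E+1 (scale -1); re-expanding to
            // scale 0 is exact, since it only adds digits back.
            d = d.setScale(0);
        }
        // Spark doesn't allow DecimalType with precision < scale.
        int precision = Math.max(d.precision(), d.scale());
        return "DECIMAL(" + precision + "," + d.scale() + ")";
    }

    public static void main(String[] args) {
        System.out.println(schemaOf("10.00"));          // DECIMAL(2,0)
        System.out.println(schemaOf("0.01"));           // DECIMAL(2,2)
        System.out.println(schemaOf("10.10"));          // DECIMAL(3,1)
        System.out.println(schemaOf("2147483647.999")); // DECIMAL(13,3)
    }
}
```

Each printed type matches the expectation asserted by the corresponding `check(...)` call in the test diff above.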
