- 
                Notifications
    You must be signed in to change notification settings 
- Fork 28.9k
[SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode #26933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| cc @cloud-fan #26518 | 
| also cc: @gengliangwang | 
| ok to test | 
| we already have an  | 
| Test build #115509 has finished for PR 26933 at commit  
 | 
| I will move them to  | 
| Well, I think the issue exists in converting a String to any Numeric types (short/int/long/float/double..) | 
| Okk. I will include all of them in this PR only. | 
| I have moved all rules to  | 
| Test build #115570 has finished for PR 26933 at commit  
 | 
        
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ANSISQLStandard.scala
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | (Also, you need to add tests...) | 
| Test build #115864 has finished for PR 26933 at commit  
 | 
| Test build #115862 has finished for PR 26933 at commit  
 | 
| @cloud-fan @gengliangwang @maropu Kindly review the changes. Everything is moved inside  | 
| Test build #115866 has finished for PR 26933 at commit  
 | 
| Test build #115952 has finished for PR 26933 at commit  
 | 
        
          
                sql/core/src/test/resources/sql-tests/inputs/postgreSQL/float8.sql
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | @cloud-fan @maropu @gengliangwang @HyukjinKwon Kindly review the changes. | 
        
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | Test build #116166 has finished for PR 26933 at commit  
 | 
        
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
          
            Show resolved
            Hide resolved
        
      | try floatStr.toFloat catch { | ||
| case _: NumberFormatException => | ||
| val f = Cast.processFloatingPointSpecialLiterals(floatStr, true) | ||
| if (f == null) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too much code duplication. How about unifying these 2 cases?
val f = Cast.processFloatingPointSpecialLiterals(floatStr, true)
if (f == null && ansiEnabled) {
  throw ...
} else {
  f
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| $evPrim = Float.valueOf($floatStr); | ||
| } catch (java.lang.NumberFormatException e) { | ||
| final Float f = (Float) Cast.processFloatingPointSpecialLiterals($floatStr, true); | ||
| if (f == null) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can unify the code a little bit
val handleNull = if (ansiEnabled) {
  s"throw ..."
} else {
  s"$evNull = true;"
}
...
code"""
  ...
  if (f == null) {
    $handleNull
  } else ...
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| Test build #116175 has finished for PR 26933 at commit  
 | 
| """ | ||
| } | ||
| code""" | ||
| UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah we don't need to create int wrapper at all for ansi mode
case StringType if ansi => (c, evPrim, evNull) => s"$evPrim = $c.toByteExact();"
case StringType => // the original code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| @cloud-fan Please review this once. I have made all the required changes. | 
| Test build #116291 has finished for PR 26933 at commit  
 | 
| ${changePrecision(tmp, target, evPrim, evNull, canNullSafeCast)} | ||
| } catch (java.lang.NumberFormatException e) { | ||
| $evNull = true; | ||
| if ($ansiEnabled) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this will generate java code with if-else, we can do better
val handleException = if (ansiEnabled) {
  s"throw new NumberFormatException("invalid input syntax for type numeric: $c");"
} else {
  s"$evNull =true;"
}
code"""
  ...
  } catch (java.lang.NumberFormatException e) {
    $handleException
  }
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
        
          
                sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part4.sql.out
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one comment, thanks for your patience!
| 
 Thanks for the all the code reviews and suggestions. :) | 
| Test build #116296 has finished for PR 26933 at commit  
 | 
| Test build #116297 has finished for PR 26933 at commit  
 | 
| Test build #116304 has finished for PR 26933 at commit  
 | 
| } | ||
| Seq("nan", "nAn", " nan ").foreach { value => | ||
| checkEvaluation(cast(value, DoubleType), Double.NaN) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast('nan' as float) is working fine.
Although CAST(CAST(nan AS DECIMAL(10,0)) AS DOUBLE)  is returning NaN in pgsql.
| thanks, merging to master! | 
…rs (byte/short/int/long) should fail with fraction ### What changes were proposed in this pull request? This is a followup of #26933 Fraction string like "1.23" is definitely not a valid integral format and we should fail to do the cast under the ANSI mode. ### Why are the changes needed? correct the ANSI cast behavior from string to integral ### Does this PR introduce any user-facing change? Yes under ANSI mode, but ANSI mode is off by default. ### How was this patch tested? new test Closes #27957 from cloud-fan/ansi. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
…rs (byte/short/int/long) should fail with fraction ### What changes were proposed in this pull request? This is a followup of #26933 Fraction string like "1.23" is definitely not a valid integral format and we should fail to do the cast under the ANSI mode. ### Why are the changes needed? correct the ANSI cast behavior from string to integral ### Does this PR introduce any user-facing change? Yes under ANSI mode, but ANSI mode is off by default. ### How was this patch tested? new test Closes #27957 from cloud-fan/ansi. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]> (cherry picked from commit ac262cb) Signed-off-by: Takeshi Yamamuro <[email protected]>
…rs (byte/short/int/long) should fail with fraction ### What changes were proposed in this pull request? This is a followup of apache#26933 Fraction string like "1.23" is definitely not a valid integral format and we should fail to do the cast under the ANSI mode. ### Why are the changes needed? correct the ANSI cast behavior from string to integral ### Does this PR introduce any user-facing change? Yes under ANSI mode, but ANSI mode is off by default. ### How was this patch tested? new test Closes apache#27957 from cloud-fan/ansi. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
What changes were proposed in this pull request?
If spark.sql.ansi.enabled is set,
throw exception when cast to any numeric type do not follow the ANSI SQL standards.
Why are the changes needed?
ANSI SQL standards do not allow invalid strings to get casted into numeric types and throw exception for that. Currently spark sql gives NULL in such cases.
Before:
select cast('str' as decimal) => NULLAfter :
select cast('str' as decimal) => invalid input syntax for type numeric: strThese results are after setting
spark.sql.ansi.enabled=trueDoes this PR introduce any user-facing change?
Yes. Now when ansi mode is on users will get arithmetic exception for invalid strings.
How was this patch tested?
Unit Tests Added.