@iRakson
Contributor

@iRakson iRakson commented Dec 18, 2019

What changes were proposed in this pull request?

If spark.sql.ansi.enabled is set, throw an exception when a cast from a string to any numeric type does not follow the ANSI SQL standard.

Why are the changes needed?

The ANSI SQL standard does not allow invalid strings to be cast to numeric types and requires an exception in that case. Currently Spark SQL returns NULL instead.

Before:
select cast('str' as decimal) => NULL

After:
select cast('str' as decimal) => invalid input syntax for type numeric: str

These results are after setting spark.sql.ansi.enabled=true
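
For illustration, the before/after behavior can be modeled in a small standalone Scala sketch that does not depend on Spark. The `castToDecimal` helper and its `ansiEnabled` parameter are hypothetical names invented here, and NULL is modeled as `None`; only the error message text comes from the description above.

```scala
import scala.util.Try

object AnsiCastSketch {
  // Hypothetical stand-in for a string-to-decimal cast; not Spark's implementation.
  // Non-ANSI mode models NULL as None; ANSI mode throws instead.
  def castToDecimal(s: String, ansiEnabled: Boolean): Option[BigDecimal] =
    Try(BigDecimal(s.trim)).toOption match {
      case None if ansiEnabled =>
        throw new NumberFormatException(s"invalid input syntax for type numeric: $s")
      case result => result
    }
}
```

With `ansiEnabled = false` an input such as `"str"` yields `None`; with `ansiEnabled = true` the same input raises a `NumberFormatException`.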

Does this PR introduce any user-facing change?

Yes. With ANSI mode on, casting an invalid string to a numeric type now throws an exception instead of returning NULL.

How was this patch tested?

Unit Tests Added.

@iRakson
Contributor Author

iRakson commented Dec 18, 2019

cc @cloud-fan #26518

@iRakson iRakson changed the title from "Throw Exception when invalid string is cast to decimal in ANSI mode" to "[SPARK-30292][SQL]Throw Exception when invalid string is cast to decimal in ANSI mode" on Dec 18, 2019
@maropu
Member

maropu commented Dec 18, 2019

also cc: @gengliangwang

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

we already have an AnsiCast expression, we can implement it there.

@SparkQA

SparkQA commented Dec 18, 2019

Test build #115509 has finished for PR 26933 at commit 3ed7795.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AnsiSqlCastToDecimal(child: Expression, timeZoneId: Option[String])

@iRakson
Contributor Author

iRakson commented Dec 18, 2019

I will move them to AnsiCast and also look into test failures.

@gengliangwang
Member

gengliangwang commented Dec 18, 2019

Well, I think the issue exists when converting a String to any numeric type (short/int/long/float/double...).
Could you fix all of them in one PR?

@iRakson
Contributor Author

iRakson commented Dec 19, 2019

OK. I will include all of them in this PR.

@iRakson
Contributor Author

iRakson commented Dec 19, 2019

I have moved all rules to AnsiCast for all the numeric types (short, long, byte, decimal, double, float and int).
@cloud-fan @gengliangwang Please review the code.

@SparkQA

SparkQA commented Dec 19, 2019

Test build #115570 has finished for PR 26933 at commit 1686c6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 20, 2019

(Also, you need to add tests...)

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115864 has finished for PR 26933 at commit 74809d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115862 has finished for PR 26933 at commit 69ee231.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@iRakson iRakson requested a review from cloud-fan December 27, 2019 15:03
@iRakson
Contributor Author

iRakson commented Dec 27, 2019

@cloud-fan @gengliangwang @maropu Kindly review the changes.

Everything is moved inside CastBase. Some test cases have been moved under CastSuite because they would fail when ansiEnabled=true. New test cases are added under AnsiCastSuite to cover this PR.

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115866 has finished for PR 26933 at commit a336084.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115952 has finished for PR 26933 at commit c0f8baf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@iRakson iRakson changed the title from "[SPARK-30292][SQL]Throw Exception when invalid string is cast to decimal in ANSI mode" to "[SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode" on Dec 31, 2019
@iRakson
Contributor Author

iRakson commented Jan 2, 2020

@cloud-fan @maropu @gengliangwang @HyukjinKwon Kindly review the changes.

@SparkQA

SparkQA commented Jan 6, 2020

Test build #116166 has finished for PR 26933 at commit 7d0faa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

try floatStr.toFloat catch {
  case _: NumberFormatException =>
    val f = Cast.processFloatingPointSpecialLiterals(floatStr, true)
    if (f == null) {
Contributor

this is too much code duplication. How about unifying these 2 cases?

val f = Cast.processFloatingPointSpecialLiterals(floatStr, true)
if (f == null && ansiEnabled) {
  throw ...
} else {
  f
}
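
The unified shape suggested above can be tried out in a standalone sketch. Here `specialLiteral` is a hypothetical, simplified stand-in for `Cast.processFloatingPointSpecialLiterals`, and the set of literals it recognizes is an assumption made for this example, not a statement about Spark's behavior.

```scala
object FloatCastSketch {
  // Hypothetical, simplified stand-in for Cast.processFloatingPointSpecialLiterals:
  // returns null when the string is not a recognized special literal.
  // The accepted spellings below are assumed for illustration.
  def specialLiteral(s: String): java.lang.Float =
    s.trim.toLowerCase(java.util.Locale.ROOT) match {
      case "nan" => java.lang.Float.valueOf(Float.NaN)
      case "inf" | "+inf" | "infinity" | "+infinity" =>
        java.lang.Float.valueOf(Float.PositiveInfinity)
      case "-inf" | "-infinity" => java.lang.Float.valueOf(Float.NegativeInfinity)
      case _ => null
    }

  // One code path for both modes: only the handling of "no result" differs.
  def castToFloat(s: String, ansiEnabled: Boolean): java.lang.Float =
    try java.lang.Float.valueOf(s.trim.toFloat) catch {
      case _: NumberFormatException =>
        val f = specialLiteral(s)
        if (f == null && ansiEnabled) {
          throw new NumberFormatException(s"invalid input syntax for type numeric: $s")
        } else {
          f
        }
    }
}
```

The special-literal lookup runs once regardless of mode, so the ANSI and non-ANSI paths no longer duplicate it.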

Contributor Author

Done

  $evPrim = Float.valueOf($floatStr);
} catch (java.lang.NumberFormatException e) {
  final Float f = (Float) Cast.processFloatingPointSpecialLiterals($floatStr, true);
  if (f == null) {
Contributor

nit: we can unify the code a little bit

val handleNull = if (ansiEnabled) {
  s"throw ..."
} else {
  s"$evNull = true;"
}
...
code"""
  ...
  if (f == null) {
    $handleNull
  } else ...
"""
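
The pattern in this suggestion, deciding the branch while generating the code rather than emitting a runtime if-else, can be sketched with a tiny string-based generator. `genFloatCast` and its parameters are hypothetical names made up for this example, and the generated snippet is simplified: it omits the special-literal check from the real code.

```scala
object CodegenBranchSketch {
  // Hypothetical mini code generator: returns Java source as a String.
  // The variable names passed in (input, evPrim, evNull) are illustrative only.
  def genFloatCast(
      input: String,
      evPrim: String,
      evNull: String,
      ansiEnabled: Boolean): String = {
    // Decide once, at generation time, what the failure branch should do.
    val handleNull = if (ansiEnabled) {
      s"""throw new NumberFormatException("invalid input syntax for type numeric: " + $input);"""
    } else {
      s"$evNull = true;"
    }
    s"""
       |try {
       |  $evPrim = Float.valueOf($input);
       |} catch (java.lang.NumberFormatException e) {
       |  $handleNull
       |}
       |""".stripMargin
  }
}
```

Because `ansiEnabled` is known when the Java source is produced, the generated code contains only the one statement it needs and no mode check at run time.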

Contributor Author

Done

@SparkQA

SparkQA commented Jan 6, 2020

Test build #116175 has finished for PR 26933 at commit d454452.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"""
}
code"""
UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
Contributor

ah we don't need to create int wrapper at all for ansi mode

case StringType if ansi => (c, evPrim, evNull) => s"$evPrim = $c.toByteExact();"
case StringType => // the original code
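
To show why the wrapper becomes unnecessary, here is a standalone sketch contrasting the two parsing styles. The `IntWrapper` class, `toByte` and `toByteExact` below are hypothetical analogs written over plain `String` for this example; they are not the `UTF8String` methods themselves.

```scala
object ExactParseSketch {
  // Hypothetical analog of a mutable result holder; not Spark's UTF8String.IntWrapper.
  final class IntWrapper { var value: Int = 0 }

  // Non-throwing style: success is the return value, the result goes into the wrapper.
  def toByte(s: String, out: IntWrapper): Boolean =
    try {
      out.value = java.lang.Byte.parseByte(s.trim)
      true
    } catch {
      case _: NumberFormatException => false
    }

  // Throwing style: no wrapper needed, invalid input surfaces as an exception.
  def toByteExact(s: String): Byte = java.lang.Byte.parseByte(s.trim)
}
```

In the throwing style the caller assigns the result directly, which is why the ANSI branch can skip allocating a wrapper.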

Contributor Author

Done.

@iRakson
Contributor Author

iRakson commented Jan 8, 2020

@cloud-fan Please review this once. I have made all the required changes.

@iRakson iRakson requested a review from cloud-fan January 8, 2020 11:13
@SparkQA

SparkQA commented Jan 8, 2020

Test build #116291 has finished for PR 26933 at commit 4b0149c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  ${changePrecision(tmp, target, evPrim, evNull, canNullSafeCast)}
} catch (java.lang.NumberFormatException e) {
  $evNull = true;
  if ($ansiEnabled) {
Contributor

nit: this will generate java code with if-else, we can do better

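val handleException = if (ansiEnabled) {
  // Triple quotes so the Java string literal's quotes need no escaping;
  // the input variable is concatenated so its value appears in the message.
  s"""throw new NumberFormatException("invalid input syntax for type numeric: " + $c);"""
} else {
  s"$evNull = true;"
}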
code"""
  ...
  } catch (java.lang.NumberFormatException e) {
    $handleException
  }
"""

Contributor Author

Done.

Contributor

@cloud-fan cloud-fan left a comment

LGTM except one comment, thanks for your patience!

@iRakson
Contributor Author

iRakson commented Jan 8, 2020

LGTM except one comment, thanks for your patience!

Thanks for all the code reviews and suggestions. :)

@SparkQA

SparkQA commented Jan 8, 2020

Test build #116296 has finished for PR 26933 at commit 40afc54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 8, 2020

Test build #116297 has finished for PR 26933 at commit 2f845c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 8, 2020

Test build #116304 has finished for PR 26933 at commit 0cb4edc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
Seq("nan", "nAn", " nan ").foreach { value =>
  checkEvaluation(cast(value, DoubleType), Double.NaN)
}
Contributor Author

cast('nan' as float) works fine.
Note, though, that CAST(CAST(nan AS DECIMAL(10,0)) AS DOUBLE) returns NaN in pgsql.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e0efd21 Jan 14, 2020
maropu pushed a commit that referenced this pull request Mar 19, 2020
…rs (byte/short/int/long) should fail with fraction

### What changes were proposed in this pull request?

This is a followup of #26933

Fraction string like "1.23" is definitely not a valid integral format and we should fail to do the cast under the ANSI mode.
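
As a rough illustration of the intended difference, the standalone sketch below contrasts a lenient and a strict string-to-int cast. `castToInt` is a hypothetical helper, not Spark's code, and it assumes that the lenient (non-ANSI) path truncates the fraction while the ANSI path rejects it.

```scala
object IntegralCastSketch {
  // Hypothetical helper, not Spark's code. Assumption: the lenient (non-ANSI) path
  // truncates a fractional string, while the ANSI path rejects it.
  def castToInt(s: String, ansiEnabled: Boolean): Option[Int] = {
    val t = s.trim
    if (ansiEnabled) {
      // Integer.parseInt throws NumberFormatException for input such as "1.23".
      Some(java.lang.Integer.parseInt(t))
    } else {
      // Lenient: a fractional value is truncated, unparseable input becomes None.
      scala.util.Try(BigDecimal(t).toInt).toOption
    }
  }
}
```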

### Why are the changes needed?

correct the ANSI cast behavior from string to integral

### Does this PR introduce any user-facing change?

Yes under ANSI mode, but ANSI mode is off by default.

### How was this patch tested?

new test

Closes #27957 from cloud-fan/ansi.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
maropu pushed a commit that referenced this pull request Mar 19, 2020
…rs (byte/short/int/long) should fail with fraction

(cherry picked from commit ac262cb)
Signed-off-by: Takeshi Yamamuro <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…rs (byte/short/int/long) should fail with fraction
