Skip to content

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

  1. After [SPARK-26218][SQL] Overflow on arithmetic operations returns incorrect result #21599, if the option "spark.sql.failOnIntegralTypeOverflow" is enabled, all the Binary Arithmetic operator will used the exact version function.
    However, only Add/Substract/Multiply has a corresponding exact function in java.lang.Math . When the option "spark.sql.failOnIntegralTypeOverflow" is enabled, a runtime exception "BinaryArithmetics must override either exactMathMethod or genCode" is thrown if the other Binary Arithmetic operators are used, such as "Divide", "Remainder".
    The exact math method should be called only when there is a corresponding function in java.lang.Math
  2. Revise the log output of casting to Int/Short
  3. Enable spark.sql.failOnIntegralTypeOverflow for pgSQL tests in SQLQueryTestSuite.

Why are the changes needed?

  1. Fix the bugs of [SPARK-26218][SQL] Overflow on arithmetic operations returns incorrect result #21599
  2. The test case of pgSQL intends to check the overflow of integer/long type. We should enable spark.sql.failOnIntegralTypeOverflow.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test.

localSparkSession.conf.set(SQLConf.CROSS_JOINS_ENABLED.key, true)
localSparkSession.conf.set(SQLConf.ANSI_SQL_PARSER.key, true)
localSparkSession.conf.set(SQLConf.PREFER_INTEGRAL_DIVISION.key, true)
localSparkSession.conf.set(SQLConf.FAIL_ON_INTEGRAL_TYPE_OVERFLOW.key, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of setting it always to true, I think we should set it to both true and false in the tests which are relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to enable it for all pgSQL tests:

  1. If we are going to have one flag for ANSI mode such as [SPARK-28989][SQL] Introduce ANSI SQL Dialect #25693, then we should enable all the ANSI features here.
  2. If we have to isolate this flag, then we need to isolate SQLConf.ANSI_SQL_PARSER as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this way we are running all our tests only in one of the two modes (the non-default one moreover), so we are not ensuring the behavior on both modes. I don't think this is a good idea. I think that when we will have the ANSI feature, we will run these tests both with ANSI enabled and not. Otherwise we would not provide proper coverage.

Copy link
Member Author

@gengliangwang gengliangwang Sep 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. I think the cause of the issue is test coverage.
Testing both modes for pgSQL tests seems great. The total time of pgSQL tests is around 7 minutes in my local setup, and org.apache.spark.sql.SQLQueryTestSuite is executed in parallel(see SparkBuild.scala).
I have created a follow-up https://issues.apache.org/jira/browse/SPARK-29098 for this.

@SparkQA
Copy link

SparkQA commented Sep 16, 2019

Test build #110630 has finished for PR 25804 at commit ca737c6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

localSparkSession.conf.set(SQLConf.FAIL_ON_INTEGRAL_TYPE_OVERFLOW.key, true)
// Propagate the SQL conf FAIL_ON_INTEGRAL_TYPE_OVERFLOW to executor.
// TODO: remove this after SPARK-29122 is resolved.
localSparkSession.sparkContext.setLocalProperty(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before https://issues.apache.org/jira/browse/SPARK-29122 is resolved, let's propagate the conf FAIL_ON_INTEGRAL_TYPE_OVERFLOW to executor.
The issue is test only.

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110794 has started for PR 25804 at commit c5674e4.

@shaneknapp
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110804 has finished for PR 25804 at commit c5674e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a quick look and seems fine to me.

@cloud-fan
Copy link
Contributor

LGTM. I think for pgsql tests, we should always set dialect to pgsql (once we have that config). For other golden file tests, we should test with ansi mode on and off. We can do it later.

@cloud-fan cloud-fan closed this in 3da2786 Sep 18, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants