[SPARK-13093][SQL] improve null check in nullSafeCodeGen for unary, binary and ternary expression #10987

cloud-fan · 2016-01-29T20:38:44Z

The current implementation is sub-optimal:

If an expression is always nullable, e.g. Unhex, we can still remove null check for children if they are not nullable.
If an expression has some non-nullable children, we can still remove null check for these children and keep null check for others.

This PR improves this by making the null check elimination more fine-grained.

cloud-fan · 2016-01-29T20:40:00Z

davies · 2016-01-29T21:02:07Z

@cloud-fan I'm not sure how much benefits (performance or code size) we can get from this, this make the code hard to understand.

nongli · 2016-01-29T21:26:21Z

Can you include the before and after of the generated code that illustrates when this is good?

cloud-fan · 2016-01-29T21:54:20Z

Unhex with a not nullable child:
before:

UTF8String value3 = i.getUTF8String(0);
boolean isNull0 = false;
byte[] value1 = null;
if (!false) {
  value1 = org.apache.spark.sql.catalyst.expressions.Hex.unhex(value3.getBytes());
  isNull0 = value1 == null;
}

after:

UTF8String value3 = i.getUTF8String(0);
boolean isNull0 = false;
byte[] value1 = null;
value1 = org.apache.spark.sql.catalyst.expressions.Hex.unhex(value3.getBytes());
isNull0 = value1 == null;

`Substring` with a nullable `pos`, other children are not nullable: before:

/* input[0, string] */
UTF8String value3 = i.getUTF8String(0);
boolean isNull0 = true;
UTF8String value1 = null;
if (!false) {
  /* input[1, int] */
  boolean isNull4 = i.isNullAt(1);
  int value5 = isNull4 ? -1 : (i.getInt(1));
  if (!isNull4) {
    /* input[2, int] */
    int value7 = i.getInt(2);
    if (!false) {
      isNull0 = false;  // resultCode could change nullability
      value1 = value3.substringSQL(value5, value7);
    }
  }
}

after:

boolean isNull0 = true;
UTF8String value1 = null;
/* input[0, string] */
UTF8String value3 = i.getUTF8String(0);
/* input[1, int] */
boolean isNull4 = i.isNullAt(1);
int value5 = isNull4 ? -1 : (i.getInt(1));
if (!isNull4) {
  /* input[2, int] */
  int value7 = i.getInt(2);
  isNull0 = false; // resultCode could change nullability, so this should sit before it.
  value1 = value3.substringSQL(value5, value7);
}

I agree that this PR makes the code a little harder to read, but not that bad, the style of `ctx.genNullCheck` is kind of similar to if block :)

rxin · 2016-01-29T22:04:47Z

I think this is quite useful, but probably pretty major for performance too if we do it right. The main thing is we should make the code readable.

davies · 2016-01-29T22:06:50Z

Could the compiler remove the branch?

  if (false) {
   xxx
  }

SparkQA · 2016-01-29T22:07:05Z

Test build #50406 has finished for PR 10987 at commit 5b1097f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-01-29T22:11:50Z

JIT can - after 10000 runs on each partition ...

davies · 2016-01-29T22:16:58Z

Since Janino 2.0.9 (September 5, 2004), it has "Optimized compilation of constantly-false IF statements"

So it does not need JIT to remove them, the generated bytecode should still be the same.

cloud-fan · 2016-01-29T22:35:00Z

Does "constantly-false IF" also includes if (!false)...? Another question is, do we always get false as isNull for not nullable expression? I'm thinking about this comment: #10809 (comment)

davies · 2016-01-29T22:50:23Z

@cloud-fan The answer is yes, here is how Janino compile the IfStatement:

            // Constant condition.
            this.fakeCompile(is.condition);
            BlockStatement seeingStatement, blindStatement;
            if (((Boolean) cv).booleanValue()) {
                seeingStatement = is.thenStatement;
                blindStatement  = es;
            } else {
                seeingStatement = es;
                blindStatement  = is.thenStatement;
            }

            // Compile the seeing statement.
            CodeContext.Inserter ins = this.codeContext.newInserter();
            boolean ssccn = this.compile(seeingStatement);
            if (ssccn) return true;

cloud-fan · 2016-01-29T23:11:54Z

hmmm, so the benefit is quite small here, let me see if I can improve the code readability, or I agree this PR doesn't worth.

SparkQA · 2016-02-01T00:38:41Z

Test build #50467 has finished for PR 10987 at commit 7efc502.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-01T01:50:38Z

Test build #50469 has finished for PR 10987 at commit 566006e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-02-01T06:42:27Z

LGTM

davies · 2016-02-01T06:42:46Z

Merging into master, thanks!

cloud-fan force-pushed the null-check branch from dca28a0 to 5b1097f Compare January 29, 2016 20:39

improve nullSafeCodeGen for unary, binary and ternary expression

5b1097f

cloud-fan force-pushed the null-check branch from 7efc502 to 349d664 Compare February 1, 2016 00:03

improve readability

566006e

cloud-fan force-pushed the null-check branch from 349d664 to 566006e Compare February 1, 2016 00:14

asfgit closed this in c1da4d4 Feb 1, 2016

cloud-fan deleted the null-check branch February 1, 2016 07:25

[SPARK-13093][SQL] improve null check in nullSafeCodeGen for unary, binary and ternary expression #10987

[SPARK-13093][SQL] improve null check in nullSafeCodeGen for unary, binary and ternary expression #10987

Uh oh!

Conversation

cloud-fan commented Jan 29, 2016

Uh oh!

cloud-fan commented Jan 29, 2016

Uh oh!

davies commented Jan 29, 2016

Uh oh!

nongli commented Jan 29, 2016

Uh oh!

cloud-fan commented Jan 29, 2016

Uh oh!

rxin commented Jan 29, 2016

Uh oh!

davies commented Jan 29, 2016

Uh oh!

SparkQA commented Jan 29, 2016

Uh oh!

rxin commented Jan 29, 2016

Uh oh!

davies commented Jan 29, 2016

Uh oh!

cloud-fan commented Jan 29, 2016

Uh oh!

davies commented Jan 29, 2016

Uh oh!

cloud-fan commented Jan 29, 2016

Uh oh!

SparkQA commented Feb 1, 2016

Uh oh!

SparkQA commented Feb 1, 2016

Uh oh!

davies commented Feb 1, 2016

Uh oh!

davies commented Feb 1, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants