Skip to content

Conversation

@cloud-fan
Copy link
Contributor

The current implementation is sub-optimal:

  • If an expression is always nullable, e.g. Unhex, we can still remove null check for children if they are not nullable.
  • If an expression has some non-nullable children, we can still remove null check for these children and keep null check for others.

This PR improves this by making the null check elimination more fine-grained.

@cloud-fan
Copy link
Contributor Author

cc @davies @nongli

@davies
Copy link
Contributor

davies commented Jan 29, 2016

@cloud-fan I'm not sure how much benefits (performance or code size) we can get from this, this make the code hard to understand.

@nongli
Copy link
Contributor

nongli commented Jan 29, 2016

Can you include the before and after of the generated code that illustrates when this is good?

@cloud-fan
Copy link
Contributor Author

Unhex with a not nullable child:
before:

UTF8String value3 = i.getUTF8String(0);
boolean isNull0 = false;
byte[] value1 = null;
if (!false) {
  value1 = org.apache.spark.sql.catalyst.expressions.Hex.unhex(value3.getBytes());
  isNull0 = value1 == null;
}

after:

UTF8String value3 = i.getUTF8String(0);
boolean isNull0 = false;
byte[] value1 = null;
value1 = org.apache.spark.sql.catalyst.expressions.Hex.unhex(value3.getBytes());
isNull0 = value1 == null;

`Substring` with a nullable `pos`, other children are not nullable: before:
/* input[0, string] */
UTF8String value3 = i.getUTF8String(0);
boolean isNull0 = true;
UTF8String value1 = null;
if (!false) {
  /* input[1, int] */
  boolean isNull4 = i.isNullAt(1);
  int value5 = isNull4 ? -1 : (i.getInt(1));
  if (!isNull4) {
    /* input[2, int] */
    int value7 = i.getInt(2);
    if (!false) {
      isNull0 = false;  // resultCode could change nullability
      value1 = value3.substringSQL(value5, value7);
    }
  }
}

after:

boolean isNull0 = true;
UTF8String value1 = null;
/* input[0, string] */
UTF8String value3 = i.getUTF8String(0);
/* input[1, int] */
boolean isNull4 = i.isNullAt(1);
int value5 = isNull4 ? -1 : (i.getInt(1));
if (!isNull4) {
  /* input[2, int] */
  int value7 = i.getInt(2);
  isNull0 = false; // resultCode could change nullability, so this should sit before it.
  value1 = value3.substringSQL(value5, value7);
}

I agree that this PR makes the code a little harder to read, but not that bad, the style of `ctx.genNullCheck` is kind of similar to if block :)

@rxin
Copy link
Contributor

rxin commented Jan 29, 2016

I think this is quite useful, but probably pretty major for performance too if we do it right. The main thing is we should make the code readable.

@davies
Copy link
Contributor

davies commented Jan 29, 2016

Could the compiler remove the branch?

  if (false) {
   xxx
  }

@SparkQA
Copy link

SparkQA commented Jan 29, 2016

Test build #50406 has finished for PR 10987 at commit 5b1097f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 29, 2016

JIT can - after 10000 runs on each partition ...

@davies
Copy link
Contributor

davies commented Jan 29, 2016

Since Janino 2.0.9 (September 5, 2004), it has "Optimized compilation of constantly-false IF statements"

So it does not need JIT to remove them, the generated bytecode should still be the same.

@cloud-fan
Copy link
Contributor Author

Does "constantly-false IF" also includes if (!false)...? Another question is, do we always get false as isNull for not nullable expression? I'm thinking about this comment: #10809 (comment)

@davies
Copy link
Contributor

davies commented Jan 29, 2016

@cloud-fan The answer is yes, here is how Janino compile the IfStatement:

            // Constant condition.
            this.fakeCompile(is.condition);
            BlockStatement seeingStatement, blindStatement;
            if (((Boolean) cv).booleanValue()) {
                seeingStatement = is.thenStatement;
                blindStatement  = es;
            } else {
                seeingStatement = es;
                blindStatement  = is.thenStatement;
            }

            // Compile the seeing statement.
            CodeContext.Inserter ins = this.codeContext.newInserter();
            boolean ssccn = this.compile(seeingStatement);
            if (ssccn) return true;

@cloud-fan
Copy link
Contributor Author

hmmm, so the benefit is quite small here, let me see if I can improve the code readability, or I agree this PR doesn't worth.

@SparkQA
Copy link

SparkQA commented Feb 1, 2016

Test build #50467 has finished for PR 10987 at commit 7efc502.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 1, 2016

Test build #50469 has finished for PR 10987 at commit 566006e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Feb 1, 2016

LGTM

@davies
Copy link
Contributor

davies commented Feb 1, 2016

Merging into master, thanks!

@asfgit asfgit closed this in c1da4d4 Feb 1, 2016
@cloud-fan cloud-fan deleted the null-check branch February 1, 2016 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants