@viirya
Member

@viirya viirya commented Sep 16, 2019

What changes were proposed in this pull request?

When InSet generates Java switch-based code and the input set is empty, we no longer generate a switch statement. Instead, we emit a simple statement equivalent to the default case of the original switch.
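
The idea can be sketched outside Spark as a plain string builder. This is a simplified model, not the actual InSet codegen: `InSetSwitchSketch`, `switchCodeFor`, and all parameter names are hypothetical, and the real implementation works on `ExprCode` values and `code"..."` blocks rather than raw strings.

```scala
// Simplified, Spark-free model of the change; every name here is hypothetical.
object InSetSwitchSketch {
  // Builds the Java snippet that updates `resultVar` / `isNullVar` for `valueVar`.
  def switchCodeFor(
      valueVar: String,
      resultVar: String,
      isNullVar: String,
      caseValues: Seq[Int],
      hasNull: Boolean): String = {
    if (caseValues.isEmpty) {
      // Empty set: emit only what the default case would have done.
      s"$isNullVar = $hasNull;"
    } else {
      val caseBranches = caseValues.map { v =>
        s"""  case $v:
           |    $resultVar = true;
           |    break;""".stripMargin
      }
      s"""switch ($valueVar) {
         |${caseBranches.mkString("\n")}
         |  default:
         |    $isNullVar = $hasNull;
         |}""".stripMargin
    }
  }
}
```

With an empty `caseValues`, the sketch returns a single assignment and never emits a `switch` that has only a `default` label.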

Why are the changes needed?

SPARK-26205 added an optimization to InSet that generates a Java switch statement in certain cases. When the given set is empty, the generated code can fail to compile:

[info] - SPARK-29100: InSet with empty input set *** FAILED *** (58 milliseconds)                                      
[info]   Code generation of input[0, int, true] INSET () failed:                                                                        
[info]   org.codehaus.janino.InternalCompilerException: failed to compile: org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass" in "generated.java": Compiling "apply(java.lang.Object _i)"; apply(java.lang.Object _i): Operand stack inconsistent at offset 45: Previous size 0, now 1                                                                                                           
[info]   org.codehaus.janino.InternalCompilerException: failed to compile: org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass" in "generated.java": Compiling "apply(java.lang.Object _i)"; apply(java.lang.Object _i): Operand stack inconsistent at offset 45: Previous size 0, now 1                                                                                                           
[info]         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1308)                                                                                        
[info]         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1386)               
[info]         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1383)

Does this PR introduce any user-facing change?

Yes. Previously, when users ran InSet against an empty set, the generated code caused a compilation error. This patch fixes it.

How was this patch tested?

Unit test added.

@viirya
Member Author

viirya commented Sep 16, 2019

cc @cloud-fan @dongjoon-hyun @maropu

@viirya viirya changed the title from "[SPARK-29100][SQL] Codegen with switch in InSet expression causes compilation error" to "[SPARK-29100][SQL] Fix compilation error in codegen with switch from InSet expression" Sep 16, 2019
@dongjoon-hyun
Member

Thank you for fixing this, @viirya.

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110660 has finished for PR 25806 at commit 5ad7538.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

break;
""")

val switchCode = if (caseBranches.size > 0) {
Member

If hset is empty, we don't need to compute caseBranches either, do we?

    val valueGen = child.genCode(ctx)
    // Build the case branches and the switch only when there is something to match.
    val switchCode = if (hset.nonEmpty) {
      val caseValuesGen = hset.filter(_ != null).map(Literal(_).genCode(ctx))
      val caseBranches = caseValuesGen.map(literal =>
        code"""
          case ${literal.value}:
            ${ev.value} = true;
            break;
         """)

      code"""
        switch (${valueGen.value}) {
          ${caseBranches.mkString("\n")}
          default:
            ${ev.isNull} = $hasNull;
        }
       """
    } else {
      // Empty set: emit only the default-case assignment.
      s"${ev.isNull} = $hasNull;"
    }

Member

btw, can we fold this expr in the optimizer when hset is empty?
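
To see why folding is plausible, here is a small plain-Scala model of the membership semantics, with `None` standing in for SQL NULL. It is a sketch under the assumption that InSet follows the usual three-valued logic (NULL input gives NULL; a miss gives NULL only if the set itself contains NULL). `InSetSemanticsSketch` and `inSet` are hypothetical names, not Spark APIs.

```scala
// Plain-Scala model of three-valued IN semantics; None stands for SQL NULL.
// All names are hypothetical and nothing here depends on Spark.
object InSetSemanticsSketch {
  def inSet(value: Option[Int], set: Set[Option[Int]]): Option[Boolean] = {
    val hasNull = set.contains(None)
    value match {
      case None => None                                    // NULL input gives NULL
      case Some(v) if set.contains(Some(v)) => Some(true)  // value found in the set
      case Some(_) if hasNull => None                      // no match, but set has NULL
      case Some(_) => Some(false)                          // no match, no NULL in set
    }
  }
}
```

Under this model an empty set yields `Some(false)` for every non-null input and `None` for a null input, so the result depends only on the nullness of the child. That suggests the expression could be folded to a constant when the child is known to be non-nullable.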

Contributor

If hset is empty, the computation of caseBranches is a no-op, right?
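
A quick way to check this point: mapping over an empty collection never invokes the mapping function. The sketch below counts the calls; `EmptyMapSketch` and `countBranchBuilds` are hypothetical names used only for illustration, and the branch text is a stand-in for the real generated code.

```scala
// Illustrative only: shows that `map` on an empty Set does no work.
object EmptyMapSketch {
  // Returns how many times the branch builder ran, plus the branches it built.
  def countBranchBuilds(values: Set[Int]): (Int, Set[String]) = {
    var calls = 0
    val branches = values.map { v =>
      calls += 1
      s"case $v: result = true; break;"
    }
    (calls, branches)
  }
}
```

For an empty input the call count is zero and the result is an empty set, so computing the branches unconditionally costs essentially nothing.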

Contributor

This is a special case. Usually InSet wouldn't have an empty set, as we have a threshold to convert In to InSet.
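
The threshold argument can be illustrated with a toy rewrite rule. This is a sketch with hypothetical types (`In`, `InSet`, and `optimizeIn` below are stand-ins, not Spark's classes), assuming the conversion happens only when the list is longer than a configured threshold; in Spark that value is, as far as I know, controlled by `spark.sql.optimizer.inSetConversionThreshold`.

```scala
// Toy model of an In-to-InSet conversion rule; all types are hypothetical stand-ins.
object InToInSetSketch {
  sealed trait Predicate
  final case class In(column: String, list: Seq[Int]) extends Predicate
  final case class InSet(column: String, hset: Set[Int]) extends Predicate

  // Converts In to InSet only when the list is longer than `threshold`.
  def optimizeIn(p: Predicate, threshold: Int): Predicate = p match {
    case In(col, list) if list.size > threshold => InSet(col, list.toSet)
    case other => other
  }
}
```

Any InSet produced by such a rule has more elements than the threshold, so an empty InSet can only appear when the expression is constructed directly, for example in a test.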

Member

Ur, I see.

Member

@maropu maropu left a comment

LGTM except for one minor comment.

@cloud-fan cloud-fan closed this in dffd92e Sep 17, 2019
@cloud-fan
Contributor

Thanks, merging to master!

@viirya
Member Author

viirya commented Sep 17, 2019

thanks @maropu @cloud-fan @dongjoon-hyun

@viirya viirya deleted the SPARK-29100 branch December 27, 2023 18:37