[SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec #25710

maropu · 2019-09-06T12:55:44Z

What changes were proposed in this pull request?

This PR proposes defining an individual method for each common subexpression in HashAggregateExec. In the current master, the common subexpression elimination code in HashAggregateExec is expanded inline in a single method:

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

Line 397 in 4664a08

|$effectiveCodes

That method can grow too big for JIT compilation, so I believe splitting it is beneficial for performance. For example, for the query SELECT SUM(a + b), AVG(a + b + c) FROM VALUES (1, 1, 1) t(a, b, c),

the current master generates:

private void agg_doConsume_0(InternalRow localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int agg_expr_2_0) throws java.io.IOException {
  // do aggregate
  // common sub-expressions
  int agg_value_6 = -1;
  agg_value_6 = agg_expr_0_0 + agg_expr_1_0;
  int agg_value_5 = -1;
  agg_value_5 = agg_value_6 + agg_expr_2_0;
  boolean agg_isNull_4 = false;
  long agg_value_4 = -1L;
  if (!false) {
    agg_value_4 = (long) agg_value_5;
  }
  int agg_value_10 = -1;
  agg_value_10 = agg_expr_0_0 + agg_expr_1_0;
  // evaluate aggregate functions and update aggregation buffers
  agg_doAggregate_sum_0(agg_value_10);
  agg_doAggregate_avg_0(agg_value_4, agg_isNull_4);
}

On the other hand, this PR generates:

private void agg_doConsume_0(InternalRow localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int agg_expr_2_0) throws java.io.IOException {
  // do aggregate
  // common sub-expressions
  long agg_subExprValue_0 = agg_subExpr_0(agg_expr_2_0, agg_expr_0_0, agg_expr_1_0);
  int agg_subExprValue_1 = agg_subExpr_1(agg_expr_0_0, agg_expr_1_0);
  // evaluate aggregate functions and update aggregation buffers
  agg_doAggregate_sum_0(agg_subExprValue_1);
  agg_doAggregate_avg_0(agg_subExprValue_0);
}
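The listing above calls agg_subExpr_0 and agg_subExpr_1 without showing their bodies. Below is a minimal Java sketch of what such split methods might look like for SUM(a + b), AVG(a + b + c), ignoring null handling. The class name SplitSubExprSketch and the methods subExpr0, subExpr1, and consume are hypothetical names chosen for illustration; this is not the PR's actual generated output.

```java
// Hypothetical sketch: each common subexpression lives in its own small method,
// so the consuming method stays short.
final class SplitSubExprSketch {
  // a + b + c, widened to long (the AVG input in the example query).
  static long subExpr0(int c, int a, int b) {
    return (long) ((a + b) + c);
  }

  // a + b (the SUM input in the example query).
  static int subExpr1(int a, int b) {
    return a + b;
  }

  // Stand-in for agg_doConsume_0: it only calls the small methods and
  // returns {sum input, avg input} instead of updating aggregation buffers.
  static long[] consume(int a, int b, int c) {
    long avgInput = subExpr0(c, a, b);
    int sumInput = subExpr1(a, b);
    return new long[] {sumInput, avgInput};
  }

  public static void main(String[] args) {
    long[] r = consume(1, 1, 1);
    System.out.println("sum input = " + r[0] + ", avg input = " + r[1]);
  }
}
```

Because each helper is small, the JIT compiler can compile (and typically inline) them independently of the size of the caller.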

I ran some microbenchmarks for this PR:

(base) maropu@~:$system_profiler SPHardwareDataType
Hardware:
    Hardware Overview:
      Processor Name: Intel Core i5
      Processor Speed: 2 GHz
      Number of Processors: 1
      Total Number of Cores: 2
      L2 Cache (per Core): 256 KB
      L3 Cache: 4 MB
      Memory: 8 GB

(base) maropu@~:$java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

(base) maropu@~:$ /bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shuffle.partitions=1 -v

val numCols = 40
val colExprs = "id AS key" +: (0 until numCols).map { i => s"id AS _c$i" }
spark.range(3000000).selectExpr(colExprs: _*).createOrReplaceTempView("t")

val aggExprs = (2 until numCols).map { i =>
  (0 until i).map(d => s"_c$d")
    .mkString("AVG(", " + ", ")")
}

// Discard the time of the first run and report that of the second run
timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }

// the master
maxCodeGen: 12957
Elapsed time: 36.309858661s  

// this pr
maxCodeGen=4184
Elapsed time: 2.399490285s
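To make the workload concrete: the script above builds 38 AVG expressions, each summing a growing prefix of the columns, so (assuming the usual left-to-right parsing of +) each sum reappears inside the next one as a common subexpression. The following is a Java rendering of the aggExprs construction only; the class name AggExprsSketch is hypothetical and nothing here touches Spark.

```java
// Sketch: reproduces the strings built by `aggExprs` in the Scala benchmark script.
final class AggExprsSketch {
  static java.util.List<String> aggExprs(int numCols) {
    java.util.List<String> exprs = new java.util.ArrayList<>();
    // Mirrors (2 until numCols): i = 2, ..., numCols - 1
    for (int i = 2; i < numCols; i++) {
      StringBuilder sb = new StringBuilder("AVG(");
      // Mirrors (0 until i).map(d => s"_c$d").mkString("AVG(", " + ", ")")
      for (int d = 0; d < i; d++) {
        if (d > 0) sb.append(" + ");
        sb.append("_c").append(d);
      }
      exprs.add(sb.append(")").toString());
    }
    return exprs;
  }

  public static void main(String[] args) {
    java.util.List<String> exprs = aggExprs(40);
    System.out.println(exprs.size() + " expressions, starting with:");
    System.out.println(exprs.get(0));
    System.out.println(exprs.get(1));
  }
}
```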

Why are the changes needed?

To avoid generating methods that are too long for the JVM to JIT-compile.
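For context: by default HotSpot does not JIT-compile a method whose bytecode is larger than 8000 bytes (the DontCompileHugeMethods flag together with a HugeMethodLimit of 8000), so an oversized generated method keeps running in the interpreter. The sketch below applies that rule to the two maxCodeGen values reported in the benchmark, under the assumption that they are the largest per-method bytecode sizes. The class and method names are hypothetical.

```java
// Sketch of HotSpot's default "huge method" rule; not Spark or HotSpot code.
final class HugeMethodCheck {
  // HotSpot's default HugeMethodLimit, in bytes of bytecode.
  static final int HUGE_METHOD_LIMIT = 8000;

  // True if a method of this bytecode size is still eligible for JIT
  // compilation under the default -XX:+DontCompileHugeMethods setting.
  static boolean jitEligible(int bytecodeSize) {
    return bytecodeSize <= HUGE_METHOD_LIMIT;
  }

  public static void main(String[] args) {
    System.out.println("master  (12957): " + jitEligible(12957));
    System.out.println("this PR (4184):  " + jitEligible(4184));
  }
}
```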

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests in WholeStageCodegenSuite

SparkQA · 2019-09-06T13:07:01Z

Test build #110237 has finished for PR 25710 at commit ba36945.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class CodegenContext extends Logging

SparkQA · 2019-09-06T14:38:11Z

Test build #110238 has finished for PR 25710 at commit 9016673.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class CodegenContext extends Logging

maropu · 2019-09-06T14:41:11Z

retest this please

SparkQA · 2019-09-06T18:41:59Z

Test build #110251 has finished for PR 25710 at commit 9016673.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class CodegenContext extends Logging

viirya · 2019-09-06T20:07:31Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+      }
+    }
+
+    val codes = if (commonExprVals.map(_.code.length).sum > SQLConf.get.methodSplitThreshold) {


Although the original method contains more than just the common subexpressions, this is probably good enough.
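The condition in the snippet above compares the total length of the generated source text for all common subexpressions against methodSplitThreshold. Below is a simplified Java sketch of that decision. The class name, method names, and the threshold passed in main are illustrative assumptions, not Spark's API.

```java
// Sketch: decide whether to split based on the total generated-code length.
final class SplitDecisionSketch {
  // Sum of the source lengths of all common-subexpression code snippets.
  static int totalCodeLength(java.util.List<String> snippets) {
    int total = 0;
    for (String s : snippets) {
      total += s.length();
    }
    return total;
  }

  // Split only when the combined code is strictly longer than the threshold.
  static boolean shouldSplit(java.util.List<String> snippets, int threshold) {
    return totalCodeLength(snippets) > threshold;
  }

  public static void main(String[] args) {
    java.util.List<String> snippets = java.util.Arrays.asList(
        "agg_value_6 = agg_expr_0_0 + agg_expr_1_0;",
        "agg_value_5 = agg_value_6 + agg_expr_2_0;");
    // 1024 is only an illustrative threshold value here.
    System.out.println("split? " + shouldSplit(snippets, 1024));
  }
}
```

As the comment above notes, this measures only the subexpression code, not the whole enclosing method, so it is an approximation of the real method size.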

viirya · 2019-09-06T21:47:26Z

// the current master
Elapsed time: 47.920266373s

I also ran this benchmark to verify, but the current master doesn't seem to take that long on my laptop. It took about 4-5s.

Is anything causing the difference?

maropu · 2019-09-06T22:10:09Z

I also ran this benchmark to verify, but the current master doesn't seem to take that long on my laptop. It took about 4-5s.
Is anything causing the difference?

Oh, I probably made a mistake. I will re-run it and update the description later. Anyway, thanks for checking!

maropu · 2019-09-06T22:12:34Z

also cc: @cloud-fan @rednaxelafx @mgaido91

maropu · 2019-09-06T22:35:42Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+            JavaCode.isNullGlobal(isNull), JavaCode.global(value, expr.dataType))
+          exprs.foreach(localSubExprEliminationExprs.put(_, state))
+          val inputVariables = inputVars.map(_.variableName).mkString(", ")
+          s"${addNewFunction(fnName, fn)}($inputVariables);"


I think we might be able to split more as @mgaido91 did in #25642.

rednaxelafx · 2019-09-06T23:10:14Z

I haven't reviewed this PR in detail yet, just some first thoughts:

At a glance I'm neutral about this PR. In general I don't like code splitting that causes premature spilling of state from locals to fields. I might be more in favor of:

long commonSubExpr0 = agg_subExpr_0(input1, input2);
agg_doAggregate_sum_0(commonSubExpr0);
...

than

agg_subExpr_0(input1, input2); // result goes to this.commonSubExpr0
agg_doAggregate_sum_0(); // argument passed through `this`
...

In practice, after thorough inlining, the performance shouldn't be too different, but I just don't like the idea of blindly spilling state to fields when it's not necessary.
JIT compilers can optimize code, but they usually have a hard time optimizing field layout and removing "unnecessary" fields; that requires strong inter-procedural analysis.
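The two styles contrasted above can be written out as a small runnable Java sketch. The class LocalsVsFieldsSketch and all member names are hypothetical; the point is only to show where the intermediate value lives in each style.

```java
// Sketch: the same aggregation step, once with a local and once with a field.
final class LocalsVsFieldsSketch {
  private long sum;             // aggregation buffer
  private long commonSubExpr0;  // needed only by the field-based style

  // Style 1: the split method returns its result; the caller keeps it in a
  // local variable and passes it on as an argument.
  private long subExprReturning(int in1, int in2) {
    return (long) in1 + in2;
  }

  private void doAggregateSum(long v) {
    sum += v;
  }

  long consumeWithLocals(int in1, int in2) {
    long v = subExprReturning(in1, in2);
    doAggregateSum(v);
    return sum;
  }

  // Style 2: the result is spilled to a field and read back through `this`.
  private void subExprToField(int in1, int in2) {
    commonSubExpr0 = (long) in1 + in2;
  }

  private void doAggregateSumFromField() {
    sum += commonSubExpr0;
  }

  long consumeWithFields(int in1, int in2) {
    subExprToField(in1, in2);
    doAggregateSumFromField();
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(new LocalsVsFieldsSketch().consumeWithLocals(1, 2));
    System.out.println(new LocalsVsFieldsSketch().consumeWithFields(1, 2));
  }
}
```

Both styles compute the same result; the first keeps the intermediate value in a local that the JIT can place in a register, while the second adds a field to every instance of the generated class.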

mgaido91 · 2019-09-06T23:23:02Z

I agree with @rednaxelafx. Introducing many class fields probably shouldn't cause issues with the constant pool, since we can batch variables in arrays, but this is also suboptimal. So it'd be great if we could avoid that.

maropu · 2019-09-06T23:30:19Z

Yea, I think so, too. If it were possible for a split function to return two variables (value and isNull), we could easily localize these variables without the evil constant pool issue... but, IIUC, we currently have no logic for that. Any other ideas to avoid that?

mgaido91 · 2019-09-07T08:13:21Z

What about leaving only the isNull global?

maropu · 2019-09-07T23:56:48Z

Ah, that's one of the choices. I'll try to brush up the code based on that. Thanks!
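The idea discussed above (return the value from the split function and keep only isNull as class state) can be sketched in Java as follows. A Java method returns a single value, which is why the null flag needs somewhere else to go. The class IsNullGlobalSketch and its members are hypothetical and simplified, with inputs passed as explicit (value, isNull) pairs.

```java
// Sketch: the value is returned to a local; only the null flag is a field.
final class IsNullGlobalSketch {
  // The only mutable class state kept for this subexpression.
  private boolean subExprIsNull0;

  // Computes a + b. The value is returned; the null flag goes to the field.
  private long subExpr0(int a, boolean aIsNull, int b, boolean bIsNull) {
    subExprIsNull0 = aIsNull || bIsNull;
    return subExprIsNull0 ? -1L : (long) a + b;
  }

  // Returns the sum, or null if either input was null.
  Long eval(int a, boolean aIsNull, int b, boolean bIsNull) {
    long value = subExpr0(a, aIsNull, b, bIsNull); // stays a local variable
    return subExprIsNull0 ? null : Long.valueOf(value);
  }

  public static void main(String[] args) {
    IsNullGlobalSketch s = new IsNullGlobalSketch();
    System.out.println(s.eval(1, false, 2, false));
    System.out.println(s.eval(1, true, 2, false));
  }
}
```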

maropu · 2019-09-08T04:46:30Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+               |  $isNullEvalCode
+               |  return ${eval.value};
+               |}
+               """.stripMargin


ISTM we might be able to apply the same change in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1060-L1069

yea, we can do it in a followup.

viirya · 2019-09-08T05:08:58Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+               """.stripMargin
+
+          val value = freshName("subExprValue")
+          val state = SubExprEliminationState(isNull, JavaCode.variable(value, expr.dataType))


One advantage of a global variable is that we don't need to care how this expr value is used later. It is OK even if it is used in a split function. Making it a local variable means we need to be careful and guarantee that these expressions are only used in the same scope.

Yea, I see. But I just don't want to add more pressure on the constant pool... WDYT? @cloud-fan

AFAIK the code of common subexpression execution is always put together, not split. I don't think we need to worry about it now.

BTW I think one principle is: for corner cases where it is really hard to generate code, we should just fall back to interpreted mode.

SparkQA · 2019-09-08T07:05:02Z

Test build #110295 has finished for PR 25710 at commit 3314954.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-09-08T07:34:51Z

retest this please

SparkQA · 2019-09-08T11:16:42Z

Test build #110301 has finished for PR 25710 at commit 3314954.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-09-10T12:49:53Z

LGTM. As usual, can we have a simple microbenchmark to show the advantage? I saw some discussion about the perf numbers but I can't find it in the PR description.

maropu · 2019-09-10T13:34:46Z

Oh, I forgot to re-benchmark that. (I put in wrong benchmark numbers at first, so I removed them.) I'll run the benchmarks again and update the description soon.

maropu · 2019-09-10T22:25:42Z

I updated the PR description with the re-run benchmark results.

maropu · 2019-09-12T00:46:05Z

ping @cloud-fan @viirya

maropu · 2019-09-12T00:46:10Z

retest this please

SparkQA · 2019-09-12T04:41:44Z

Test build #110495 has finished for PR 25710 at commit 3314954.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2019-09-12T05:38:21Z

sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala

+
+  test("Give up splitting subexpression code if a parameter length goes over the limit") {
+    withSQLConf(
+        SQLConf.CODEGEN_SPLIT_AGGREGATE_FUNC.key -> "false",


Must this test be run with CODEGEN_SPLIT_AGGREGATE_FUNC = false?

Yea, we need to. If that flag is true, HashAggregateExec throws an exception in this test: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L327
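The test name above refers to a JVM constraint: a method's parameters may occupy at most 255 slots, where long and double count as two slots each and an instance method spends one slot on `this`. Below is a Java sketch of such a check. The class and method names are hypothetical and this is not the code under test.

```java
// Sketch: check whether a split method's parameter list fits the JVM limit.
final class ParamLengthSketch {
  static final int MAX_JVM_PARAM_LENGTH = 255;

  // Slots used by one parameter: long and double take two, everything else one.
  static int slots(Class<?> type) {
    return (type == long.class || type == double.class) ? 2 : 1;
  }

  // Total length for an instance method: 1 for `this` plus the parameter slots.
  static int paramLength(java.util.List<? extends Class<?>> paramTypes) {
    int length = 1;
    for (Class<?> t : paramTypes) {
      length += slots(t);
    }
    return length;
  }

  // If this returns false, code generation has to give up on splitting.
  static boolean canSplitIntoMethod(java.util.List<? extends Class<?>> paramTypes) {
    return paramLength(paramTypes) <= MAX_JVM_PARAM_LENGTH;
  }

  public static void main(String[] args) {
    System.out.println(canSplitIntoMethod(java.util.Collections.nCopies(254, int.class)));
    System.out.println(canSplitIntoMethod(java.util.Collections.nCopies(128, long.class)));
  }
}
```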

viirya

Looks good. One question about the test.

cloud-fan · 2019-09-12T13:26:44Z

LGTM, cc @rednaxelafx to take another look

maropu · 2019-09-13T23:27:29Z

ping @rednaxelafx

maropu · 2019-09-13T23:27:36Z

retest this please

SparkQA · 2019-09-14T03:08:03Z

Test build #110578 has finished for PR 25710 at commit 3314954.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-09-17T02:10:51Z

Thanks! Merged to master.
Thanks to all the reviewers! @rednaxelafx if you have any comments that need follow-ups, please let me know!

Fix 9016673

maropu force-pushed the SplitSubexpr branch from ba36945 to 9016673, September 6, 2019 13:13

viirya reviewed Sep 6, 2019

dongjoon-hyun added the SQL label Sep 6, 2019

Address comments 3314954

viirya reviewed Sep 8, 2019

viirya reviewed Sep 12, 2019

viirya approved these changes Sep 12, 2019

maropu closed this in 95073fb Sep 17, 2019
