@maropu
Member

@maropu maropu commented Sep 6, 2019

What changes were proposed in this pull request?

This PR proposes to define an individual method for each common subexpression in HashAggregateExec. In the current master, the common subexpression elimination code in HashAggregateExec is expanded in a single method.

The resulting method can be too big for JIT compilation (by default, HotSpot does not JIT-compile methods whose bytecode exceeds 8000 bytes), so I believe splitting it is beneficial for performance. For example, for the query SELECT SUM(a + b), AVG(a + b + c) FROM VALUES (1, 1, 1) t(a, b, c), the current master generates:

private void agg_doConsume_0(InternalRow localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int agg_expr_2_0) throws java.io.IOException {
  // do aggregate
  // common sub-expressions
  int agg_value_6 = -1;

  agg_value_6 = agg_expr_0_0 + agg_expr_1_0;

  int agg_value_5 = -1;

  agg_value_5 = agg_value_6 + agg_expr_2_0;
  boolean agg_isNull_4 = false;
  long agg_value_4 = -1L;
  if (!false) {
    agg_value_4 = (long) agg_value_5;
  }
  int agg_value_10 = -1;

  agg_value_10 = agg_expr_0_0 + agg_expr_1_0;
  // evaluate aggregate functions and update aggregation buffers
  agg_doAggregate_sum_0(agg_value_10);
  agg_doAggregate_avg_0(agg_value_4, agg_isNull_4);

}

On the other hand, this PR generates:

private void agg_doConsume_0(InternalRow localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int agg_expr_2_0) throws java.io.IOException {
  // do aggregate
  // common sub-expressions
  long agg_subExprValue_0 = agg_subExpr_0(agg_expr_2_0, agg_expr_0_0, agg_expr_1_0);
  int agg_subExprValue_1 = agg_subExpr_1(agg_expr_0_0, agg_expr_1_0);
  // evaluate aggregate functions and update aggregation buffers
  agg_doAggregate_sum_0(agg_subExprValue_1);
  agg_doAggregate_avg_0(agg_subExprValue_0);

}
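
For readers who want to see what the two helper methods compute, here is a hand-written sketch. It is not actual generated output: the class name and method bodies are assumptions inferred from the master code above (a + b for SUM, and a + b + c widened to long for AVG), and null handling is omitted.

```java
// Hand-written illustration only; NOT code emitted by Spark's code generator.
// Class name and method bodies are hypothetical, inferred from the snippets above.
final class SplitSubExprSketch {

    // Mirrors agg_subExpr_0(agg_expr_2_0, agg_expr_0_0, agg_expr_1_0):
    // computes (a + b) + c and widens the result to long, as AVG expects.
    static long aggSubExpr0(int c, int a, int b) {
        int sum = a + b;
        return (long) (sum + c);
    }

    // Mirrors agg_subExpr_1(agg_expr_0_0, agg_expr_1_0): computes a + b for SUM.
    static int aggSubExpr1(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Same input row as the example query: VALUES (1, 1, 1) t(a, b, c)
        System.out.println(aggSubExpr0(1, 1, 1));
        System.out.println(aggSubExpr1(1, 1));
    }
}
```

Each helper is small enough on its own to stay well under any JIT size limit, which is the point of the split.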

I ran some microbenchmarks for this PR:

(base) maropu@~:$system_profiler SPHardwareDataType
Hardware:
    Hardware Overview:
      Processor Name: Intel Core i5
      Processor Speed: 2 GHz
      Number of Processors: 1
      Total Number of Cores: 2
      L2 Cache (per Core): 256 KB
      L3 Cache: 4 MB
      Memory: 8 GB

(base) maropu@~:$java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

(base) maropu@~:$ /bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shuffle.partitions=1 -v

val numCols = 40
val colExprs = "id AS key" +: (0 until numCols).map { i => s"id AS _c$i" }
spark.range(3000000).selectExpr(colExprs: _*).createOrReplaceTempView("t")

val aggExprs = (2 until numCols).map { i =>
  (0 until i).map(d => s"_c$d")
    .mkString("AVG(", " + ", ")")
}

// Discard the time of the first run, then take that of the second run
timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }

// the master
maxCodeGen: 12957
Elapsed time: 36.309858661s

// this PR
maxCodeGen: 4184
Elapsed time: 2.399490285s

Why are the changes needed?

To avoid the too-long-function issue in JVMs.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests in WholeStageCodegenSuite

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110237 has finished for PR 25710 at commit ba36945.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CodegenContext extends Logging

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110238 has finished for PR 25710 at commit 9016673.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CodegenContext extends Logging

@maropu
Member Author

maropu commented Sep 6, 2019

retest this please

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110251 has finished for PR 25710 at commit 9016673.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CodegenContext extends Logging

}
}

val codes = if (commonExprVals.map(_.code.length).sum > SQLConf.get.methodSplitThreshold) {
Member

@viirya viirya Sep 6, 2019

The enclosing method contains more than just the common subexpressions, so this check underestimates its size, but this is probably good enough.
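
As a rough illustration of the check under discussion, the sketch below sums the source length of each subexpression's code and compares it with a threshold. The class, method, and parameter names are hypothetical; only the idea (total code length versus a split threshold) comes from the diff line above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of a "split only when the code is long" decision.
// Names are hypothetical; this is not Spark's implementation.
final class SplitThresholdSketch {

    // Returns true when the combined length of the subexpression code
    // exceeds the threshold, i.e. when splitting into methods is worthwhile.
    static boolean exceedsThreshold(List<String> subExprCodes, int methodSplitThreshold) {
        int total = 0;
        for (String code : subExprCodes) {
            total += code.length();
        }
        return total > methodSplitThreshold;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("int x = a + b;", "int y = x + c;");
        System.out.println(exceedsThreshold(codes, 10));
        System.out.println(exceedsThreshold(codes, 1024));
    }
}
```

Note that the sum only covers the subexpression code, not the rest of the enclosing method, which is the limitation raised in the comment above.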

@viirya
Member

viirya commented Sep 6, 2019

// the current master
Elapsed time: 47.920266373s

I also ran this benchmark to verify, but it seems the current master doesn't take that long on my laptop. It took about 4-5s.

Is anything causing the difference?

@maropu
Member Author

maropu commented Sep 6, 2019

> I also ran this benchmark to verify, but it seems the current master doesn't take that long on my laptop. It took about 4-5s.
> Is anything causing the difference?

Oh... I probably made a mistake. I will re-run it and update the description later. Anyway, thanks for checking!

@maropu
Member Author

maropu commented Sep 6, 2019

also cc: @cloud-fan @rednaxelafx @mgaido91

JavaCode.isNullGlobal(isNull), JavaCode.global(value, expr.dataType))
exprs.foreach(localSubExprEliminationExprs.put(_, state))
val inputVariables = inputVars.map(_.variableName).mkString(", ")
s"${addNewFunction(fnName, fn)}($inputVariables);"
Member Author

@maropu maropu Sep 6, 2019

I think we might be able to split more as @mgaido91 did in #25642.

@rednaxelafx
Contributor

I haven't reviewed this PR in detail yet, just some first thoughts:

At a glance I'm neutral about this PR. In general I don't like code splitting that causes premature spilling of state from locals to fields. I might be more in favor of:

long commonSubExpr0 = agg_subExpr_0(input1, input2);
agg_doAggregate_sum_0(commonSubExpr0);
...

than

agg_subExpr_0(input1, input2); // result goes to this.commonSubExpr0
agg_doAggregate_sum_0(); // argument passed through `this`
...

In practice, after thorough inlining, the performance shouldn't be too different, but I just don't like the idea of blindly spilling state to fields when it's not necessary.
JIT compilers can optimize code, but usually have a hard time optimizing field layout and removing "unnecessary" fields; that requires strong inter-procedural analysis.
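
To make the two styles above concrete, here is a self-contained sketch that runs both. All names are hypothetical and the arithmetic is a stand-in for real aggregate code; it only shows where the intermediate value lives in each style.

```java
// Contrast between passing a common subexpression result as a local
// versus spilling it to a field. Names are hypothetical illustrations.
final class LocalsVsFieldsSketch {
    private long commonSubExpr0; // only needed by the field-spilling style
    private long sum;            // stand-in for an aggregation buffer

    // Style 1: the helper returns its result; the caller keeps it in a local.
    long subExprLocal(int a, int b) {
        return (long) a + b;
    }

    void sumLocal(long value) {
        sum += value;
    }

    long consumeWithLocals(int a, int b) {
        long value = subExprLocal(a, b);
        sumLocal(value);
        return sum;
    }

    // Style 2: the helper writes its result to a field on `this`.
    void subExprField(int a, int b) {
        this.commonSubExpr0 = (long) a + b;
    }

    void sumField() {
        sum += this.commonSubExpr0; // argument passed through `this`
    }

    long consumeWithFields(int a, int b) {
        subExprField(a, b);
        sumField();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(new LocalsVsFieldsSketch().consumeWithLocals(1, 2));
        System.out.println(new LocalsVsFieldsSketch().consumeWithFields(1, 2));
    }
}
```

Both styles compute the same result; the difference is that the second one adds an extra field to the object that exists only to carry a temporary between two calls.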

@mgaido91
Contributor

mgaido91 commented Sep 6, 2019

I agree with @rednaxelafx. Introducing many class fields probably shouldn't cause issues with the constant pool, since we can batch variables in arrays, but that is also suboptimal. So it'd be great if we could avoid it.

@maropu
Member Author

maropu commented Sep 6, 2019

Yea, I think so, too. If it were possible for a split function to return two variables (value and isNull), we could easily localize these variables without the evil constant pool issue... but IIUC we currently have no logic for that. Any other ideas to avoid it?

@mgaido91
Contributor

mgaido91 commented Sep 7, 2019

What about keeping only the isNull global?

@maropu
Member Author

maropu commented Sep 7, 2019

Ah, that's one option. I'll try to brush up the code based on that. Thanks!
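
A minimal sketch of the idea agreed on above: the split method returns the value, so it can stay a local in the caller, and only the null flag is kept in a field. Everything here (class, field, and method names, and the null-handling convention) is a hypothetical illustration rather than the code this PR generates.

```java
// Sketch: a Java method can return only one value, so the value is returned
// and the isNull flag is the only state spilled to a field.
// All names are hypothetical.
final class IsNullFieldSketch {
    private boolean subExprIsNull0; // the only global state for this subexpression
    private long sum;
    private long count;

    // Evaluates a + b with explicit null flags, as generated code typically does.
    private long subExpr0(int a, boolean aIsNull, int b, boolean bIsNull) {
        subExprIsNull0 = aIsNull || bIsNull;
        return subExprIsNull0 ? -1L : (long) a + b;
    }

    // The value lives in a local; nullness is read back from the field.
    void consume(int a, boolean aIsNull, int b, boolean bIsNull) {
        long subExprValue0 = subExpr0(a, aIsNull, b, bIsNull);
        if (!subExprIsNull0) {
            sum += subExprValue0;
            count++;
        }
    }

    long sum() {
        return sum;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        IsNullFieldSketch agg = new IsNullFieldSketch();
        agg.consume(1, false, 2, false);
        agg.consume(5, true, 7, false); // skipped: one input is null
        System.out.println(agg.sum() + " over " + agg.count() + " row(s)");
    }
}
```

This halves the number of fields compared with spilling both the value and the flag, at the cost of requiring the value to be consumed in the same scope as the call.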

| $isNullEvalCode
| return ${eval.value};
|}
""".stripMargin
Member Author

@maropu maropu Sep 8, 2019

Contributor

yea, we can do it in a followup.

""".stripMargin

val value = freshName("subExprValue")
val state = SubExprEliminationState(isNull, JavaCode.variable(value, expr.dataType))
Member

One advantage of a global variable is that we don't need to care how this expression's value is used later; it is fine even if it is used in a split function. Making it a local variable means we need to be careful and guarantee that these expressions are only used in the same scope.

Member Author

Yea, I see. But I just don't want to add more pressure on the constant pool... WDYT? @cloud-fan

Contributor

@cloud-fan cloud-fan Sep 10, 2019

AFAIK the code for common subexpression evaluation is always put together, not split. I don't think we need to worry about it now.

BTW I think one principle is: for corner cases where generating code is really hard, we should just fall back to interpreted mode.
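
The fallback principle can be sketched generically as follows. This is not Spark's code path: the generator, the complexity measure, and the limit are all hypothetical stand-ins used to show the shape of "try the generated fast path, otherwise interpret".

```java
import java.util.function.IntBinaryOperator;

// Generic sketch of "fall back to interpreted mode when codegen gives up".
// All names and the complexity/limit notion are hypothetical.
final class FallbackSketch {

    // Stand-in for a code generator that refuses overly complex inputs.
    static IntBinaryOperator tryGenerate(int complexity, int limit) {
        if (complexity > limit) {
            throw new UnsupportedOperationException("too complex to generate code for");
        }
        return (a, b) -> a + b; // the "compiled" fast path
    }

    // Stand-in for the slower but always available interpreted evaluation.
    static int interpret(int a, int b) {
        return a + b;
    }

    // Prefer the generated path; fall back when generation is refused.
    static int evaluate(int a, int b, int complexity, int limit) {
        try {
            return tryGenerate(complexity, limit).applyAsInt(a, b);
        } catch (UnsupportedOperationException e) {
            return interpret(a, b);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(2, 3, 1, 10));   // generated path
        System.out.println(evaluate(2, 3, 100, 10)); // interpreted fallback
    }
}
```

Both paths must return the same result, so the fallback affects only speed, never correctness.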

@SparkQA

SparkQA commented Sep 8, 2019

Test build #110295 has finished for PR 25710 at commit 3314954.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Sep 8, 2019

retest this please

@SparkQA

SparkQA commented Sep 8, 2019

Test build #110301 has finished for PR 25710 at commit 3314954.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM. As usual, can we have a simple microbenchmark to show the advantage? I saw some discussion about the perf numbers but I can't find it in the PR description.

@maropu
Member Author

maropu commented Sep 10, 2019

Oh... I forgot to re-benchmark that. (I put the wrong benchmark numbers in at first, so I removed them.) I'll run the benchmarks again and update the description soon.

@maropu
Member Author

maropu commented Sep 10, 2019

I updated the PR description with the benchmark environment, script, and results (master: maxCodeGen 12957, elapsed time 36.309858661s; this PR: maxCodeGen 4184, elapsed time 2.399490285s).

@maropu
Member Author

maropu commented Sep 12, 2019

ping @cloud-fan @viirya

@maropu
Member Author

maropu commented Sep 12, 2019

retest this please

@SparkQA

SparkQA commented Sep 12, 2019

Test build #110495 has finished for PR 25710 at commit 3314954.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


test("Give up splitting subexpression code if a parameter length goes over the limit") {
withSQLConf(
SQLConf.CODEGEN_SPLIT_AGGREGATE_FUNC.key -> "false",
Member

Must this test be run with CODEGEN_SPLIT_AGGREGATE_FUNC = false?
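
For context on the limit this test exercises: the JVM allows at most 255 parameter slots per method, where long and double take two slots each and an instance method uses one slot for `this`. The sketch below computes that count for a candidate split method; the class and method names are hypothetical and it is not Spark's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of a JVM parameter-slot check for a candidate split method.
// Names are hypothetical; the 255-slot rule comes from the JVM specification.
final class ParamLengthSketch {
    static final int MAX_JVM_PARAM_SLOTS = 255;

    // Counts parameter slots for an instance method with the given parameter types.
    static int paramSlots(List<? extends Class<?>> paramTypes) {
        int slots = 1; // `this` occupies one slot in an instance method
        for (Class<?> type : paramTypes) {
            slots += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    // Splitting is only possible if the new method's signature fits the limit.
    static boolean canSplit(List<? extends Class<?>> paramTypes) {
        return paramSlots(paramTypes) <= MAX_JVM_PARAM_SLOTS;
    }

    public static void main(String[] args) {
        System.out.println(paramSlots(Arrays.asList(int.class, long.class)));
        System.out.println(canSplit(Collections.nCopies(254, int.class)));
        System.out.println(canSplit(Collections.nCopies(255, int.class)));
    }
}
```

When the check fails, the only safe option is to leave the code inline rather than emit a method the JVM would reject.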

Member Author

Member

@viirya viirya left a comment

Looks good. One question about the test.

@cloud-fan
Contributor

LGTM, cc @rednaxelafx to take another look

@maropu
Member Author

maropu commented Sep 13, 2019

ping @rednaxelafx

@maropu
Member Author

maropu commented Sep 13, 2019

retest this please

@SparkQA

SparkQA commented Sep 14, 2019

Test build #110578 has finished for PR 25710 at commit 3314954.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 95073fb Sep 17, 2019
@maropu
Member Author

maropu commented Sep 17, 2019

Thanks! Merged to master.
Thanks to all the reviewers! @rednaxelafx, if you have any comments that need follow-ups, please let me know!
