
Conversation

@sameeragarwal
Member

What changes were proposed in this pull request?

This PR adds support for whole-stage codegen for sort. It builds heavily on @nongli 's PR: #11008 (which implements the feature itself), and adds the following changes on top:

  • Generated code updates peak execution memory metrics
  • Unit tests in WholeStageCodegenSuite and SQLMetricsSuite

How was this patch tested?

New unit tests in WholeStageCodegenSuite and SQLMetricsSuite. Further, all existing sort tests should pass.

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51922 has finished for PR 11359 at commit fa7c991.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 25, 2016

@sameeragarwal you should update the PR description to actually include what this patch does (in addition to noting that it was built on an earlier PR).

For codegen PRs, it would be great to paste in the generated code.

@sameeragarwal
Member Author

Thanks @rxin, added!

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51953 has finished for PR 11359 at commit 02aa3d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 25, 2016

Maybe paste the generated code in the comment section so it doesn't get merged as part of the commit. Otherwise the commit description gets super long. Thanks.

@sameeragarwal
Member Author

Generated code:

/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ /** Codegened pipeline for:
/* 006 */ * Sort [id#0L ASC], true, 0
/* 007 */ +- INPUT
/* 008 */ */
/* 009 */ class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 010 */   private Object[] references;
/* 011 */   private boolean sort_needToSort;
/* 012 */   private org.apache.spark.sql.execution.Sort sort_plan;
/* 013 */   private org.apache.spark.sql.execution.UnsafeExternalRowSorter sort_sorter;
/* 014 */   private org.apache.spark.executor.TaskMetrics sort_metrics;
/* 015 */   private scala.collection.Iterator<UnsafeRow> sort_sortedIter;
/* 016 */   private scala.collection.Iterator inputadapter_input;
/* 017 */   private UnsafeRow sort_result;
/* 018 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder sort_holder;
/* 019 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter sort_rowWriter;
/* 020 */   private org.apache.spark.sql.execution.metric.LongSQLMetric sort_dataSize;
/* 021 */   private org.apache.spark.sql.execution.metric.LongSQLMetricValue sort_metricValue;
/* 022 */   private org.apache.spark.sql.execution.metric.LongSQLMetric sort_spillSize;
/* 023 */   private org.apache.spark.sql.execution.metric.LongSQLMetricValue sort_metricValue1;
/* 024 */
/* 025 */   public GeneratedIterator(Object[] references) {
/* 026 */     this.references = references;
/* 027 */   }
/* 028 */
/* 029 */   public void init(scala.collection.Iterator inputs[]) {
/* 030 */     sort_needToSort = true;
/* 031 */     this.sort_plan = (org.apache.spark.sql.execution.Sort) references[0];
/* 032 */     sort_sorter = sort_plan.createSorter();
/* 033 */     sort_metrics = org.apache.spark.TaskContext.get().taskMetrics();
/* 034 */
/* 035 */     inputadapter_input = inputs[0];
/* 036 */     sort_result = new UnsafeRow(1);
/* 037 */     this.sort_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(sort_result, 0);
/* 038 */     this.sort_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(sort_holder, 1);
/* 039 */     this.sort_dataSize = (org.apache.spark.sql.execution.metric.LongSQLMetric) references[1];
/* 040 */     sort_metricValue = (org.apache.spark.sql.execution.metric.LongSQLMetricValue) sort_dataSize.localValue();
/* 041 */     this.sort_spillSize = (org.apache.spark.sql.execution.metric.LongSQLMetric) references[2];
/* 042 */     sort_metricValue1 = (org.apache.spark.sql.execution.metric.LongSQLMetricValue) sort_spillSize.localValue();
/* 043 */   }
/* 044 */
/* 045 */   private void sort_addToSorter() throws java.io.IOException {
/* 046 */     while (inputadapter_input.hasNext()) {
/* 047 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 048 */       /* input[0, bigint] */
/* 049 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 050 */       long inputadapter_value = inputadapter_isNull ? -1L : (inputadapter_row.getLong(0));
/* 051 */       // Convert the input attributes to an UnsafeRow and add it to the sorter
/* 052 */
/* 053 */       sort_rowWriter.zeroOutNullBytes();
/* 054 */
/* 055 */       if (inputadapter_isNull) {
/* 056 */         sort_rowWriter.setNullAt(0);
/* 057 */       } else {
/* 058 */         sort_rowWriter.write(0, inputadapter_value);
/* 059 */       }
/* 060 */
/* 061 */       sort_sorter.insertRow(sort_result);
/* 062 */       if (shouldStop()) {
/* 063 */         return;
/* 064 */       }
/* 065 */     }
/* 066 */
/* 067 */   }
/* 068 */
/* 069 */   protected void processNext() throws java.io.IOException {
/* 070 */     if (sort_needToSort) {
/* 071 */       sort_addToSorter();
/* 072 */       Long sort_spillSizeBefore = sort_metrics.memoryBytesSpilled();
/* 073 */       sort_sortedIter = sort_sorter.sort();
/* 074 */       sort_metricValue.add(sort_sorter.getPeakMemoryUsage());
/* 075 */       sort_metricValue1.add(sort_metrics.memoryBytesSpilled() - sort_spillSizeBefore);
/* 076 */       sort_metrics.incPeakExecutionMemory(sort_sorter.getPeakMemoryUsage());
/* 077 */       sort_needToSort = false;
/* 078 */     }
/* 079 */
/* 080 */     while (sort_sortedIter.hasNext()) {
/* 081 */       UnsafeRow sort_outputRow = (UnsafeRow)sort_sortedIter.next();
/* 082 */       append(sort_outputRow.copy());
/* 083 */       if (shouldStop()) return;
/* 084 */     }
/* 085 */   }
/* 086 */ }
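For readers unfamiliar with the produce/consume model, the control flow of the generated iterator above can be illustrated in miniature. This is a hypothetical, simplified sketch (plain Java, not Spark code; the `shouldStop()` batching and the metric updates are omitted, and a `List` plus `Collections.sort` stands in for `UnsafeExternalRowSorter`): all input is drained into the sorter on the first `processNext()` call, the sort runs exactly once, and subsequent work just streams the sorted rows out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical miniature of the generated iterator's control flow:
// buffer all input rows on the first call, sort once, then stream
// the sorted rows to the consumer.
class MiniSortIterator {
    private final Iterator<Long> input;
    private boolean needToSort = true;   // mirrors sort_needToSort
    private Iterator<Long> sortedIter;   // mirrors sort_sortedIter
    private final List<Long> out = new ArrayList<>();

    MiniSortIterator(Iterator<Long> input) {
        this.input = input;
    }

    // Mirrors sort_addToSorter(): drain the child iterator into the sorter.
    private void addToSorter(List<Long> buffer) {
        while (input.hasNext()) {
            buffer.add(input.next());
        }
    }

    // Mirrors processNext(): sort lazily on the first call, then emit rows.
    void processNext() {
        if (needToSort) {
            List<Long> buffer = new ArrayList<>();
            addToSorter(buffer);
            Collections.sort(buffer);     // stands in for sort_sorter.sort()
            sortedIter = buffer.iterator();
            needToSort = false;
        }
        while (sortedIter.hasNext()) {
            out.add(sortedIter.next());   // stands in for append(row.copy())
        }
    }

    List<Long> result() {
        return out;
    }
}
```

The key design point visible in both the sketch and the real generated code is that sort is a blocking operator: nothing can be emitted until the entire child input has been consumed, which is why `processNext()` guards the drain-and-sort phase with a one-shot flag.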

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51978 has finished for PR 11359 at commit c953a60.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Member Author

test this please

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51982 has finished for PR 11359 at commit c953a60.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 27, 2016

Test build #52109 has finished for PR 11359 at commit 4651ce9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Member Author

@nongli this should be ready for your pass.

val outputRow = ctx.freshName("outputRow")
val dataSize = metricTerm(ctx, "dataSize")
val spillSize = metricTerm(ctx, "spillSize")
val spillSizeBefore = ctx.freshName("spillSizeBefore")
Contributor

This can just be a local var. Just remove the ".addMutableState" below and fix line 141.
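For context, `ctx.addMutableState` emits an instance field on the generated class, whereas declaring the variable directly in the emitted code string yields a plain local. Since `spillSizeBefore` is only read within the same `processNext()` invocation, a local is sufficient. A hypothetical, hand-written illustration of the two shapes (not generated code):

```java
// Hypothetical illustration: a value consumed within a single call does
// not need to live as an instance field on the generated class.
class FieldVersion {
    private long spillSizeBefore;          // mutable state (unnecessary here)

    long spillDelta(long before, long after) {
        spillSizeBefore = before;
        return after - spillSizeBefore;
    }
}

class LocalVersion {
    long spillDelta(long before, long after) {
        long spillSizeBefore = before;     // a local variable suffices
        return after - spillSizeBefore;
    }
}
```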

Member Author

Thanks, fixed.

@nongli
Contributor

nongli commented Feb 29, 2016

LGTM

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52192 has finished for PR 11359 at commit 65ed647.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Feb 29, 2016

OK I am merging it.

@asfgit asfgit closed this in 4bd697d Feb 29, 2016

s"""
| // Convert the input attributes to an UnsafeRow and add it to the sorter
| ${code.code}
Contributor

This may cause a performance regression: when Sort is on top of Exchange (or another operator that produces UnsafeRow), we will create variables from the UnsafeRow, then create another UnsafeRow from those variables.

See #11008 (comment)

@yhuai Should we revert this patch or fix this by follow-up PR?
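The concern above can be sketched with a hypothetical example (plain Java, not Spark code; `Record` stands in for `UnsafeRow`): when the child operator already produces rows in the target binary format, reading each field into variables and writing them back into a fresh row is pure overhead compared with inserting the incoming row directly.

```java
// Hypothetical sketch of the copy concern. Record stands in for UnsafeRow.
final class Record {
    final long id;
    Record(long id) { this.id = id; }
}

final class Pipeline {
    // What the generated code above effectively does: a field-by-field copy
    // ("create variables from UnsafeRow, then create another UnsafeRow").
    static Record repack(Record input) {
        long value = input.id;        // unpack the field into a variable
        return new Record(value);     // rebuild an identical record
    }

    // What it could do when the child already emits the right format:
    // hand the incoming row straight to the sorter.
    static Record passThrough(Record input) {
        return input;
    }
}
```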

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?
This PR adds support for whole-stage codegen for sort. It builds heavily on nongli 's PR: apache#11008 (which implements the feature itself), and adds the following changes on top:

- [x]  Generated code updates peak execution memory metrics
- [x]  Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`

## How was this patch tested?

New unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`. Further, all existing sort tests should pass.

Author: Sameer Agarwal <[email protected]>
Author: Nong Li <[email protected]>

Closes apache#11359 from sameeragarwal/sort-codegen.