@nongli (Contributor) commented Feb 2, 2016

This does the simplest thing: assemble a row on consume and drive the
underlying external sorter object.
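The shape of that consume/produce flow can be sketched with a toy in-memory stand-in (hypothetical names, none of the real Spark classes): the consume side assembles a row from the already-evaluated columns and hands it to the sorter, and the produce side sorts once lazily, then drains the sorted iterator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for the sort operator's codegen contract: insertRow() is the
// "consume" side (one call per input row), next() is the "produce" side.
public class SortSketch {
    // Stand-in for UnsafeExternalRowSorter: buffers rows, sorts on demand.
    private final List<int[]> buffer = new ArrayList<>();
    private boolean needToSort = true;
    private Iterator<int[]> sortedIter;

    // "consume": called once per input row with the evaluated column values.
    void insertRow(int key, int payload) {
        buffer.add(new int[] {key, payload});
    }

    // "produce": sort lazily on the first call, then emit one row per call.
    int[] next() {
        if (needToSort) {
            buffer.sort(Comparator.comparingInt(r -> r[0]));
            sortedIter = buffer.iterator();
            needToSort = false;
        }
        return sortedIter.hasNext() ? sortedIter.next() : null;
    }

    public static void main(String[] args) {
        SortSketch s = new SortSketch();
        s.insertRow(3, 30);
        s.insertRow(1, 10);
        System.out.println(s.next()[0]); // prints 1: smallest key first
    }
}
```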
@nongli (Author) commented Feb 2, 2016

/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */ 
/* 005 */ class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   
/* 007 */   private Object[] references;
/* 008 */   private boolean Sort_needToSort0;
/* 009 */   private org.apache.spark.sql.execution.Sort Sort_plan1;
/* 010 */   private org.apache.spark.sql.execution.UnsafeExternalRowSorter Sort_sorter2;
/* 011 */   private scala.collection.Iterator<UnsafeRow> Sort_sortedIter3;
/* 012 */   private UnsafeRow Sort_result24;
/* 013 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder Sort_holder25;
/* 014 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter Sort_rowWriter26;
/* 015 */   
/* 016 */   private void Sort_addToSorter4() throws java.io.IOException {
/* 017 */     
/* 018 */     while (input.hasNext()) {
/* 019 */       InternalRow InputAdapter_row5 = (InternalRow) input.next();
/* 020 */       /* input[0, int] */
/* 021 */       boolean InputAdapter_isNull6 = InputAdapter_row5.isNullAt(0);
/* 022 */       int InputAdapter_value7 = InputAdapter_isNull6 ? -1 : (InputAdapter_row5.getInt(0));
/* 023 */       /* input[1, string] */
/* 024 */       boolean InputAdapter_isNull8 = InputAdapter_row5.isNullAt(1);
/* 025 */       UTF8String InputAdapter_value9 = InputAdapter_isNull8 ? null : (InputAdapter_row5.getUTF8String(1));
/* 026 */       
/* 027 */       /* (input[0, int] < 20) */
/* 028 */       /* input[0, int] */
/* 029 */       
/* 030 */       /* 20 */
/* 031 */       
/* 032 */       boolean Filter_value11 = false;
/* 033 */       Filter_value11 = InputAdapter_value7 < 20;
/* 034 */       if (!false && Filter_value11) {
/* 035 */         
/* 036 */         /* input[0, int] */
/* 037 */         
/* 038 */         /* input[1, string] */
/* 039 */         
/* 040 */         
/* 041 */         // Convert the input attributes to an UnsafeRow and add it to the sorter
/* 042 */         
/* 043 */         Sort_holder25.reset();
/* 044 */         
/* 045 */         Sort_rowWriter26.zeroOutNullBytes();
/* 046 */         
/* 047 */         /* input[0, int] */
/* 048 */         
/* 049 */         if (InputAdapter_isNull6) {
/* 050 */           Sort_rowWriter26.setNullAt(0);
/* 051 */         } else {
/* 052 */           Sort_rowWriter26.write(0, InputAdapter_value7);
/* 053 */         }
/* 054 */         
/* 055 */         /* input[1, string] */
/* 056 */         
/* 057 */         if (InputAdapter_isNull8) {
/* 058 */           Sort_rowWriter26.setNullAt(1);
/* 059 */         } else {
/* 060 */           Sort_rowWriter26.write(1, InputAdapter_value9);
/* 061 */         }
/* 062 */         Sort_result24.setTotalSize(Sort_holder25.totalSize());
/* 063 */         
/* 064 */         Sort_sorter2.insertRow(Sort_result24);
/* 065 */         
/* 066 */       }
/* 067 */       
/* 068 */     }
/* 069 */     
/* 070 */   }
/* 071 */   
/* 072 */   public GeneratedIterator(Object[] references) {
/* 073 */     this.references = references;
/* 074 */     Sort_needToSort0 = true;
/* 075 */     this.Sort_plan1 = (org.apache.spark.sql.execution.Sort) references[0];
/* 076 */     Sort_sorter2 = Sort_plan1.createSorter();
/* 077 */     
/* 078 */     Sort_result24 = new UnsafeRow(2);
/* 079 */     this.Sort_holder25 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(Sort_result24, 32);
/* 080 */     this.Sort_rowWriter26 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(Sort_holder25, 2);
/* 081 */   }
/* 082 */   
/* 083 */   protected void processNext() throws java.io.IOException {
/* 084 */     if (Sort_needToSort0) {
/* 085 */       Sort_addToSorter4();
/* 086 */       Sort_sortedIter3 = Sort_sorter2.sort();
/* 087 */       Sort_needToSort0 = false;
/* 088 */     }
/* 089 */     
/* 090 */     while (Sort_sortedIter3.hasNext()) {
/* 091 */       UnsafeRow Sort_outputRow29 = (UnsafeRow)Sort_sortedIter3.next();
/* 092 */       System.out.println(Sort_outputRow29);
/* 093 */       currentRow = Sort_outputRow29;
/* 094 */       return;
/* 095 */     }
/* 096 */   }
/* 097 */ }
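The `processNext()` method above follows the `BufferedRowIterator` push model: each call either sets `currentRow` and returns one row, or falls through once the sorted iterator is exhausted. A minimal stand-alone sketch of that contract (hypothetical class, not the Spark one):

```java
import java.util.Arrays;
import java.util.Iterator;

// Minimal model of the BufferedRowIterator contract used by whole-stage
// codegen: processNext() pushes at most one row into currentRow per call.
public class BufferedIterSketch {
    private String currentRow;
    private final Iterator<String> source =
        Arrays.asList("a", "b").iterator();

    // Generated subclasses override this; setting currentRow and returning
    // hands exactly one row back to the caller.
    protected void processNext() {
        while (source.hasNext()) {
            currentRow = source.next();
            return; // emit one row, resume here on the next call
        }
        currentRow = null; // no more input: signal exhaustion
    }

    public String next() {
        processNext();
        return currentRow;
    }

    public static void main(String[] args) {
        BufferedIterSketch it = new BufferedIterSketch();
        System.out.println(it.next()); // prints a
        System.out.println(it.next()); // prints b
        System.out.println(it.next()); // prints null
    }
}
```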

@SparkQA commented Feb 2, 2016

Test build #50511 has finished for PR 11008 at commit 11e26c9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeExternalRowSorter

Contributor:

this is pretty ghetto... (although i understand maybe it's the simplest way to implement this)

Contributor (Author):

Why? This is the state that needs to be kept between the two member functions in this class.

Contributor:

It's ok here, as discussed offline. I just found that using mutable state as a way to pass variable names through is pretty brittle. It may be good to have a more general abstraction for this in codegen, but it's not that big of a deal right now.

@SparkQA commented Feb 2, 2016

Test build #50584 has finished for PR 11008 at commit 564a5b3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Inline review comment on the following lines of the diff:

    ctx.currentVars = input
    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
Contributor:

If the child can produce UnsafeRow (for example, Exchange), we should have a way to avoid this unpack-and-repack, or we will see a regression (the generated version slower than the non-generated one).

I think we can pass the variable for the input row into doConsume; it could be null. It's better to do this after #11274, so we don't need to worry about whether we should create variables for the input or not.
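The unpack/repack overhead being flagged here can be illustrated with a toy sketch (hypothetical types, not Spark's API): when the child already emits rows in the sorter's binary format, reading every column out and writing it back into a fresh buffer is pure overhead compared to inserting the row directly.

```java
// Hypothetical illustration of the unpack/repack concern. RowSorter and
// pack() are stand-ins, not Spark classes.
interface RowSorter { void insertRow(byte[] packedRow); }

public class ConsumeSketch {
    // Slow path: each column is unpacked, then repacked into a fresh buffer.
    static void consumeUnpacked(RowSorter sorter, int col0, String col1) {
        byte[] repacked = pack(col0, col1); // redundant copy when the
        sorter.insertRow(repacked);         // input was already packed
    }

    // Fast path: the input row is already in the target binary format.
    static void consumePacked(RowSorter sorter, byte[] packedRow) {
        sorter.insertRow(packedRow); // insert as-is, no per-column copy
    }

    // Stand-in for UnsafeRowWriter-style serialization.
    static byte[] pack(int col0, String col1) {
        return (col0 + "|" + col1).getBytes();
    }

    public static void main(String[] args) {
        java.util.List<byte[]> collected = new java.util.ArrayList<>();
        RowSorter sorter = collected::add;
        consumeUnpacked(sorter, 1, "x");     // slow path: packs then inserts
        consumePacked(sorter, pack(2, "y")); // fast path: inserts directly
        System.out.println(collected.size()); // prints 2
    }
}
```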

asfgit pushed a commit that referenced this pull request Feb 29, 2016
## What changes were proposed in this pull request?
This PR adds support for implementing whole stage codegen for sort. It builds heavily on nongli's PR #11008 (which actually implements the feature), and adds the following changes on top:

- [x]  Generated code updates peak execution memory metrics
- [x]  Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`

## How was this patch tested?

New unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`. Further, all existing sort tests should pass.

Author: Sameer Agarwal <[email protected]>
Author: Nong Li <[email protected]>

Closes #11359 from sameeragarwal/sort-codegen.
@davies (Contributor) commented Mar 2, 2016

@nongli Can you close this PR?

@nongli nongli closed this Mar 2, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
@nongli nongli deleted the spark-13123 branch March 28, 2016 19:27