Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Oct 13, 2016

What changes were proposed in this pull request?

This PR refactors the code generation part to get data from ColumnarVector and ColumnarBatch by using a trait ColumnarBatchScan for ease of reuse. This is because this part will be reused by several components (e.g. parquet reader, Dataset.cache, and others) since ColumnarBatch will be first citizen.

This PR is a part of #15219. In advance, this PR makes the code generation for ColumnarVector and ColumnarBatch reuseable as a trait. In general, this is very useful for other components from the reuseability view, too.

How was this patch tested?

tested existing test suites

@kiszk kiszk changed the title [SPARK-17192][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch [SPARK-17912][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch Oct 13, 2016
@SparkQA
Copy link

SparkQA commented Oct 13, 2016

Test build #66897 has finished for PR 15467 at commit 5d78d01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Oct 17, 2016

@ericl, could you please review this? cc @davies

@ericl
Copy link
Contributor

ericl commented Oct 17, 2016

Will do

On Sun, Oct 16, 2016, 11:35 PM Kazuaki Ishizaki [email protected]
wrote:

@ericl https://github.com/ericl, could you please review this? cc
@davies https://github.com/davies


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#15467 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAA6Smf05GrW4y2953NydrjUkz_gr4GWks5q0xcogaJpZM4KWFqi
.

@kiszk
Copy link
Member Author

kiszk commented Oct 18, 2016

@ericl thank you very much

@kiszk
Copy link
Member Author

kiszk commented Nov 29, 2016

ping @ericl

@kiszk
Copy link
Member Author

kiszk commented Dec 6, 2016

@andrewor14 would it be possible to review this code? Or, could you please create the similar PR from #13899?
Your original refactoring was great. In addition to that, I need to add some field for some cases.

@kiszk
Copy link
Member Author

kiszk commented Jan 8, 2017

@ericl, would it be possible to review this?

val scanTimeMetric = metricTerm(ctx, "scanTime")
val scanTimeTotalNs = ctx.freshName("scanTime")
ctx.addMutableState("long", scanTimeTotalNs, s"$scanTimeTotalNs = 0;")
val incReadBatches = if (!enableScanStatistics) "" else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave this out from this refactoring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I did. Even if it does not exists, it works for now.

val colVars = output.indices.map(i => ctx.freshName("colInstance" + i))
val columnAssigns = colVars.zipWithIndex.map { case (name, i) =>
ctx.addMutableState(columnVectorClz, name, s"$name = null;")
val index = if (columnIndexes == null) i else columnIndexes(i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment: maybe we can not introduce columnIndexes for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71597 has finished for PR 15467 at commit 9a53219.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private[sql] trait ColumnarBatchScan extends CodegenSupport {

val inMemoryTableScan: InMemoryTableScanExec = null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is unused right?

Copy link
Member

@sameeragarwal sameeragarwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@davies
Copy link
Contributor

davies commented Jan 19, 2017

Merging this into master, thanks!

@asfgit asfgit closed this in 148a84b Jan 19, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ctor/ColumnarBatch

## What changes were proposed in this pull request?

This PR refactors the code generation part to get data from `ColumnarVector` and `ColumnarBatch` by using a trait `ColumnarBatchScan` for ease of reuse. This is because this part will be reused by several components (e.g. parquet reader, Dataset.cache, and others) since `ColumnarBatch` will be first citizen.

This PR is a part of apache#15219. In advance, this PR makes the code generation for  `ColumnarVector` and `ColumnarBatch` reuseable as a trait. In general, this is very useful for other components from the reuseability view, too.
## How was this patch tested?

tested existing test suites

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#15467 from kiszk/columnarrefactor.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…ctor/ColumnarBatch

## What changes were proposed in this pull request?

This PR refactors the code generation part to get data from `ColumnarVector` and `ColumnarBatch` by using a trait `ColumnarBatchScan` for ease of reuse. This is because this part will be reused by several components (e.g. parquet reader, Dataset.cache, and others) since `ColumnarBatch` will be first citizen.

This PR is a part of apache#15219. In advance, this PR makes the code generation for  `ColumnarVector` and `ColumnarBatch` reuseable as a trait. In general, this is very useful for other components from the reuseability view, too.
## How was this patch tested?

tested existing test suites

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#15467 from kiszk/columnarrefactor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants