[SPARK-17912][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch #15467
Test build #66897 has finished for PR 15467 at commit
Will do.
On Sun, Oct 16, 2016, 11:35 PM Kazuaki Ishizaki [email protected]
@ericl thank you very much.
ping @ericl
@andrewor14 would it be possible to review this code? Or could you please create a similar PR from #13899?
@ericl, would it be possible to review this?
    val scanTimeMetric = metricTerm(ctx, "scanTime")
    val scanTimeTotalNs = ctx.freshName("scanTime")
    ctx.addMutableState("long", scanTimeTotalNs, s"$scanTimeTotalNs = 0;")
    val incReadBatches = if (!enableScanStatistics) "" else {
Should we leave this out from this refactoring?
Sure, I did. Even if it does not exist, it works for now.
    val colVars = output.indices.map(i => ctx.freshName("colInstance" + i))
    val columnAssigns = colVars.zipWithIndex.map { case (name, i) =>
      ctx.addMutableState(columnVectorClz, name, s"$name = null;")
      val index = if (columnIndexes == null) i else columnIndexes(i)
Same comment: maybe we can avoid introducing columnIndexes for now.
Done
Test build #71597 has finished for PR 15467 at commit
     */
    private[sql] trait ColumnarBatchScan extends CodegenSupport {

      val inMemoryTableScan: InMemoryTableScanExec = null
nit: this is unused, right?
sameeragarwal left a comment
LGTM, thanks!
Merging this into master, thanks!
…ctor/ColumnarBatch

## What changes were proposed in this pull request?

This PR refactors the code generation part that gets data from `ColumnVector` and `ColumnarBatch` into a trait, `ColumnarBatchScan`, for ease of reuse. This part will be reused by several components (e.g. the Parquet reader, Dataset.cache, and others) because `ColumnarBatch` will become a first-class citizen.

This PR is part of apache#15219. As a preparatory step, it makes the code generation for `ColumnVector` and `ColumnarBatch` reusable as a trait, which also benefits other components that need the same logic.

## How was this patch tested?

Tested with the existing test suites.

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#15467 from kiszk/columnarrefactor.
What changes were proposed in this pull request?
This PR refactors the code generation part that gets data from `ColumnVector` and `ColumnarBatch` into a trait, `ColumnarBatchScan`, for ease of reuse. This part will be reused by several components (e.g. the Parquet reader, Dataset.cache, and others) because `ColumnarBatch` will become a first-class citizen.
This PR is part of #15219. As a preparatory step, it makes the code generation for `ColumnVector` and `ColumnarBatch` reusable as a trait, which also benefits other components that need the same logic.
How was this patch tested?
Tested with the existing test suites.
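The idea behind the refactoring, moving column-reading code generation into a trait that several scan operators can mix in, can be illustrated with a minimal, self-contained sketch. Everything here is a hypothetical stand-in: `MiniCodegenContext`, `MiniColumnarBatchScan`, `MiniParquetScan`, and `MiniCacheScan` are invented for illustration and are not Spark's actual classes or the code in this PR. Only the `freshName`/`addMutableState` pattern and the `colInstance` prefix mirror the diff snippets quoted in the review.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a codegen context (NOT Spark's CodegenContext):
// it hands out unique variable names and records member declarations.
final class MiniCodegenContext {
  private var counter = 0
  val mutableStates: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Return a name that is unique within this context.
  def freshName(prefix: String): String = {
    counter += 1
    s"${prefix}_$counter"
  }

  // Record a member field of the generated class and its initializer.
  def addMutableState(javaType: String, name: String, init: String): Unit =
    mutableStates += s"private $javaType $name; // init: $init"
}

// Hypothetical reusable trait: any operator that reads a columnar batch
// mixes this in instead of duplicating the column-access generation.
trait MiniColumnarBatchScan {
  // (column name, Java primitive type name), e.g. ("id", "long")
  def output: Seq[(String, String)]

  // Emit one Java statement per output column that reads the value at rowIdx.
  def genColumnReads(ctx: MiniCodegenContext, rowIdx: String): Seq[String] =
    output.zipWithIndex.map { case ((colName, javaType), i) =>
      val colVar = ctx.freshName("colInstance" + i)
      ctx.addMutableState("ColumnVector", colVar, s"$colVar = null;")
      val getter = "get" + javaType.capitalize // e.g. getLong, getDouble
      s"$javaType $colName = $colVar.$getter($rowIdx);"
    }
}

// Two hypothetical operators sharing the same generation logic.
final class MiniParquetScan(val output: Seq[(String, String)]) extends MiniColumnarBatchScan
final class MiniCacheScan(val output: Seq[(String, String)]) extends MiniColumnarBatchScan

object Demo {
  def main(args: Array[String]): Unit = {
    val ctx = new MiniCodegenContext
    val scan = new MiniParquetScan(Seq("id" -> "long", "score" -> "double"))
    scan.genColumnReads(ctx, "rowIdx").foreach(println) // generated read statements
    ctx.mutableStates.foreach(println)                  // registered member fields
  }
}
```

Under these assumptions, both `MiniParquetScan` and `MiniCacheScan` get identical column-read generation without duplicating it, which is the reuse argument the description makes for `ColumnarBatchScan`.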