SPARK-5049: ParquetTableScan always prepends the values of partition col... #3870

rahulaggarwalguavus · 2015-01-01T13:09:05Z

SPARK-5049: ParquetTableScan always prepends the values of partition columns in output rows irrespective of the order of the partition columns in the original SELECT query

Now forming a GenericRow by inserting column values at the correct indexes.
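To make the reported behaviour concrete, here is a minimal stand-alone sketch in plain Scala with no Spark dependency. It is not the ParquetTableScan code. The object `PartitionOrderSketch`, the column names, and both helper functions are hypothetical and exist only to contrast "always prepend partition values" with "place each value at its index in the SELECT list".

```scala
// Hypothetical illustration only; not taken from ParquetTableScan.
object PartitionOrderSketch {
  // Columns in the order the query asked for, e.g. SELECT a, p, b
  val requested: Seq[String] = Seq("a", "p", "b")
  // Assumption: "p" is the only partition column.
  val partitionCols: Set[String] = Set("p")

  // Behaviour described in SPARK-5049: partition values always come first,
  // regardless of where the partition column sits in the SELECT list.
  def prependPartitionValues(
      partValues: Map[String, Any],
      dataValues: Map[String, Any]): Seq[Any] = {
    val parts = requested.filter(partitionCols).map(partValues)
    val data = requested.filterNot(partitionCols).map(dataValues)
    parts ++ data
  }

  // Intended behaviour: each value lands at the index of its column
  // in the requested output.
  def placeByIndex(
      partValues: Map[String, Any],
      dataValues: Map[String, Any]): Seq[Any] = {
    val out = new Array[Any](requested.length)
    var i = 0
    while (i < requested.length) {
      val name = requested(i)
      out(i) = if (partitionCols(name)) partValues(name) else dataValues(name)
      i += 1
    }
    out.toSeq
  }
}
```

With `p = 1`, `a = "x"`, and `b = "y"`, the first helper yields the partition value in position 0 while the second keeps it in the middle, matching the SELECT order.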

…columns in output rows irrespective of the order of the partition columns in the original SELECT query - forming a GenericRow by inserting column values at correct indexes

AmplabJenkins · 2015-01-01T13:12:10Z

Can one of the admins verify this patch?

ash211 · 2015-01-02T03:16:26Z

Jenkins this is ok to test

SparkQA · 2015-01-02T03:17:34Z

Test build #24989 has started for PR 3870 at commit 5000110.

This patch merges cleanly.

SparkQA · 2015-01-02T03:59:02Z

Test build #24989 has finished for PR 3870 at commit 5000110.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-02T03:59:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24989/

…columns in output rows irrespective of the order of the partition columns in the original SELECT query - passing newOutput(correct sequence of attributes) in OutputFaker

SparkQA · 2015-01-04T11:47:30Z

Test build #25031 has started for PR 3870 at commit 8253a7c.

This patch merges cleanly.

SparkQA · 2015-01-04T12:59:56Z

Test build #25031 has finished for PR 3870 at commit 8253a7c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-04T12:59:59Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25031/

marmbrus · 2015-01-11T01:03:55Z

Thanks for figuring this out and proposing a solution! I guess our test cases missed this since they always perform later column reordering.

I'm a little concerned about the performance impact of this part of the change though:

// Fill outputRow with iter.next()._2 at the correct indexes using normalOutputIndexes
iter.next()._2
  .zipWithIndex
  .foreach(nI => outputRow(normalOutputIndexes(nI._2)) = nI._1)
new GenericRow(outputRow)

It both uses functional programming (which I normally love, but try to avoid in per-tuple code paths) and allocates an object.

What do you think of the approach I took in #3990?
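For readers following the performance concern, a minimal sketch of a per-tuple path that avoids both per-row collection operations and per-row allocation. This is plain Scala under stated assumptions and is not the code from #3870 or #3990. `ReuseRowSketch`, `fillRows`, and all of its parameters are hypothetical names chosen for illustration.

```scala
// Hypothetical sketch; does not reproduce the Spark implementation.
object ReuseRowSketch {
  /**
   * @param rows            data (non-partition) values for each tuple
   * @param dataIndexes     dataIndexes(i) is the output position of data value i
   * @param partitionValues (output position, value) pairs, constant for the scan
   * @param width           number of output columns
   */
  def fillRows(
      rows: Iterator[Array[Any]],
      dataIndexes: Array[Int],
      partitionValues: Array[(Int, Any)],
      width: Int): Iterator[Array[Any]] = {
    // One buffer for the whole iterator instead of one object per tuple.
    val out = new Array[Any](width)

    // Partition values do not change between tuples, so write them once.
    var p = 0
    while (p < partitionValues.length) {
      out(partitionValues(p)._1) = partitionValues(p)._2
      p += 1
    }

    rows.map { values =>
      // Plain while loop: no zipWithIndex tuples, no closure per element.
      var i = 0
      while (i < values.length) {
        out(dataIndexes(i)) = values(i)
        i += 1
      }
      out // the same buffer every time; copy it before retaining a row
    }
  }
}
```

The trade-off in this sketch is that every element returned by the iterator is the same mutable buffer, so a consumer that holds on to rows must copy them first.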

rahulaggarwalguavus · 2015-01-11T06:33:34Z

Thanks for reviewing. Yes, the approach you took in #3990 avoids this performance penalty.

Followup to #3870. Props to rahulaggarwalguavus for identifying the issue.

Author: Michael Armbrust <[email protected]>

Closes #3990 from marmbrus/SPARK-5049 and squashes the following commits:

dd03e4e [Michael Armbrust] Fill in the partition values of parquet scans instead of using JoinedRow

(cherry picked from commit 5d9fa55)
Signed-off-by: Michael Armbrust <[email protected]>

Followup to #3870. Props to rahulaggarwalguavus for identifying the issue.

Author: Michael Armbrust <[email protected]>

Closes #3990 from marmbrus/SPARK-5049 and squashes the following commits:

dd03e4e [Michael Armbrust] Fill in the partition values of parquet scans instead of using JoinedRow

yhuai · 2015-01-21T06:33:38Z

Since this issue has been fixed by #3990, we can close it.

yhuai · 2015-01-21T06:33:40Z

close this issue

marmbrus mentioned this pull request Jan 11, 2015

[SPARK-5049][SQL] Fix ordering of partition columns in ParquetTableScan #3990

Closed

asfgit closed this in 622ff09 Jan 28, 2015
