Skip to content

Conversation

@chenghao-intel
Copy link
Contributor

We don't support the complex expression keys in the rollup/cube, and we even will not report it if we have the complex group by keys, that will cause very confusing/incorrect result.

e.g. SELECT key%100 FROM src GROUP BY key %100 with ROLLUP

This PR adds an additional project during the analyzing for the complex GROUP BY keys, and that projection will be the child of Expand, so to Expand, the GROUP BY KEY are always the simple key(attribute names).

@SparkQA
Copy link

SparkQA commented Jul 10, 2015

Test build #37024 has finished for PR 7343 at commit c695760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor Author

cc @marmbrus @yhuai

@yhuai
Copy link
Contributor

yhuai commented Jul 13, 2015

Can you explain what is the problem in the JIRA?

@chenghao-intel
Copy link
Contributor Author

Thanks you @yhuai , I've updated the description of the bug, and also add more unit tests for the fixing.

@SparkQA
Copy link

SparkQA commented Jul 13, 2015

Test build #37112 has finished for PR 7343 at commit 34def69.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Jul 13, 2015

@chenghao-intel Thanks for the update. I find complex group by keys is a confusing term. If I understand correctly, we do not handle expressions other than AttributeReferences, right? Also, with your change, seems we will change the group by expressions (e.g. from key % 100 to key)? What will happen if an expression involves multiple columns (e.g. key1 + key2)?

@chenghao-intel
Copy link
Contributor Author

That's correct, complex group by keys is the expression other than AttributeReferences.

I am not just simply change the group by keys, but with an alias. e.g.

SELECT key+key FROM src GROUP BY key+key WITH ROLLUP 
==>
SELECT k FROM (SELECT key + key as k FROM src) t GROUP BY k WITH ROLLUP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add toMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we want to keep the order group by expressions in the Aggregate operator, let's just keep it as it used to be.

@yhuai
Copy link
Contributor

yhuai commented Jul 13, 2015

oh, i see. Yeah, you are right.

Can we use another term for complex group by keys? It is confusing and something like key1 % 100 is not complex.

@chenghao-intel
Copy link
Contributor Author

@yhuai thanks for the reviewing, any better idea for the term complex group by keys? Or should I just call it non-attribute expressions?

@SparkQA
Copy link

SparkQA commented Jul 13, 2015

Test build #1050 has finished for PR 7343 at commit 34def69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37300 has finished for PR 7343 at commit 827873f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not very accurate since you also have named expressions as keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will update it.

@yhuai
Copy link
Contributor

yhuai commented Jul 15, 2015

The fix looks good to me.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37313 has finished for PR 7343 at commit 1ebbb59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor Author

seems not related.
retest this please

@chenghao-intel
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #21 has finished for PR 7343 at commit 1ebbb59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37340 has finished for PR 7343 at commit 1ebbb59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor Author

@yhuai any more comments?

@yhuai
Copy link
Contributor

yhuai commented Jul 16, 2015

LGTM. I am merging it to master.

@asfgit asfgit closed this in e272123 Jul 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants