Skip to content

Conversation

@gatorsmile
Copy link
Member

JIRA: https://issues.apache.org/jira/browse/SPARK-12989

In the rule ExtractWindowExpressions, we simply replace alias by the corresponding attribute. However, this will cause an issue exposed by the following case:

val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A")
  .drop("B")
  .drop("C")

val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)

In this case, both Data.A and Data.B are alias in WindowSpecDefinition. If we replace these alias expression by their alias names, we are unable to know what they are since they will not be put in missingExpr too.

@SparkQA
Copy link

SparkQA commented Jan 28, 2016

Test build #50246 has finished for PR 10963 at commit 468abca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why do we have to call toAttribute at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole part is confusing. Let me rewrite this function to make it clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, wait a second. I would like to fix this in branch-1.6 so a minimal fix would be preferred (and we can refactor master later if there is a good reason).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me change it to ne without calling toAttribute and move the test case to DataFrameWindowSuite. Thanks!

@SparkQA
Copy link

SparkQA commented Jan 29, 2016

Test build #50318 has finished for PR 10963 at commit 8025de0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 1, 2016

Test build #50481 has finished for PR 10963 at commit 476dcb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

marmbrus commented Feb 1, 2016

Thanks, merging to master and 1.6.

asfgit pushed a commit that referenced this pull request Feb 1, 2016
JIRA: https://issues.apache.org/jira/browse/SPARK-12989

In the rule `ExtractWindowExpressions`, we simply replace alias by the corresponding attribute. However, this will cause an issue exposed by the following case:

```scala
val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A")
  .drop("B")
  .drop("C")

val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)
```
In this case, both `Data.A` and `Data.B` are `alias` in `WindowSpecDefinition`. If we replace these alias expression by their alias names, we are unable to know what they are since they will not be put in `missingExpr` too.

Author: gatorsmile <[email protected]>
Author: xiaoli <[email protected]>
Author: Xiao Li <[email protected]>

Closes #10963 from gatorsmile/seletStarAfterColDrop.

(cherry picked from commit 33c8a49)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in 33c8a49 Feb 1, 2016
@gatorsmile gatorsmile deleted the seletStarAfterColDrop branch February 2, 2016 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants