@maropu
Member

@maropu maropu commented Apr 18, 2017

What changes were proposed in this pull request?

This PR adds parsing rules to support aliases in table-valued functions.

How was this patch tested?

Added tests in PlanParserSuite.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75876 has finished for PR 17666 at commit a611a13.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedTableValuedFunction(

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75888 has started for PR 17666 at commit 539a9e8.

@maropu
Member Author

maropu commented Apr 18, 2017

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75889 has finished for PR 17666 at commit 539a9e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedTableValuedFunction(

Contributor

@hvanhovell hvanhovell Apr 18, 2017

Perhaps we should put the multi-alias in a separate rule? Since it is also used by inline table.

Contributor

Why did you add the multi-alias anyway?

Member Author

I missed the point. Okay, I'll reconsider this. Thanks!

Contributor

It makes sense to add this. Let's keep the multi-alias for now.

Member Author

@maropu maropu Apr 18, 2017

Oh, it seems I misunderstood your point. Were you asking whether we need to support a query like SELECT * FROM [[tvf]] AS t(a, b, ...) in this PR? I know range is currently the only table-valued function we support, but I also think it'd be better to put a more general rule in this file. So, +1 for keeping this.

Member Author

Then I'll update this PR to split this out into a separate rule and share it with the inline table rule.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75903 has finished for PR 17666 at commit 8d3a037.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Apr 18, 2017

@hvanhovell Could you check again?

@maropu
Member Author

maropu commented Apr 21, 2017

ping

@maropu
Member Author

maropu commented Apr 24, 2017

@hvanhovell ping

@maropu
Member Author

maropu commented May 8, 2017

@hvanhovell @gatorsmile ping

Member

Please update this example.

Member

Please follow inlineTable to add a separate rule.

Member Author

ok

Member

Conceptually, expectedNumCols needs to be part of the TVF type/class. It is the number of the TVF's output arguments.

Member

BTW, we need to add comments to this function.

Member

Instead of changing the output of Range, I think we can simply add a Project above Range for the column alias, as we do in the DataFrame API: spark.range(100).toDF("i")
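
A rough sketch of this idea, using toy plan nodes rather than Catalyst's real classes. `RangePlan`, `ProjectPlan`, and `withColumnAliases` are hypothetical names introduced only for illustration. The range node keeps its fixed `id` output, and the alias is applied by a separate projection on top:

```scala
object ColumnAliasSketch {
  // Toy stand-ins for logical plan nodes; these are NOT Catalyst's real classes.
  sealed trait Plan { def output: Seq[String] }

  // A range source that always exposes a single column named "id".
  final case class RangePlan(start: Long, end: Long) extends Plan {
    val output: Seq[String] = Seq("id")
  }

  // A projection that renames the child's columns positionally.
  final case class ProjectPlan(names: Seq[String], child: Plan) extends Plan {
    require(
      names.length == child.output.length,
      s"expected ${child.output.length} column aliases, got ${names.length}")
    val output: Seq[String] = names
  }

  // Hypothetical helper: wrap the plan only when column aliases were given.
  def withColumnAliases(plan: Plan, aliases: Seq[String]): Plan =
    if (aliases.isEmpty) plan else ProjectPlan(aliases, plan)

  // Mirrors the shape of spark.range(100).toDF("i") in this toy model.
  val aliased: Plan = withColumnAliases(RangePlan(0L, 100L), Seq("i"))
}
```

In this sketch the range node itself never changes, so code that matches on it is unaffected by whether an alias was written in the query.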

Member Author

Contributor

Why make this an Option? It is not optional, right? The next function should either call Range(start, end, step, Some(numSlices), "id") or this function should have a default parameter.
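
A minimal sketch of the default-parameter alternative, assuming the parameter in question is the output column name (the "id" in the call above). `RangeNode` and `outputName` are hypothetical names, not Spark's actual `Range` signature:

```scala
object DefaultParamSketch {
  // Hypothetical stand-in for the node under review; not Spark's actual Range class.
  final case class RangeNode(
      start: Long,
      end: Long,
      step: Long,
      numSlices: Option[Int],    // genuinely optional, so Option fits here
      outputName: String = "id") // always has a value, so a default replaces Option[String]

  // Callers that do not care about the name simply omit it.
  val plain: RangeNode = RangeNode(1L, 10L, 1L, Some(4))

  // Callers that do care pass it explicitly.
  val named: RangeNode = RangeNode(1L, 10L, 1L, Some(4), "a")
}
```

With a default, the type says that a name always exists, and no caller has to unwrap or fall back on a `None`.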

Member Author

Yeah, OK. I'll update it. Thanks!

Contributor

Should we also test different cases?

Member Author

ok, I'll add more tests.

Member Author

I'm a bit worried, though: does this fix satisfy your intention? 625dbda

Contributor

Hmmm... this signature is kind of complex. Can we try to use some kind of class/case class that encapsulates this?

Contributor

@hvanhovell hvanhovell May 8, 2017

I also think that we should separate the aliasing from constructing the table valued function. See @gatorsmile's earlier comment.

Member Author

OK, I'll reconsider the signature.

@SparkQA

SparkQA commented May 9, 2017

Test build #76604 has finished for PR 17666 at commit 399d823.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

Please rebase to pick up fix for the R tests.

Though again, I'm not sure why it is running R tests for this PR - is the change detection logic broken somehow?

@maropu
Member Author

maropu commented May 9, 2017

OK, thanks! No, I don't think I touched that.

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented May 9, 2017

Test build #76602 has finished for PR 17666 at commit f494e41.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 9, 2017

Test build #76608 has finished for PR 17666 at commit 625dbda.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM.

Thank you! @maropu

@SparkQA

SparkQA commented May 9, 2017

Test build #76615 has finished for PR 17666 at commit 81bef3b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented May 9, 2017

Test build #76650 has finished for PR 17666 at commit 81bef3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 9, 2017
## What changes were proposed in this pull request?
This PR adds parsing rules to support aliases in table-valued functions.

## How was this patch tested?
Added tests in `PlanParserSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes #17666 from maropu/SPARK-20311.

(cherry picked from commit 714811d)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Contributor

thanks, merging to master/2.2!

@asfgit asfgit closed this in 714811d May 9, 2017
@yhuai
Contributor

yhuai commented May 9, 2017

@maropu Sorry. I think this PR introduces a regression.

scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").explain
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Range (1, 10, step=1, splits=None)
and
Range (1, 10, step=1, splits=None)
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;

I think we are parsing the cross keyword as the table alias, so the join is no longer treated as a CROSS JOIN.

I reverted your change locally and the query worked. I am attaching the expected analyzed plan below.

scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").queryExecution.analyzed
res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project [id#8L, id#9L]
+- Join Cross
   :- Range (1, 10, step=1, splits=None)
   +- Range (1, 10, step=1, splits=None)
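
To illustrate the suspected cause, here is a toy model of an optional-alias rule. It is a hand-written stand-in, not Spark's ANTLR grammar, and `parseAlias`, `joinKeywords`, and `guardKeywords` are hypothetical names. When a bare identifier is accepted as an alias, `cross` is consumed and only `join` is left for the join clause. The keyword guard shown is one possible remedy, not necessarily how a real fix would be written:

```scala
object AliasAmbiguitySketch {
  // Result of parsing the optional alias: the alias, if any, plus the unconsumed tokens.
  final case class Parsed(alias: Option[String], rest: List[String])

  // Keywords that may start the next clause and so must not be read as an alias
  // (an illustrative list, not Spark's actual reserved-word set).
  val joinKeywords: Set[String] =
    Set("cross", "join", "inner", "left", "right", "full", "on", "where")

  // `tokens` are the lower-cased tokens that follow the function call's closing parenthesis.
  def parseAlias(tokens: List[String], guardKeywords: Boolean): Parsed = tokens match {
    case "as" :: name :: rest =>
      Parsed(Some(name), rest)
    case name :: rest if !(guardKeywords && joinKeywords.contains(name)) =>
      Parsed(Some(name), rest) // a bare identifier is accepted as the alias
    case _ =>
      Parsed(None, tokens)
  }

  // Tokens after the first range(1, 10) in the failing query.
  val afterFirstRange: List[String] =
    List("cross", "join", "range", "(", "1", ",", "10", ")")

  val naive: Parsed = parseAlias(afterFirstRange, guardKeywords = false)
  val guarded: Parsed = parseAlias(afterFirstRange, guardKeywords = true)
}
```

In this model `naive.alias` is `Some("cross")` and the remaining tokens start with a plain `join`, which matches the "INNER join" wording in the error above. With the guard, no alias is taken and `cross join` stays intact.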

@yhuai
Contributor

yhuai commented May 9, 2017

I am going to revert this PR from master and branch-2.2. I need to revert it because it is in branch-2.2, and 2.2 is at the RC stage.

@yhuai
Contributor

yhuai commented May 9, 2017

I have reverted this change from both master and branch-2.2, and I have reopened the JIRA.

@maropu
Member Author

maropu commented May 9, 2017

@yhuai Okay, thanks for letting me know! I'll open a new PR to fix it.

ghost pushed a commit to dbtsai/spark that referenced this pull request May 11, 2017
## What changes were proposed in this pull request?
This PR adds parsing rules to support aliases in table-valued functions.
The previous PR (apache#17666) was reverted because of a regression. This new PR fixes the regression and adds tests in `SQLQueryTestSuite`.

## How was this patch tested?
Added tests in `PlanParserSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17928 from maropu/SPARK-20311-3.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017