@Eric5553
Contributor

@Eric5553 Eric5553 commented Feb 24, 2020

What changes were proposed in this pull request?

When running EXPLAIN on a SQL query, the auto-generated argument alias shouldn't include the expr/attribute id. Leaving it out makes EXPLAIN results easier to read. This is a follow-up to address #27368 (comment).

Before

(7) HashAggregate [codegen id : 2]
Input [2]: [key#x, max#x]
Keys [1]: [key#x]
Functions [1]: [max(val#x)]
Aggregate Attributes [1]: [max(val#x)#x]
Results [2]: [key#x, max(val#x)#x AS max(val)#x]

After

(7) HashAggregate [codegen id : 2]
Input [2]: [key#x, max#x]
Keys [1]: [key#x]
Functions [1]: [max(val)]
Aggregate Attributes [1]: [max(val)#x]
Results [2]: [key#x, max(val)#x AS max(val)#x]

Why are the changes needed?

To make EXPLAIN results easier to read.

Does this PR introduce any user-facing change?

Yes. EXPLAIN results are shown in a more readable format, without the attribute id inside auto-generated argument names.

How was this patch tested?

Updated existing tests.

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118876 has finished for PR 27685 at commit 3cf84d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118880 has finished for PR 27685 at commit 3cf84d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2020

Test build #119063 has finished for PR 27685 at commit 8238b2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

@cloud-fan Would you please help review this? Thanks so much!

Contributor

It's still useful to have the attr id, as the name can be duplicated:

scala> sql("select 1 as a, 2 as a").explain
== Physical Plan ==
*(1) Project [1 AS a#53, 2 AS a#54]
+- Scan OneRowRelation[]

I think we should only remove the attr id from the auto-generated alias name. E.g., this should be [max(val)#x].
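
A minimal sketch of that distinction, using a toy model rather than Spark's actual classes (`Attr`, `Max`, and `autoAlias` are hypothetical names for illustration only): the id stays on the attribute itself so that same-named columns remain distinguishable, while the auto-generated name is built from an id-free rendering of the argument.

```scala
// Toy model, not Spark's real classes: every name here is hypothetical.
final case class Attr(name: String, id: Long) {
  // Rendering with the id keeps same-named attributes distinguishable.
  def withId: String = s"${name}#${id}"
}

final case class Max(child: Attr) {
  // Auto-generated name: built from the child's bare name, without its id.
  def prettyName: String = s"max(${child.name})"
}

// The aggregate's output attribute gets its own id appended exactly once.
def autoAlias(agg: Max, outputId: Long): Attr = Attr(agg.prettyName, outputId)

val a1 = Attr("a", 53)
val a2 = Attr("a", 54)
val result = autoAlias(Max(Attr("val", 7)), 8)

println(Seq(a1, a2).map(_.withId).mkString("[", ", ", "]")) // [a#53, a#54]
println(result.withId)                                      // max(val)#8
```

Here the two `a` columns can only be told apart by their ids, whereas the id of `val` inside `max(...)` adds nothing once the output attribute carries its own id.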

Contributor Author

@Eric5553 Eric5553 Feb 29, 2020

I see. Thanks for the review :-)

The common modification in AttributeReference will also remove the # in [max(val)#x]. I'm trying to pinpoint exactly where the auto-generated AttributeReference name is created.

Contributor Author

@cloud-fan I've re-implemented this so that, I think, only the attr ids in auto-generated names are removed. Would you please help review again? Thanks so much!

@Eric5553 Eric5553 changed the title from "[SPARK-30940][SQL] Remove meaningless attributeId when Explain SQL query" to "[SPARK-30940][SQL] Remove attributeId in auto-generated arguments when Explain SQL query" Mar 2, 2020
@SparkQA

SparkQA commented Mar 2, 2020

Test build #119179 has finished for PR 27685 at commit 85f2f01.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2020

Test build #119178 has finished for PR 27685 at commit 85f2f01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 3, 2020

Test build #119198 has finished for PR 27685 at commit e3b105f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 3, 2020

Test build #119203 has finished for PR 27685 at commit e6df5d9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

Eric5553 commented Mar 3, 2020

retest this please

@SparkQA

SparkQA commented Mar 3, 2020

Test build #119220 has finished for PR 27685 at commit e6df5d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 9, 2020

Test build #119547 has finished for PR 27685 at commit 659e7c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

Eric5553 commented Mar 9, 2020

cc @cloud-fan @gatorsmile @maropu @maryannxue , thanks!

-- !query output
org.apache.spark.sql.AnalysisException
-grouping expressions sequence is empty, and 'spark_catalog.default.test_having.`a`' is not an aggregate function. Wrap '(min(spark_catalog.default.test_having.`a`) AS `min(a#x)`, max(spark_catalog.default.test_having.`a`) AS `max(a#x)`)' in windowing function(s) or wrap 'spark_catalog.default.test_having.`a`' in first() (or first_value) if you don't care which value you get.;
+grouping expressions sequence is empty, and 'spark_catalog.default.test_having.`a`' is not an aggregate function. Wrap '(min(spark_catalog.default.test_having.`a`) AS `min(a)`, max(spark_catalog.default.test_having.`a`) AS `max(a)`)' in windowing function(s) or wrap 'spark_catalog.default.test_having.`a`' in first() (or first_value) if you don't care which value you get.;
Member

We need to hide ids even in error messages, too?

Contributor Author

I think flatArguments/flatArgumentStrings is tightly bound to the toString of Expression/AggregateExpression, so a change there affects all of them. I'll keep trying to limit the impact, thanks!

Contributor Author

The error message explicitly calls map(_.sql) instead of the default toString. Alias.sql uses the name field, which has already been formatted by flatArguments when the Alias was constructed. So this also follows the flatArguments framework.
As the error message is intended to suggest a SQL snippet to the user, maybe it's better not to include the #exprId anyway? Thanks.
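
To illustrate why the change reaches the error message, here is a small sketch under simplifying assumptions. `Alias` below is a hypothetical stand-in, not Spark's class, and `nameWithId`/`nameWithoutId` are made-up helpers. The point is only that the alias name is a plain string fixed at construction time, so whatever formatting produced it shows up later in `.sql`.

```scala
// Hypothetical model: the alias name is a plain string fixed at construction.
final case class Alias(childSql: String, name: String) {
  // .sql only reads the stored name; it does not re-render the child.
  def sql: String = s"${childSql} AS `${name}`"
}

// Two candidate ways of deriving the auto-generated name (illustrative only).
def nameWithId(fn: String, col: String, id: String): String = s"${fn}(${col}#${id})"
def nameWithoutId(fn: String, col: String): String = s"${fn}(${col})"

val before = Alias("min(t.`a`)", nameWithId("min", "a", "x"))
val after  = Alias("min(t.`a`)", nameWithoutId("min", "a"))

// Mirrors the map(_.sql) call mentioned above.
println(Seq(before).map(_.sql).mkString(", ")) // min(t.`a`) AS `min(a#x)`
println(Seq(after).map(_.sql).mkString(", "))  // min(t.`a`) AS `min(a)`
```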

flatArguments.toSeq, "(", ", ", ")", SQLConf.get.maxToStringFields)
flatArgumentStrings.toSeq, "(", ", ", ")", SQLConf.get.maxToStringFields)

def argumentString: String = toString
Contributor

@cloud-fan cloud-fan Mar 9, 2020

Is it possible to add only one method? I'm worried about adding too many methods to the framework.

Contributor Author

@Eric5553 Eric5553 Mar 9, 2020

IMO, argumentString is needed because AttributeReference already overrides toString, so we need a new string method to switch to the non-exprId format. As for flatArgumentStrings, it has only two callers. I refactored toAggString in AggregateFunction, so we no longer need to add flatArgumentStrings to Expression and can just implement it within toString. See commit b74c500.
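
A rough sketch of the shape described above, assuming a heavily simplified expression hierarchy. The trait and class names are stand-ins rather than Spark's real definitions; only `toString` and `argumentString` mirror names used in this thread.

```scala
// Simplified stand-in for an expression tree; not Spark's actual code.
sealed trait Expr {
  // By default, the argument rendering is the same as toString.
  def argumentString: String = toString
}

final case class AttrRef(name: String, exprId: Long) extends Expr {
  // toString keeps the id so plans can tell same-named attributes apart.
  override def toString: String = s"${name}#${exprId}"
  // Used when this attribute appears as an argument of another expression.
  override def argumentString: String = name
}

final case class Lit(value: Int) extends Expr {
  override def toString: String = value.toString
}

final case class Func(fn: String, args: Seq[Expr]) extends Expr {
  // Arguments are rendered through argumentString, so their ids are dropped.
  override def toString: String =
    args.map(_.argumentString).mkString(s"${fn}(", ", ", ")")
}

val v = AttrRef("val", 3)
println(v)                           // val#3
println(Func("max", Seq(v)))         // max(val)
println(Func("add", Seq(v, Lit(1)))) // add(val, 1)
```

With this shape, only one extra method is needed on the base type, and only the attribute class has to override it.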

@SparkQA

SparkQA commented Mar 9, 2020

Test build #119570 has finished for PR 27685 at commit 51bb46a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 9, 2020

Test build #119573 has finished for PR 27685 at commit 51bb46a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 10, 2020

Test build #119597 has finished for PR 27685 at commit b74c500.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 10, 2020

Test build #119607 has finished for PR 27685 at commit cee32e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119916 has finished for PR 27685 at commit 1cbfe49.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119928 has finished for PR 27685 at commit 10da2c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

@cloud-fan Would you please help review the latest change? Thanks so much :-)

@Eric5553
Contributor Author

@cloud-fan @gatorsmile @maryannxue Would you please help review this PR? Thanks so much :-)

@maropu
Member

maropu commented Apr 9, 2020

@Eric5553 Could you resolve the conflict?
@cloud-fan Could you do a final check?

@Eric5553
Contributor Author

Eric5553 commented Apr 9, 2020

Updated. Thanks so much for helping maintain the PR! @maropu

@SparkQA

SparkQA commented Apr 9, 2020

Test build #121037 has finished for PR 27685 at commit c7daaa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 12, 2020
@github-actions github-actions bot closed this Sep 13, 2020