[SPARK-14476][SQL] Improve the physical plan visualization by adding meta info like table name and file path for data source. #12947
Conversation
|
@clockfly Can we show table name instead of |
|
Test build #57962 has finished for PR 12947 at commit
|
…ata source like Hive table
|
I made some changes to the UI; please check whether it looks better now. |
|
Test build #57995 has finished for PR 12947 at commit
|
|
LGTM. @marmbrus Could you take a quick look at this? |
|
@clockfly This PR does not truncate those long strings caused by long paths, right? |
|
This PR truncates long paths to 100 characters. |
| override def simpleString: String = { | ||
| val metadataEntries = for ((key, value) <- metadata.toSeq.sorted) yield s"$key: $value" | ||
| val metadataEntries = for ((key, value) <- metadata.toSeq.sorted) yield { | ||
| key + ": " + StringUtils.abbreviate(value, 100) |
Can you try some long paths and see whether 100 is a good value? It would also be good to put a screenshot in the PR description.
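To make it easier to experiment with the width, here is a minimal, self-contained sketch of the truncation. It assumes the behaviour of Commons Lang's `StringUtils.abbreviate(value, maxWidth)`: strings longer than `maxWidth` are cut and end in `...`, so the result is exactly `maxWidth` characters. `MetadataAbbreviation`, its local `abbreviate`, and `metadataEntries` are hypothetical stand-ins for illustration, not code from this PR.

```scala
object MetadataAbbreviation {
  // Hypothetical stand-in for StringUtils.abbreviate (assumption: same contract,
  // i.e. the result never exceeds maxWidth and long values end with "...").
  def abbreviate(value: String, maxWidth: Int): String = {
    require(maxWidth >= 4, "maxWidth must be at least 4")
    if (value.length <= maxWidth) value
    else value.take(maxWidth - 3) + "..."
  }

  // Same shape as the loop in the diff: sort by key, emit one "key: value" entry each.
  // The default of 100 is the value under discussion here.
  def metadataEntries(metadata: Map[String, String], maxWidth: Int = 100): Seq[String] =
    for ((key, value) <- metadata.toSeq.sorted) yield {
      key + ": " + abbreviate(value, maxWidth)
    }
}
```

Calling `MetadataAbbreviation.metadataEntries(metadata, maxWidth = 60)` with a few real warehouse paths is a quick way to compare candidate widths before settling on one.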
|
@yhuai |
|
Test build #58195 has finished for PR 12947 at commit
|
|
"HadoopFiles" isn't very useful, and sometimes the files are not even in Hadoop (e.g. it is just using Hadoop APIs to read S3). Can we say "scan" instead, and say the name of the data source? e.g. "parquet scan default.jt4" |
|
And for explain: |
|
Test build #58229 has finished for PR 12947 at commit
|
|
What does it look like when there is no table but just files? |
|
Something like "Scan parquet", but without the table name suffix. I will show you an example. |
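A rough sketch of the naming rule being discussed, under the assumption that the label is "Scan", then the data source name, then the table name only when one exists. `ScanNodeName`, `formatName`, and `tableName` are hypothetical names chosen for illustration; this thread does not show how the PR actually builds the label.

```scala
object ScanNodeName {
  // Hypothetical helper (not Spark's API): builds the node label from the
  // data source name and an optional table name.
  def scanNodeName(formatName: String, tableName: Option[String]): String =
    (Seq("Scan", formatName) ++ tableName.toSeq).mkString(" ")
}
```

With a table, `ScanNodeName.scanNodeName("parquet", Some("default.jt4"))` gives `Scan parquet default.jt4`; for plain files, passing `None` leaves just `Scan parquet`.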
|
For load: |
|
Test build #58250 has finished for PR 12947 at commit
|
| /* Breaks the long string like file path when showing tooltips */ | ||
| .tooltip-inner { | ||
| word-wrap:break-word; | ||
| } |
Add a newline here
|
Could you also update the screenshot in the PR description? |
|
@davies, Updated. |
|
Test build #58318 has finished for PR 12947 at commit
|
|
Thanks - merging in master/2.0. |
…meta info like table name and file path for data source.
## What changes were proposed in this pull request?
Improve the physical plan visualization by adding meta info like table name and file path for data source.
Meta info InputPaths and TableName are newly added. Example:
```
scala> spark.range(10).write.saveAsTable("tt")
scala> spark.sql("select * from tt").explain()
== Physical Plan ==
WholeStageCodegen
: +- BatchedScan HadoopFiles[id#13L] Format: ParquetFormat, InputPaths: file:/home/xzhong10/spark-linux/assembly/spark-warehouse/tt, PushedFilters: [], ReadSchema: struct<id:bigint>, TableName: default.tt
```
## How was this patch tested?
manual tests.
Changes for UI:
Before:

After:


Author: Sean Zhong <[email protected]>
Closes #12947 from clockfly/spark-14476.
(cherry picked from commit 61e0bdc)
Signed-off-by: Reynold Xin <[email protected]>
|
@clockfly It seems that this does not work with temporary tables. Could you send a PR to fix that? |




