
Conversation

@clockfly
Contributor

@clockfly clockfly commented May 6, 2016

What changes were proposed in this pull request?

Improve the physical plan visualization by adding meta info such as the table name and file path for data sources.

The meta info fields InputPaths and TableName are newly added. Example:

scala> spark.range(10).write.saveAsTable("tt")
scala> spark.sql("select * from tt").explain()
== Physical Plan ==
WholeStageCodegen
:  +- BatchedScan HadoopFiles[id#13L] Format: ParquetFormat, InputPaths: file:/home/xzhong10/spark-linux/assembly/spark-warehouse/tt, PushedFilters: [], ReadSchema: struct<id:bigint>, TableName: default.tt

How was this patch tested?

manual tests.

Changes for UI:
Before:
[screenshot: ui_before_change]

After:
[screenshot: fix_long_string]

[screenshot: for_load]

@clockfly clockfly changed the title [SPARK-14476][SQL] Improves the output of dataset.explain by adding source table names and file paths. [SPARK-14476][SQL][WIP] Improves the output of dataset.explain by adding source table names and file paths. May 6, 2016
@davies
Contributor

davies commented May 6, 2016

@clockfly Can we show the table name instead of HadoopFiles, or together with it? If there is no table name, we could use the rightmost part of the path.
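
(For illustration, a minimal sketch of this rule with hypothetical names, not the PR's actual code: use the table name when available, otherwise the last segment of the path.)

```scala
// Hypothetical helper (not the PR's code): prefer the table name, otherwise
// fall back to the rightmost segment of the input path.
def scanLabel(tableName: Option[String], inputPath: String): String =
  tableName.getOrElse(inputPath.stripSuffix("/").split("/").last)

scanLabel(Some("default.tt"), "file:/home/xzhong10/spark-linux/assembly/spark-warehouse/tt")  // "default.tt"
scanLabel(None, "file:/home/xzhong10/people.json")                                            // "people.json"
```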

@SparkQA

SparkQA commented May 6, 2016

Test build #57962 has finished for PR 12947 at commit b1d01c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly
Contributor Author

clockfly commented May 6, 2016

@davies

I made some changes to the UI; please check whether it is better now.

scala> spark.sql("select * from tt").explain()
== Physical Plan ==
WholeStageCodegen
:  +- BatchedScan HadoopFiles default.tt[id#0L] Format: ParquetFormat, InputPaths: file:/home/xzhong10/spark-linux/assembly/spark-warehouse/tt, PushedFilters: [], ReadSchema: struct<id:bigint>

[screenshot: change_v2]

@clockfly clockfly changed the title [SPARK-14476][SQL][WIP] Improves the output of dataset.explain by adding source table names and file paths. [SPARK-14476][SQL][WIP] Improve the physical plan visualization by adding meta info like table name and file path for data source. May 6, 2016
@SparkQA

SparkQA commented May 6, 2016

Test build #57995 has finished for PR 12947 at commit 438d70e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented May 6, 2016

LGTM.

@marmbrus Could you take a quick look on this?

@yhuai
Contributor

yhuai commented May 6, 2016

@clockfly This PR does not truncate those long strings caused by long paths, right?

@clockfly
Contributor Author

clockfly commented May 7, 2016

@clockfly clockfly changed the title [SPARK-14476][SQL][WIP] Improve the physical plan visualization by adding meta info like table name and file path for data source. [SPARK-14476][SQL] Improve the physical plan visualization by adding meta info like table name and file path for data source. May 7, 2016
override def simpleString: String = {
-    val metadataEntries = for ((key, value) <- metadata.toSeq.sorted) yield s"$key: $value"
+    val metadataEntries = for ((key, value) <- metadata.toSeq.sorted) yield {
+      key + ": " + StringUtils.abbreviate(value, 100)
+    }

Can you play with some long paths and see if 100 is a good value? (It would also be good to put a screenshot in the PR description.)
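
(To get a feel for where a 100-character cap cuts off a long path, here is a quick check using commons-lang3's StringUtils.abbreviate, which is already on Spark's classpath; the path below is made up.)

```scala
import org.apache.commons.lang3.StringUtils

// A made-up long path, just to see where the cutoff lands.
val longPath = "file:/home/xzhong10/" + ("a" * 80) + "/" + ("b" * 80) + "/part-00000.parquet"

// abbreviate keeps the first 97 characters and appends "...", so the result is exactly 100 chars.
StringUtils.abbreviate(longPath, 100)
StringUtils.abbreviate(longPath, 100).length  // 100
```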

@clockfly
Contributor Author

@yhuai
Thanks for the reminder; the CSS has been updated to handle long tooltips.
[screenshot: fix_long_string]

@SparkQA

SparkQA commented May 10, 2016

Test build #58195 has finished for PR 12947 at commit b6b38a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 10, 2016

"HadoopFiles" isn't very useful, and sometimes the files are not even in Hadoop (e.g. it is just using Hadoop APIs to read S3). Can we say "scan" instead, and say the name of the data source?

e.g.

"parquet scan default.jt4"

@clockfly
Contributor Author

How does the new UI look?
[screenshot: fix_display_name]

And for explain:

scala> spark.sql("select * from jt4").explain()
== Physical Plan ==
WholeStageCodegen
:  +- BatchedScan Scan parquet default.jt4[id#0L] Format: ParquetFormat, InputPaths: file:/home/xzhong10/aaaaaaaaaa/bbbbbbbb/ccccccccccc/ddddddddd/eeeeeeee/ffffffffff/gggggggg/hhhhhh..., PushedFilters: [], ReadSchema: struct<id:bigint>
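
(A minimal sketch, with illustrative identifiers only, of how a label of the form "Scan <format> <table>" like the one above could be composed; it is not the PR's actual code.)

```scala
// Hypothetical helper composing "Scan <format>" plus the table name when one exists.
def scanNodeName(formatShortName: String, tableName: Option[String]): String =
  (Seq("Scan", formatShortName) ++ tableName).mkString(" ")

scanNodeName("parquet", Some("default.jt4"))  // "Scan parquet default.jt4"
scanNodeName("json", None)                    // "Scan json"
```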

@SparkQA

SparkQA commented May 10, 2016

Test build #58229 has finished for PR 12947 at commit f0a0951.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 10, 2016

What does it look like when there is no table, just files?

@clockfly
Contributor Author

Something like "Scan parquet", but without the table name suffix. I will show you an example.

@clockfly
Contributor Author

For load:

scala> spark.read.format("json").load("/home/xzhong10/people.json")
res5: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> res5.explain()
== Physical Plan ==
WholeStageCodegen
:  +- Scan json[age#20L,name#21] Format: JSON, InputPaths: file:/home/xzhong10/people.json, PushedFilters: [], ReadSchema: struct<age:bigint,name:string>

[screenshot: for_load]

@SparkQA

SparkQA commented May 10, 2016

Test build #58250 has finished for PR 12947 at commit b3e9775.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/* Break long strings such as file paths when showing tooltips */
.tooltip-inner {
  word-wrap: break-word;
}

Add a newline here

@davies
Contributor

davies commented May 10, 2016

Could you also update the screenshot in the PR description?

@clockfly
Contributor Author

@davies, Updated.

@SparkQA

SparkQA commented May 11, 2016

Test build #58318 has finished for PR 12947 at commit 59f816f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 11, 2016

Thanks - merging in master/2.0.

asfgit pushed a commit that referenced this pull request May 11, 2016
…meta info like table name and file path for data source.

## What changes were proposed in this pull request?
Improve the physical plan visualization by adding meta info like table name and file path for data source.

Meta info InputPaths and TableName are newly added. Example:
```
scala> spark.range(10).write.saveAsTable("tt")
scala> spark.sql("select * from tt").explain()
== Physical Plan ==
WholeStageCodegen
:  +- BatchedScan HadoopFiles[id#13L] Format: ParquetFormat, InputPaths: file:/home/xzhong10/spark-linux/assembly/spark-warehouse/tt, PushedFilters: [], ReadSchema: struct<id:bigint>, TableName: default.tt
```

## How was this patch tested?

manual tests.

Changes for UI:
Before:
![ui_before_change](https://cloud.githubusercontent.com/assets/2595532/15064559/3d423e3c-1388-11e6-8099-7803ef496c4d.jpg)

After:
![fix_long_string](https://cloud.githubusercontent.com/assets/2595532/15133566/8ad09e26-1696-11e6-939c-99b908249b9d.jpg)

![for_load](https://cloud.githubusercontent.com/assets/2595532/15157224/3ba95c98-171d-11e6-885a-de0ee8dec27c.jpg)

Author: Sean Zhong <[email protected]>

Closes #12947 from clockfly/spark-14476.

(cherry picked from commit 61e0bdc)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 61e0bdc May 11, 2016
@davies
Contributor

davies commented May 13, 2016

@clockfly It seems that this does not work with temporary tables; could you send a PR to fix that?
