Skip to content

Conversation

@ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

Add a new config spark.sql.maxMetadataStringLength. This config aims to limit metadata value length, e.g. file location.

Why are the changes needed?

Some metadata have been abbreviated by ... when I tried to add some test in SQLQueryTestSuite. We need to replace such value to notIncludedMsg. That caused we can't replace that like location value by className since the className has been abbreviated.

Here is a case:

CREATE table  explain_temp1 (key int, val int) USING PARQUET;

EXPLAIN EXTENDED SELECT sum(distinct val) FROM explain_temp1;

-- ignore parsed,analyzed,optimized
-- The output like
== Physical Plan ==
*HashAggregate(keys=[], functions=[sum(distinct cast(val#x as bigint)#xL)], output=[sum(DISTINCT val)#xL])
+- Exchange SinglePartition, true, [id=#x]
   +- *HashAggregate(keys=[], functions=[partial_sum(distinct cast(val#x as bigint)#xL)], output=[sum#xL])
      +- *HashAggregate(keys=[cast(val#x as bigint)#xL], functions=[], output=[cast(val#x as bigint)#xL])
         +- Exchange hashpartitioning(cast(val#x as bigint)#xL, 4), true, [id=#x]
            +- *HashAggregate(keys=[cast(val#x as bigint) AS cast(val#x as bigint)#xL], functions=[], output=[cast(val#x as bigint)#xL])
               +- *ColumnarToRow
                  +- FileScan parquet default.explain_temp1[val#x] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/home/runner/work/spark/spark/sql/core/spark-warehouse/org.apache.spark.sq...], PartitionFilters: ...

Does this PR introduce any user-facing change?

No, a new config.

How was this patch tested?

new test.

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128434 has finished for PR 29688 at commit 7b676b8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128435 has finished for PR 29688 at commit eb67ccd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

@maropu @dongjoon-hyun @cloud-fan do you have time to look this ? thanks !


val MAX_METADATA_STRING_LENGTH = buildConf("spark.sql.maxMetadataStringLength")
.doc("Maximum number of characters to output for a metadata string. e.g. " +
"`DataSourceScanExec`, every value will be abbreviated if exceed length.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. DataSourceScanExec => e.g. file location in DataSourceScanExec


test("SPARK-32827: Add spark.sql.maxMetadataStringLength config") {
withTempDir { dir =>
val tableName = "t1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: t1 -> t

val tableName = "t1"
val path = s"${dir.getCanonicalPath}/$tableName"
withTable(tableName) {
sql(s"create table t1(c int) using parquet location '$path'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz use uppercases for SQL keywords, e.g., CREATE TABLE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@maropu
Copy link
Member

maropu commented Sep 9, 2020

Adding the config looks okay to me.

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128444 has finished for PR 29688 at commit 681f84e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128443 has finished for PR 29688 at commit 239f62b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xkrogen
Copy link
Contributor

xkrogen commented Sep 9, 2020

Glad to see this happening @ulysses-you ! We have a similar internal patch we added recently that we've been planning to open source. Nice to see that the changes are even simpler on the 3.x line.

}
}

test("SPARK-32827: Add spark.sql.maxMetadataStringLength config") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add spark.sql.maxMetadataStringLength config => Set max metadata string length

@gengliangwang
Copy link
Member

@ulysses-you thanks for the work.
It seems that the PR only changes the file source related metadata. Is there other places we can use the new config as well?

@ulysses-you
Copy link
Contributor Author

Thanks @gengliangwang .The origin propose is for explain test what I said in pr description, related pr is #29586. Location value has been abbreviated by ... since default metadata length is 100. We can add the config at head of test file like --SET spark.sql.maxMetadataStringLength=500; if we have this config, then we can get the total location value.

I'm not sure if it's well to do these change in one pr. But I can merge them into one pr if you think it's ok.

@SparkQA
Copy link

SparkQA commented Sep 13, 2020

Test build #128596 has finished for PR 29688 at commit 9c19245.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 14, 2020

Test build #128611 has finished for PR 29688 at commit e83bf95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor

@ulysses-you thanks for the work.
It seems that the PR only changes the file source related metadata. Is there other places we can use the new config as well?

I will use this config in my pr #29739.
Similar work in my pr and reverted waiting this.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 888b343 Sep 15, 2020
@ulysses-you
Copy link
Contributor Author

Thanks for merging, thanks all !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants